The Great AI Overengineering Olympics

The Experiment

What happens when you ask four AI CLI agents to do the simplest thing in programming — a Hello World in Java — but whisper the cursed words “TDD, DDD, BDD, and hexagonal architecture” into their ears?

The prompt:

“Create a Hello World in Java. Apply TDD, DDD, BDD and hexagonal architecture principles.”

The results are… magnificent. Every single agent took the bait and produced enterprise-grade atrocities for a program that should be one file and one line of code.
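For reference, here is a minimal sketch of the one-file baseline the agents were up against. The class name `HelloWorld` and the `greeting()` helper are illustrative choices, not taken from any agent's output; the helper exists only so the text can be checked without capturing stdout.

```java
// HelloWorld.java: roughly the whole program the prompt needed.
class HelloWorld {
    // Hypothetical helper, split out so the text can be asserted on directly.
    static String greeting() {
        return "Hello, World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Everything the agents produced is, in effect, an elaboration of that single `println` call.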

The Contenders

Agent	Files	Test Cases	Most Absurd Moment
Claude	10	11	Built a full spy test double framework to verify “Hello, World!” gets printed
Codex	11	2	Defined `InMemoryGreetingPresenter` twice because copy-paste is also a design pattern
Gemini	6	2	Implemented the Repository Pattern to retrieve a hardcoded string from… nowhere
Vibe	9	9	Wrote 9 separate test cases including distinct tests for `null` vs `""` vs `" "` — because those are clearly different business requirements

Highlights

Claude: The Architect

Went full hexagonal with input ports, output ports, adapters, a domain value object with `equals`/`hashCode`/`toString`, and nested `@DisplayName` test classes. Clean, professional, and absolutely unhinged for printing two words to a console.

Layers deep to reach `System.out.println`: 5
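To make the shape concrete, here is a rough sketch of what a ports-and-adapters Hello World with a spy test double can look like. This is not Claude's actual code: every name below (`Greeting`, `GreetUseCase`, `GreetingOutput`, `GreetingService`, `ConsoleGreetingOutput`, `SpyGreetingOutput`, `HexagonalHello`) is hypothetical, and a Java 16+ record stands in for the hand-written value object to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Driving adapter: wires the hexagon together and kicks it off.
class HexagonalHello {
    public static void main(String[] args) {
        new GreetingService(new ConsoleGreetingOutput()).greet();
    }
}

// Domain value object. The record generates equals/hashCode/toString;
// this is an assumption made for brevity, not how Claude wrote it.
record Greeting(String text) {}

// Input port: what the application offers to the outside world.
interface GreetUseCase {
    void greet();
}

// Output port: what the application needs from the outside world.
interface GreetingOutput {
    void show(Greeting greeting);
}

// Application service sitting between the two ports.
final class GreetingService implements GreetUseCase {
    private final GreetingOutput output;

    GreetingService(GreetingOutput output) {
        this.output = output;
    }

    @Override
    public void greet() {
        output.show(new Greeting("Hello, World!"));
    }
}

// Driven adapter: the only class that touches System.out.
final class ConsoleGreetingOutput implements GreetingOutput {
    @Override
    public void show(Greeting greeting) {
        System.out.println(greeting.text());
    }
}

// Spy test double: records what was shown instead of printing it.
final class SpyGreetingOutput implements GreetingOutput {
    final List<Greeting> shown = new ArrayList<>();

    @Override
    public void show(Greeting greeting) {
        shown.add(greeting);
    }
}
```

A test would pass a `SpyGreetingOutput` to `GreetingService`, call `greet()`, and assert that `shown` contains exactly one `Greeting("Hello, World!")`. That is a lot of machinery for checking a constant, which is rather the point.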

Codex: The Framework Enthusiast

Not content with just JUnit, Codex brought in the entire Cucumber ecosystem: feature files, step definitions, a test runner class, the whole nine yards. For a greeting. It also used Java records, which is admittedly the least insane choice anyone made here.

Codex also quietly added support for `Hello <Name>`, a feature nobody asked for. The prompt said “Hello World,” but Codex decided the greeting needed to be parameterizable. Scope creep, meet overengineering.

It also gave the most polished summary of its own work when it finished, neatly listing every file it created with a clear explanation of the architecture. It even apologized for not running the tests, because `mvn` wasn’t installed in its sandbox environment. Points for self-awareness.

Test frameworks used: 2 (JUnit 5 + Cucumber)

Gemini: The Minimalist Extremist

Produced the fewest files but compensated by introducing a `GreetingRepository` interface, a pattern designed for database abstraction, to return the string "Hello, World!". The repository is dutifully implemented by `InMemoryGreetingRepository`, a class whose entire purpose is to instantiate a hardcoded greeting. It’s like building a database abstraction layer for a Post-it note.
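A minimal sketch of that idea is shown here. The `GreetingRepository` and `InMemoryGreetingRepository` names come from Gemini's output as described above; the `find()` method, the `Greeting` record, and the `RepositoryHello` entry point are assumptions made for illustration.

```java
// Hypothetical entry point: asks the "repository" for the one greeting it has.
class RepositoryHello {
    public static void main(String[] args) {
        GreetingRepository repository = new InMemoryGreetingRepository();
        System.out.println(repository.find().text());
    }
}

// Assumed shape of the domain object (Java 16+ record).
record Greeting(String text) {}

// The repository abstraction: normally a seam in front of a database.
interface GreetingRepository {
    Greeting find(); // method name is an assumption
}

// The "in-memory" implementation: no storage, no lookup, just a constant.
final class InMemoryGreetingRepository implements GreetingRepository {
    @Override
    public Greeting find() {
        return new Greeting("Hello, World!");
    }
}
```

The interface would earn its keep if a second implementation ever read greetings from a real data store. Here there is nothing to store.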

On the testing front, Gemini did write 2 tests with proper Given-When-Then structure using BDDMockito, but interpreted TDD as “write a test before the implementation” rather than “explore the design space through tests.” It ticked the ceremony boxes — just at the bare minimum, with no edge cases or negative scenarios in sight.

Empty packages created “for future use”: 2

Vibe: The QA Department

Vibe took testing personally. Five unit tests AND four Cucumber scenarios, covering edge cases like null names, empty names, blank names, and names with spaces. Every method has full Javadoc explaining what it does. The `createGreeting` method is one line of string interpolation wrapped in 9 tests and enough documentation to fill a wiki.
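For a sense of what those edge cases are exercising, here is a hedged sketch of a `createGreeting` method. Only the method name comes from Vibe's output; the `GreetingFactory` class and the fallback behavior are assumptions. In particular, this version treats `null`, `""`, and whitespace-only names identically by falling back to "World", which may not match what Vibe actually implemented. `String.isBlank()` requires Java 11 or later.

```java
class GreetingFactory {
    /**
     * Builds a greeting for the given name.
     * Assumption: null, empty, and whitespace-only names all fall back to "World".
     */
    static String createGreeting(String name) {
        String target = (name == null || name.isBlank()) ? "World" : name.trim();
        return "Hello, " + target + "!";
    }

    public static void main(String[] args) {
        System.out.println(createGreeting(null));           // Hello, World!
        System.out.println(createGreeting("Ada Lovelace")); // Hello, Ada Lovelace!
    }
}
```

Under this assumption, three of the edge cases collapse into a single branch, so separate tests for each mostly confirm that `isBlank()` works as documented.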

Its self-generated summary is also the most earnestly corporate of the bunch, complete with emoji section headers and the unironic claim that this Hello World “serves as a foundation that can be extended with additional features while maintaining the architectural integrity.” No one tell Vibe it’s a print statement.

Ratio of test code to actual logic: approximately ∞:1

The Scoreboard

Category	Winner
Most architecturally pure	Claude
Most frameworks per line of business logic	Codex
Best use of the Repository Pattern for no reason	Gemini
Most thorough testing of a known constant	Vibe
Actually printing “Hello, World!”	All of them (somehow)

What We Learned

Every AI agent will happily build a cathedral when you ask for a shed
The words “hexagonal architecture” are an AI’s bat signal for overengineering
Somewhere, a `GreetingRepository` interface exists for a string that will never change
No AI questioned the premise — not one said “hey, maybe this is overkill?”
They all still got the right output: Hello, World!

Findings

Speed

Ranking: Codex > Claude Code > Mistral Vibe > Gemini CLI

Codex was the fastest to complete the task, with Claude Code close behind. The bottom two come with a caveat: since Mistral Vibe and Gemini CLI aren’t part of my daily toolkit, I had to manually approve permissions for file creation, command execution, and other actions along the way — time that the agents themselves can’t be blamed for. Gemini was especially permission-hungry, requiring approval at seemingly every step.

Gemini had an additional handicap: I ran it against the Gemini 3 preview model, which was clearly feeling the load. The servers were frequently busy, requiring multiple retries before requests would go through. Between the permission prompts and the server queues, Gemini’s wall-clock time was more of a patience test than a speed test.

How to Run

Each subdirectory contains a Maven project. To witness the overengineering firsthand:

cd <agent-name>  # claude, codex, gemini, or vibe
mvn clean test

Then weep gently as up to eleven files collaborate to print two words.

This experiment was conducted for science, comedy, and as a cautionary tale about prompt engineering.

View the full project on GitHub
