
Generating code is a solved problem. Codex, Claude Code, Mistral’s Vibe and Gemini CLI are all capable of creating good software.
But writing code creates a form of entropy. The more we add to a system, the worse it gets. Claude, like a junior developer and their manager, couldn't care less about refactoring and code quality; none of the three has any sense of the risk software entropy poses. Seasoned software engineers, on the other hand, are obsessed with refactoring to fight that entropy.
The first thing every developer does when they finally get their hands on a $200/month all-you-can-eat Claude Max account is give it something huge to build. They use ChatGPT to generate a 50-page requirements document and then ask Claude to one-shot the full-blown SaaS product. Make it production-ready, make no mistakes.
I remember watching Claude generate many thousands of lines of code to build a calendar-syncing app. It reached its 5-hour limit three times before finally calling it a day. Running that app for the first time, the anticipation was real: this felt like a new era for software development!
But what I got was a broken, unmaintained legacy system with a ridiculous amount of bloat. It had built its own autoscaling, but it could not sync a calendar.
Claude, like most people, thinks of software development as adding stuff. Testing, refactoring, simplifying and fighting entropy are not on its beautiful silicon mind at all.
The average Lovable user follows a very similar arc. They start with the euphoria of finally ditching those pesky devs. The machine can do it for them! But then, gradually, the machine starts doing the wrong things. It moves a button without being asked. Something that used to work is now different or broken. After a few hundred prompts, the system becomes so brittle that it can no longer be trusted. Adding a simple feature means retesting the entire app.
Obviously, they blame Lovable. The Lovable subreddit is full of people claiming the tool got worse. This used to just work! They should blame software entropy instead.
I’m convinced we can write a refactoring agent. That’s almost trivial. Generating test automation is also solved. Coding agents generously add their own tests and can be instructed to write even more.
The problem is that in both cases, the system lacks a critical eye. The agent can generate code, but it doesn’t understand the product. You can have 100% code coverage and still have breaking changes. All the tests can pass while Claude hides the Save button in the About screen.
So, we need humans in the loop to keep this critical eye.
The big question is: for how long? Can we create a feedback loop that detects if something is broken? How much of the critical eye can we automate?
That’s the challenge of the next few months.
Solving entropy is the bottleneck for fully autonomous software development. As long as we don’t crack that nut, we will need to review and test every single output.
So, I’m running an experiment: How autonomously can an agent build a video game? How much human intervention is needed, and how fast does it derail? Can we get to the point where only high-level human guidance and creativity are needed? And what would such an alien design look like?
Those who want to follow along can take a look at Daily Doom. Every night, the agent picks features from the GitHub backlog and adds them to the game. Every morning, my OpenClaw bot informs me of the progress that has been made and invites me to play-test it. If I have feedback, it adds that to the backlog. If not, it makes up more gameplay additions. It also writes a short daily blog post about what it did overnight.
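For the curious, the nightly loop boils down to something like this. This is a simplified sketch, not the actual Daily Doom code: it assumes the GitHub CLI (`gh`) and Claude Code’s headless `-p` mode, and it glosses over error handling and token budgets.

```ts
import { execFileSync } from "node:child_process";

// Pull the open feature requests from the GitHub backlog.
const issues: { number: number; title: string; body: string }[] = JSON.parse(
  execFileSync("gh", [
    "issue", "list", "--state", "open", "--json", "number,title,body",
  ]).toString()
);

// Hand each one to the coding agent in headless mode, one at a time.
for (const issue of issues) {
  execFileSync(
    "claude",
    [
      "-p",
      `Implement GitHub issue #${issue.number} (${issue.title}):\n\n` +
        `${issue.body}\n\nRun the test suite before committing.`,
    ],
    { stdio: "inherit" }
  );
}
```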
The feedback mechanism is simple: there are unit tests and a Playwright script that takes screenshots of the gameplay. An LLM interprets those screenshots to check specific things: is the health bar still visible? These tests, like the gameplay itself, are completely autogenerated.
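A stripped-down version of one of those visual checks might look like the following. It’s a sketch built on Playwright and the Anthropic TypeScript SDK; the game URL, the model name, and the health-bar question are placeholders for whatever the agent happens to generate.

```ts
import { chromium } from "playwright";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function healthBarStillVisible(): Promise<boolean> {
  // Load the game and let it render a few frames before capturing.
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("http://localhost:3000"); // placeholder for the game's URL
  await page.waitForTimeout(3000);
  const screenshot = await page.screenshot();
  await browser.close();

  // Ask an LLM to interpret the screenshot instead of asserting on pixels.
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-5", // illustrative model choice
    max_tokens: 8,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: {
              type: "base64",
              media_type: "image/png",
              data: screenshot.toString("base64"),
            },
          },
          {
            type: "text",
            text: "Is the health bar visible in this gameplay screenshot? Answer YES or NO.",
          },
        ],
      },
    ],
  });

  const verdict = response.content[0];
  return verdict.type === "text" && verdict.text.trim().toUpperCase().startsWith("YES");
}
```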
After a few difficult nights, the system is now up and running. The game is evolving on its own in a very non-human way. My task is just to report issues and to buy it some extra Anthropic tokens from time to time.
Three days in, the code is pretty messy already. There is an exposed (disabled) API key in the GitHub repo, along with a few really wonky game assets. That’s a wild start.
I’m curious to see how long we can keep this going before entropy hits hard enough that a human is needed to clean things up. With a bit of luck, it’ll run for months.
But then, who knows? Maybe software entropy is a solved problem after all.
