Let’s have AI agents write the tests our engineers didn’t have the time for.

Software developers know they should write unit tests. Shipping features, however, always seems more important. Over time, technical debt creeps in, and by the time we want to pay it back, we really, really wish we had those tests.

Writing unit tests for an existing codebase usually isn’t simple, and it’s never cheap.

But what if we could generate these tests instead of dedicating development capacity to them? Can we get an LLM to write tests so our engineers don’t have to?

LLMs can write test automation; that’s not new. But telling Cursor to write a test for a component we just created takes almost as long as writing the test ourselves. It still requires a human developer, so the productivity gain is minimal.

What I’m interested in is asking an LLM to write all the missing tests for a codebase!

Usage

To generate tests for a single class, feed the prompt found in src/cover_one_file.prompt into your favourite agent framework. I ran the experiment with Cursor, Junie, Codex, and Claude Code; Claude Code produced by far the best results in my experience.
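
For example, with Claude Code you can run the prompt in non-interactive mode from the repository root (claude -p is the CLI’s print mode; other agents have their own way of taking a prompt). You will also need to point the agent at the file you want covered, in whatever way the prompt expects:

    claude -p "$(cat src/cover_one_file.prompt)"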

To generate tests for multiple classes using Claude Code, use the script in src/undercover-agent.py:

  • Make sure Python >3.12 and Claude Code are installed
  • Copy the script into the codebase you want to cover
  • Create a file called .undercover-agent/input.txt
  • Fill input.txt with the relative paths of the files you want to cover, one path per line (see the example after this list)
  • Run python undercover-agent.py
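
For example, .undercover-agent/input.txt for a Java codebase might look like this (the paths are made up for illustration):

    src/main/java/com/example/OrderService.java
    src/main/java/com/example/InvoiceCalculator.java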

The script will process each file, generate tests, and move on to the next.
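
To make the flow concrete, here is a minimal sketch of the kind of loop the script runs. It is not the actual src/undercover-agent.py: the claude -p invocation, the {file} placeholder, and the prompt location are all assumptions.

    # Hypothetical sketch of the per-file loop; the real script may differ.
    # Assumes the Claude Code CLI ("claude") is on PATH and that the prompt
    # contains a {file} placeholder marking where the target path goes.
    import subprocess
    from pathlib import Path

    PROMPT = Path("cover_one_file.prompt").read_text()  # assumed location

    def main() -> None:
        for line in Path(".undercover-agent/input.txt").read_text().splitlines():
            path = line.strip()
            if not path:
                continue  # skip blank lines
            # Ask Claude Code (non-interactive print mode) to write tests
            # for this one file.
            result = subprocess.run(["claude", "-p", PROMPT.replace("{file}", path)])
            if result.returncode != 0:
                # One failed file shouldn't block the rest; move on.
                print(f"Agent run failed for {path}; continuing.")

    if __name__ == "__main__":
        main()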

Learn more

For more details about the original experiment and its results, check out this article.