AI copilots and low-code platforms promise to revolutionise how we build applications. Tools that can scaffold an entire app in minutes seem like the answer to every developer's productivity dreams. But what happens when you try to take that AI-generated prototype and turn it into something you'd actually trust in production?
A few months ago, I stumbled across an AI-generated app that looked promising but was clearly just a prototype. Instead of starting from scratch, I thought it'd be interesting to see what it would actually take to turn it into something you could deploy in production.
Spoiler alert: it was way more work than I expected.
The Problems Started Immediately
So. Much. Unused. Code.
The first thing that hit me was the sheer amount of code that did absolutely nothing. I'm talking about 70% of the codebase just sitting there—UI components that never get rendered, API endpoints that go nowhere, functions that return the same thing no matter what you pass in.
It's not that this code was breaking anything or slowing the app down. It was worse than that. Every time I wanted to change something, I had to figure out whether these random bits of code were actually important or just dead weight. Deleting the wrong thing could break something in a way that wouldn't be obvious until much later.
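To make that concrete, here's a paraphrased sketch of the kind of dead weight I kept tripping over (the names are made up, but the patterns are faithful to what I found):

```typescript
// Exported but never imported anywhere. It still has to be read,
// reasoned about, and ruled out every time you change something nearby.
export function formatLegacyUserBadge(name: string): string {
  return `<span class="badge">${name}</span>`;
}

// Looks dynamic; every branch returns exactly the same value.
export function getDiscountRate(userTier: string): number {
  if (userTier === 'gold') return 0.1;
  if (userTier === 'silver') return 0.1;
  return 0.1;
}
```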
No Standards Whatsoever
The code had no standards. None. Everything was typed as `any`, there was no consistent formatting, and it looked like it was written by five different people who'd never spoken to each other.
When I tried to add ESLint, I had to disable pretty much every rule just to get it running. Then I spent days slowly turning rules back on, one at a time, fixing hundreds of violations. It was like trying to renovate a house built without foundations—every fix just revealed more problems underneath.
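For a flavour of the typing situation, here's a hedged reconstruction (hypothetical names) of what most functions looked like, next to what they should have been:

```typescript
// What the generated code looked like: `any` in, `any` out,
// so the compiler can't catch a single mistake.
function getUser(id: any): any {
  return { id, name: 'Unknown' };
}

// What it should have been: explicit types the compiler can actually check.
interface User {
  id: string;
  name: string;
}

function getUserTyped(id: string): User {
  return { id, name: 'Unknown' };
}
```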
When AI Gets Creative
Here's where things got weird. The AI had generated code that referenced things that didn't exist. Like, at all.
There were API calls to endpoints that were never defined. Database queries looking for tables that didn't exist. Complex if-else chains that somehow all returned the exact same result. It was like the AI had imagined an entire parallel universe where these things made sense—what we call "AI hallucinations" in the development world.
The worst part? Some of this stuff was buried deep in the code, so the app would run fine until you hit one of these phantom features. Then it would just... break.
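Here's a paraphrased example of the pattern (the endpoint is hypothetical, but the real ones were just as plausible-looking):

```typescript
// Calls an endpoint that was never defined anywhere in the codebase.
// The app compiles and runs happily until this code path executes.
async function fetchLoyaltyPoints(userId: string): Promise<number> {
  const res = await fetch(`/api/v2/loyalty/points/${userId}`); // no such route exists
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = (await res.json()) as { points: number };
  return data.points;
}
```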
Documentation? What Documentation?
The comments in this code were either completely useless or non-existent. I'd see gems like "// This function handles user authentication" above a 200-line function that did... who knows what. Meanwhile, the actually complex stuff had zero explanation.
Oh, and did I mention the 6,000-line route file? Yeah, that was fun to debug. One file handling every single API route, another massive file for all database operations. It was like someone had taken the concept of separation of concerns and thrown it out the window.
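To give a sense of the shape of it, here's a heavily cut-down sketch (assuming an Express-style server, which is an illustration rather than the exact stack):

```typescript
import express from 'express';

const app = express();

// Every route for the whole application, inline, in one file...
app.get('/api/users', (_req, res) => { res.json([]); });
app.post('/api/users', (_req, res) => { res.status(201).end(); });
app.get('/api/orders', (_req, res) => { res.json([]); });
app.get('/api/reports', (_req, res) => { res.json({}); });
// ...and so on, for roughly 6,000 lines.

app.listen(3000);
```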
Zero Tests
Not a single test. Not one.
For a codebase this big and this messy, having no tests was terrifying. Every change I made was basically a gamble. Did I break something? Who knows! The only way to find out was to click through the entire app manually and hope for the best.
I spent more time manually testing basic functionality than I did actually writing code. It was exhausting.
How I Fixed It (Eventually)
After a few weeks of this madness, I developed a process that actually worked:
First, I cleaned house. Before touching any features, I went through and deleted everything that wasn't being used. This took forever, but it was worth it. The codebase went from intimidating to manageable.
Then I added linting rules gradually. Instead of trying to fix everything at once, I'd enable one ESLint rule, fix all the errors, commit, then move to the next rule. Slow but steady progress.
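In practice the config was a ratchet. Here's a minimal sketch using typescript-eslint's flat-config helper (shown as eslint.config.ts; older ESLint versions want the same content as eslint.config.mjs, and the rule order is just what worked for me):

```typescript
import tseslint from 'typescript-eslint';

export default tseslint.config({
  files: ['**/*.ts'],
  extends: [tseslint.configs.base], // parser and plugin wiring only, no rules yet
  rules: {
    // Round 1: surface the dead code.
    '@typescript-eslint/no-unused-vars': 'error',
    // Round 2: stop new `any`s; warn first, promote to error once clean.
    '@typescript-eslint/no-explicit-any': 'warn',
    // Later rounds: enable the rest of the recommended set, one rule per commit.
  },
});
```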
Testing became non-negotiable. I started writing tests for everything I touched. New feature? Test first. Bug fix? Test to reproduce, then fix, then test again. It slowed me down initially but saved hours later.
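Nothing fancy. Here's the shape of it, a minimal sketch assuming Vitest and a hypothetical pricing helper (any test runner works the same way):

```typescript
import { describe, it, expect } from 'vitest';
import { getDiscountRate } from './pricing'; // hypothetical module under test

describe('getDiscountRate', () => {
  // Written before fixing the bug, to pin down the intended behaviour.
  it('gives gold members a bigger discount than silver', () => {
    expect(getDiscountRate('gold')).toBeGreaterThan(getDiscountRate('silver'));
  });

  it('falls back to no discount for unknown tiers', () => {
    expect(getDiscountRate('mystery')).toBe(0);
  });
});
```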
I broke up the monster files. That 6,000-line route file? It became about 20 smaller, focused files. Much easier to work with.
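The split itself was mechanical: one focused router per resource, mounted from a thin entry point. A condensed sketch (again assuming an Express-style app; in the real project each router lived in its own file):

```typescript
import express, { Router } from 'express';

// One focused router per resource instead of one giant file...
const usersRouter = Router();
usersRouter.get('/', (_req, res) => { res.json([]); });
usersRouter.post('/', (_req, res) => { res.status(201).end(); });

const ordersRouter = Router();
ordersRouter.get('/', (_req, res) => { res.json([]); });

// ...wired together by a thin entry point.
const app = express();
app.use('/api/users', usersRouter);
app.use('/api/orders', ordersRouter);
app.listen(3000);
```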
Documentation for future me. I started writing comments that explained why something worked the way it did, not what it was doing. Much more useful.
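The difference in practice looks something like this (a small, made-up example):

```typescript
// The old style: restates the code, tells future-me nothing.
// Retry the request 3 times.

// The new style: explains the why, which the code can't.
// The payments provider intermittently returns 502s under load; three
// retries with backoff was the smallest count that made checkout reliable
// in testing. Don't reduce it without re-measuring.
const MAX_RETRIES = 3;
```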
Would I Do It Again?
Honestly? Probably not.
Don't get me wrong—I learned loads from this experience, and the final product works well. But the time I spent cleaning up AI-generated code could have been used to build something from scratch that would have been cleaner and more maintainable.
AI tools are brilliant for getting something up and running quickly, especially for demos or proof-of-concepts. But if you're planning to maintain and scale the codebase long-term, you're probably better off starting with proper foundations rather than trying to retrofit them later.
The real value of tools like Replit isn't in creating production applications; it's in rapid prototyping, exploring ideas, and getting non-developers excited about what's possible (and the funding that comes with that).
More on that soon, along with the real risks of low-code platforms and how we're embracing AI to add immense value for our clients.
Just don't expect to deploy it without some serious human intervention first. Though who knows? Maybe in six months I'll be writing a completely different post.