Why spec-driven development?

If you’ve ever tried to create something with an LLM, you’ll know how much back and forth can be needed to get things right and close to what you had in mind. You might have felt like you’d been clear, but it’s just not quite getting it.

You’ve given it context, you’ve worked with sub-agents, but the conversation still doesn’t feel that productive. There can be a lot of misinterpretation and frustration!

As with so many aspects of working with AI, there’s an approach here that sounds fancier than it is: spec-driven development - think Jira tickets for LLMs.

Why bother with specs?

Let’s step back for a moment. Fundamentally, LLMs are probabilistic: they generate whatever response they judge most likely to satisfy your query. That’s not always what we want - and with that confidence hidden from us, we can’t even tell whether the LLM thinks its own response is any good!

Back in our context, there are a number of things we can build into our ways of working, one of which could be adding an instruction like “if your confidence level is less than 0.9, prompt the user for more information or explain why you aren’t that confident”. The more specific we are about what we want, the better the results will be.
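
For example, a project context file (whatever your tooling calls it - CLAUDE.md, AGENTS.md or similar; the name and exact wording here are purely illustrative) might include something like:

```
## Ways of working  (illustrative excerpt)
- If your confidence level is less than 0.9, prompt me for more information
  or explain why you aren't that confident.
- When a requirement is ambiguous, ask before implementing.
```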

What do you want?

If you’re doing more than just vibe coding, it’s worth spending time capturing what you want, at various levels, as documentation the LLM can reference at the right time. That runs from how you want the working relationship itself to be (which could live in a personal context file), through project-wide context, down to specific features and sub-tasks - though you do have to keep an eye on the model’s behaviour, as despite best efforts it can sometimes ‘forget’!
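
Where each level lives will depend on your project and tooling, but as a purely illustrative sketch of the layers (all of these file names are made up):

```
working-with-me.md          # personal context: how I want the relationship to work
docs/project-context.md     # project-wide context: architecture, conventions, goals
docs/coding-guidelines.md   # standards the LLM should follow
tasks/                      # per-feature and per-task specs (more on this below)
```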

In many projects I use a very simplified kanban structure, but find whatever structure works for you. I create a /tasks directory in the root of my project with todo, doing and done directories inside it, plus a template for task files with some front matter (date created, title, etc.) and the headings we might use.
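
The template itself is nothing fancy. Mine looks something like this - the exact front matter fields and headings are up to you, these are just illustrative:

```
---
title:            # illustrative front matter - use whatever fields help you
created:
status: todo
---

## Goal

## Constraints

## Expected outcome

## References

## What was actually done
```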

Before working on anything, I’ll spend some time discussing the task with the AI to be clear on the goal, constraints, and expected outcomes along with references to any related files that might be useful. Depending on the nature of the task, I might ask sub-agents to review it too for a more rounded view.
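
By the end of that discussion, the task file sitting in todo might read something like this (the feature, numbers and file paths are made up purely for illustration):

```
---
title: Add rate limiting to the public API   # all details here are made up
created: 2025-01-15
status: todo
---

## Goal
Return 429 responses once a client exceeds 100 requests per minute.

## Constraints
- Follow the error format described in docs/api-guidelines.md
- No new external dependencies

## Expected outcome
- New middleware covered by tests; existing endpoints unchanged

## References
- src/middleware/auth.ts (an existing middleware to mirror)
```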

What this tends to mean is that the LLM has very clear expectations set, along with access to things like coding guidelines - and it works well within constraints like these. It can feel like a lot of extra effort compared with just prompting, but by putting these expectations and guardrails in place early, there’s less back and forth and the quality tends to be a lot higher from the start.

After the task

Once a task has been completed, reviewed and tested, I ask the LLM to update the task file with what was actually done, mark it as complete, and move it to the done directory. Not only is this good for clarity, it also builds up a wealth of historical context as the project grows - a record that can be mined for why and when certain changes were made, with very little overhead.
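
Continuing the made-up example from earlier, the completed file might end up looking like:

```
---
title: Add rate limiting to the public API   # continuing the made-up example
created: 2025-01-15
completed: 2025-01-18
status: done
---

## What was actually done
- Rate limiting keyed on API token rather than IP, following review feedback
- Tests added in tests/middleware/rate-limit.test.ts
- docs/api-guidelines.md updated with the new 429 error format
```

The gap between the original plan and what was actually done is often the most useful thing to look back on later.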

Like many of us, I suspect, writing tasks or tickets has never filled me with joy. The difference here is that it feels like a conversation that ends with clarity for both me and the LLM, which in turn produces much higher quality output. Using this method, along with the other techniques I’ve talked about, takes you so far away from randomly prompting that the improvement in the output is incredible. I can’t imagine just going to a prompt chat box cold any more.