As AI coding tools have become ever more powerful, taking on sizeable programming tasks with autonomy, the industry is pondering how we adapt our processes to capitalise on the strengths of this technology and address its weaknesses.
Spec-Driven Development (SDD) is a new approach designed around agentic AI where polished specifications become the single source of truth, guiding AI agents to reliably generate working software.
Intrigued by the idea, I decided to put SDD to the test using Spec Kit, rebuilding a feature from one of my hobby apps to see how this workflow holds up in practice. The experience wasn't great: a sea of markdown documents, long agent run-times and unexpected friction. While SDD is an interesting thought experiment, I'm not sure it is a practical approach.
To find out more about my experience, read on …
When you first start using AI agents, one of the first challenges you wrestle with is just how much detail to give them in each instruction. Is it better to guide them towards your goal through multiple, single-sentence instructions? Or provide a more detailed paragraph of text, hoping that they will successfully write a bigger increment in a single go?
A year or two ago, when these tools were relatively immature, it was generally more effective to work in small increments. Larger increments rarely worked effectively, leading to more time spent debugging and ultimately a longer journey to reach your goal.
Just under a year ago I recall reading Harper Reed's excellent post titled "My LLM codegen workflow atm", where he outlined the process that worked well for him. Rather than providing a longer prompt (i.e. a few sentences / paragraphs), hone the idea in conversation with an AI chat, pass it to a reasoning model, and ask it to build a detailed plan. Once you've reviewed the plan, ask the coding agent to implement it. I tried the approach myself and found similar results: I was able to implement larger increments armed with this plan. Harper noted that this process works best for greenfield projects.
As an aside, it's amazing how much things have changed in less than a year. I've re-read Harper's blog and it already feels very dated; through no fault of the author, it was ground-breaking stuff, for a month or two!
I wanted to give SDD a try, but there is a growing number of frameworks and approaches to choose from. Here are a few notable projects and tools relating to this nascent approach:
While these tools all provide a form of SDD, their workflows differ quite a bit. I'd definitely recommend reading Birgitta's post, which outlines three different levels (spec-first, spec-anchored and spec-as-source) and helps to understand the differences.
Of the above frameworks Spec Kit adheres to SDD in the purest form (spec-as-source). Here is a paraphrased version of their introduction to the topic:
Traditional development treats code as the source of truth, causing specifications to drift and creating a gap. Spec-Driven Development reverses this by making the specification the primary artifact from which you can generate implementation plans and code. Specifications become precise enough for AI systems to produce consistent, working software directly from them. Maintaining or extending a system means updating the specification and regenerating code rather than modifying code manually. Debugging and refactoring involve clarifying the specification, allowing teams to focus on design clarity and problem-solving.
It is also wildly popular, having attracted 50k GitHub stars in just a couple of months.
I decided to take Spec Kit for a spin.
Over the past few months, I have been vibe engineering a personal application called KartLog. My son races go-karts, which involves capturing and managing a lot of data (engines, tyres, lap times, kart setup, temp, weather conditions …). I used to store this all in a Google Sheet with various scripts and formulae, but decided it would be fun (and useful) to create a dedicated Progressive Web App. It is a very conventional tech-stack: JavaScript, Firestore, SvelteKit and the SMUI (material) library.
Rather than implement a new feature I decided to "surgically remove" an existing feature. The app currently allows you to manage the tracks you've visited:

I removed this feature, reverting it to a simple string when recording sessions. This resulted in ~1000 lines of code being deleted.
My goal was to re-create this functionality using Spec Kit, evaluating whether this approach is superior to the current vibe engineering techniques I use (covered later). This felt like a good test case: a reasonably sizeable product increment of modest complexity (CRUD functionality, integration with the existing data model, GPS integration), but also one where I could confidently write an unambiguous specification, because I'd already built it.
Spec Kit is quick and easy to install, being provided as a simple command line tool. The specify init command creates the project-specific configuration. I'm using GitHub Copilot (with Claude Sonnet 4.5), which results in Spec Kit adding a number of custom agents, which can be run via slash commands in the Copilot chat window, e.g. /speckit.specify. Looking in the .github/agents folder reveals that these custom agents are basically prompts, with a little bit of metadata.
So far, so easy.
During this blog post I'll link to the git commits for each step; here's the first one, adding the Spec Kit prompts.
The first step is to create your "constitution", which the documentation describes as "a set of immutable principles that govern how specifications become code".
I ran the constitution agent, waiting for 4 minutes, watching it analyse the codebase, as per the prompt, and construct a 189-line markdown file. I reviewed the file, and its contents looked OK (if a little obvious). The default constitution favours a TDD approach; however, for my hobbyist app I am happy with manual verification. I asked Copilot to reflect this in the constitution, and after a couple more minutes I had something I was comfortable with.
You can see the file it produced in the corresponding commit.
Going forwards, for each step, I'm going to include the overall statistics relating to agent execution time, the time I spent reviewing artefacts, and the tokens consumed:
The next step is where it gets interesting: building the specification for the feature you want to implement. I decided to split my implementation into a couple of steps; here is the prompt I used to create the specification for the first:
/speckit.specify I'd like to extend this application to provide more feature rich functionality for managing circuits. At the moment the circuit is just a string associated with each session. I'd like to turn this into an entity (with circuit name, latitude, longitude, notes), that has a CRUD style interface (similar to engines) that allows the user to update and manage circuits
The above created a 189-line specification, which you can see in this commit. The specification comprises 5 user stories, which are a decent breakdown of this "epic", as well as 18 functional specs, 6 outcomes, 8 assumptions and more. This looks reasonable, and a few of the assumptions it identified (e.g. the need to migrate session data to conform to this new data model) were quite insightful.
Oh yes, it also added a 41-line self-assessment, checking this specification against the constitution. Good to know!
I did make a few changes to the specification (via the chat), removing some NFRs that felt a little misplaced.
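One of those assumptions, migrating existing session data to the new model, is worth making concrete. The sketch below shows roughly what the data-model change amounts to, assuming Firestore collections named circuits and sessions; the code and names are my own illustration, not anything Spec Kit generated.

```javascript
import { collection, getDocs, doc, addDoc, updateDoc } from 'firebase/firestore';

// One-off migration sketch: create a circuit document for each distinct
// session.circuit string, then point the session at the new entity.
export async function migrateSessionsToCircuits(db) {
  const sessions = await getDocs(collection(db, 'sessions'));
  const circuitIdsByName = new Map();

  for (const session of sessions.docs) {
    const name = session.data().circuit; // the old plain-string field
    if (!name) continue;

    let circuitId = circuitIdsByName.get(name);
    if (!circuitId) {
      // New circuit entity: name, latitude, longitude, notes
      const created = await addDoc(collection(db, 'circuits'), {
        name,
        latitude: null,  // to be filled in later (e.g. via the GPS feature)
        longitude: null,
        notes: '',
      });
      circuitId = created.id;
      circuitIdsByName.set(name, circuitId);
    }

    // Sessions now reference the circuit by id rather than by name.
    await updateDoc(doc(db, 'sessions', session.id), { circuitId });
  }
}
```

In other words, each session stops carrying a free-text circuit name and instead references a circuit document holding the name, latitude, longitude and notes.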
The next step is to create a plan: "Once a feature specification exists, this command creates a comprehensive implementation plan".
Executing this step took 8 minutes, and created a lot of markdown:
It isn't entirely clear whether you are supposed to review the output of the Plan step, but the contents detailed above feel quite excessive. Take a look for yourself.
At this point I thought it was time to implement, but not yet; there is one more step: task list creation. "After a plan is created, this command analyzes the plan and related design documents to generate an executable task list."
Thankfully this was relatively brief, creating a 66-step implementation plan mapped to user stories, including checkpoints and opportunities for parallel execution. Here's the commit.
The /speckit.implement command took 13 minutes 15 seconds to execute, creating 14 files and ~700 lines of code (see this commit), which seems about right. Ahead of reviewing the code, I launched the dev server and … it didn't work!
There was a small, and very obvious, bug in the code that was easily fixed by pasting the error into the chat agent (vibe coding style!). The error itself was very simple: a variable (circuitsData) used within the "new session" form wasn't being populated from the datastore.
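For the curious, the fix amounted to something like the following, assuming a Svelte component backed by Firestore; apart from circuitsData (the variable named in the error), every identifier here is my own reconstruction rather than the generated code.

```javascript
// Hypothetical reconstruction of the fix; only circuitsData comes from
// the actual error. The import path and db handle are assumptions.
import { collection, getDocs } from 'firebase/firestore';
import { db } from '$lib/firebase';

let circuitsData = [];

// The "new session" form referenced circuitsData for its circuit dropdown,
// but nothing ever populated it from the datastore. Loading the circuits
// collection when the form initialises fixes it.
export async function loadCircuits() {
  const snapshot = await getDocs(collection(db, 'circuits'));
  circuitsData = snapshot.docs.map((d) => ({ id: d.id, ...d.data() }));
}
```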
Given that I am following SDD, I should clarify my specification, then repeat the above steps. But how to express this bug from a specification perspective? It was a trivial and clumsy mistake. I asked Copilot, and it concurred: this isn't an issue with the spec.
There is a lot of debate on the GitHub Discussions board about how to refine or fix the implementation. It isn't entirely clear.
After this, I now had a fully functional application where I can add / remove circuits and associate them with sessions:

And the grand totals were:
The next step was to add GPS functionality, deriving a spec from the following:
use geolocation APIs to make it easier to manage sessions. When editing sessions, the circuit should default to the closest circuit to the user's current location, and when adding circuits, allow the lat / long to be set using the current location.
I'll not go into the detail of each step again, just provide the totals:
You can see the individual steps in my commit history if you're interested. Given that this was a small increment, I was surprised to find the implementation step took 11 minutes to execute.
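To make the requirement concrete, here is roughly what the behaviour boils down to: a minimal sketch using the browser Geolocation API and a haversine nearest-neighbour check. This is my own illustration of the prompt above, not the code Spec Kit produced.

```javascript
// Illustrative sketch only: browser Geolocation API plus a haversine
// nearest-neighbour check. Function names are invented for this post.

function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.latitude - a.latitude);
  const dLon = toRad(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(h)); // Earth radius ≈ 6371 km
}

function getCurrentCoords() {
  return new Promise((resolve, reject) =>
    navigator.geolocation.getCurrentPosition(
      (pos) => resolve(pos.coords),
      reject,
      { enableHighAccuracy: true }
    )
  );
}

// When editing a session: default the circuit to the one nearest the user.
export async function nearestCircuit(circuits) {
  const here = await getCurrentCoords();
  const candidates = circuits.filter((c) => c.latitude != null && c.longitude != null);
  if (candidates.length === 0) return null;
  return candidates.reduce((best, c) =>
    haversineKm(here, c) < haversineKm(here, best) ? c : best
  );
}

// When adding a circuit: pre-fill latitude/longitude from the current location.
export async function currentLatLong() {
  const { latitude, longitude } = await getCurrentCoords();
  return { latitude, longitude };
}
```

In the app, something like nearestCircuit would pre-select the circuit when editing a session, and currentLatLong would pre-fill the form when adding a circuit.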
Taking a step back, the timeline for implementing each feature (via a specification) was roughly as follows:

Ultimately, a lot of time was spent reviewing markdown or waiting for the agent to churn out more markdown. I didn't see any qualitative benefit to justify the overhead.
So, how does this compare to the approach I typically use, building features using small increments and simple prompts?
Here's the summary:
I reached my end goal much faster, and without any bugs in the implementation.
Viewing the process as a timeline, you can see some significant differences between the spec-driven and "regular" approaches.

My high-level approach is as follows:
Compared to working with Spec Kit there is very little down-time at all. I am working while the agent works alongside me.
Also, my review is all "high value": functional testing, code review. I feel like I can keep up with what the agent is producing, even though it is producing it faster than I could.
I like this process, it feels empowering.
The simple fact is I am a lot more productive without SDD, around ten times faster. But speed isn't everything; putting this point aside for now, I have some more significant concerns about SDD (or at least the way Spec Kit implements this approach).
Spec Kit starts from the premise that "Specification is the law", as opposed to "code is law". They advocate creating detailed and polished specifications from which any sufficiently competent AI can generate a fully functional application. I think this is a flawed goal.
Code is law because it is a formal language you can reason about. You can test it. You can prove it is right or wrong. Specifications, or at least ones expressed in the markdown format of Spec Kit, lack this formality. They are not a law I would put my trust in.
Much of the content within the various Spec Kit steps is duplicative and faux context. For example, it creates "fake" background context (reasoning about the UI design because I might be wearing gloves at the circuit). This is a common problem with the use of AI in general: it creates detail that fundamentally lacks depth or value.
**Rationale**: Karting users need to log data trackside on mobile devices, often with gloves or in suboptimal lighting. Poor mobile UX renders the app unusable in its primary context of use.
Finally, in most development processes specifications are a point-in-time tool for steering implementation. Once complete, and fully tested, how often do you re-visit a user story? Very rarely; they lack much value once the feature is complete.
I'm not against all documentation; on the contrary, capturing architectural details and design decisions is very valuable. Code can give you the "what", but cannot tell you the "why", i.e. why was a specific pattern selected?
As an industry weāve moved away from waterfall, which favours up-front design and a relatively rigid sequential process, replacing it with something more iterative and agile. My experience of AI is that it makes you even more agile, iterations are faster. Spec Kit drags you right back into the past!
One of the biggest super-powers of AI (for software development) is that code is now "cheap"; it is quick and easy to produce. Yes, the quality is variable, but you cannot deny the speed.
As we adapt our software development processes because of this new reality, we should seek to maximise the benefit of this fact. Code is now cheap; we can create it quickly and throw it away just as fast. Spec Kit, and SDD, don't capitalise on this.
This one is quite simple: the quality of the code created when following Spec Kit was just fine, but the simple and obvious bug threw me a little.
Furthermore, Spec Kit burnt through a lot of agent time creating markdown files that ultimately didnāt result in a better product.
Given that code, in any conventional programming language, is a significant part of the model's training dataset, and that it is a language they can reason about, surely they are better at writing code?
I know that is not a terribly compelling argument. But put quite simply, I think that asking an agent to write 1000s of lines of markdown rather than just asking it to write the code is a misuse of this technology.
"Suck eggs" is a phrase used in the idiom "Don't teach your grandmother to suck eggs," which means you should not try to advise or instruct someone who is already an expert in a subject.
The whole Spec Kit process is designed to create a guided path for the agent to follow when implementing a feature.
This year we have seen a significant amount of research and product development effort being poured into coding agents. It has become one of the primary battlegrounds for commercialising this technology. In the last week we've seen Gemini 3, GPT-5 Codex Max and Claude Opus 4.5 released. Reading the details of the releases, you'll find that they are all rapidly developing approaches that help their agents better analyse, plan and implement features. These may not be surfaced directly to the user, but they are all part of the internal workings as they battle for supremacy on SWE-Bench, MITR and various other benchmarks.
Are a bunch of smart and detailed prompts created in an open source project really going to keep up with, and exceed, the performance of the latest foundation models? It feels like re-reading Harper's blog post, which I referenced earlier: this sort of thing becomes dated almost immediately.
It is probably quite clear by now that I don't think SDD works for my workflow, on this specific application. And I am willing to generalise: I don't think I'd reach for SDD regardless of the context. However, while I feel like I've given it a good shot, it would be churlish of me to dismiss it entirely.
I do want to provide a few caveats and potential words of defence.
From reading the Spec Kit forums, there are quite a few active discussions around how you evolve specs, incorporate bug fixes, and other elements of the process. It is clear that Spec Kit is immature and answers to these questions may emerge over time.
Furthermore, I am willing to entertain the possibility that I am simply using it wrong! In Birgitta's post (referenced earlier), she stated that "I wasn't quite sure what size of problem to use it for"; neither am I.
I also wonder whether I am the target audience for this tool. In this specific context (developing my hobby app), I am a vibe engineer. While I am using AI to write most of the code, I am the architect and lead engineer. I review, I correct mistakes, refactor and generally guide it towards a pattern I am happy with. Perhaps Spec Kit is for the vibe coders? For the product owners? For the people who don't write or fully understand code?
While the overall tone of this blog post is quite negative and heavily criticises SDD, or at least the Spec Kit flavour of it, I do still think there is genuine value in SDD.
I view it in much the same way as Test Driven Development, Vibe Coding and Extreme Programming. There is a long history of more radical approaches to software development. I'm a pragmatist. I wouldn't advocate implementing these approaches in their purest form; however, there is much to be learnt by giving them a go, by talking about them, pondering, considering and critiquing.
I see Specification Driven Development as an interesting concept, a radical idea, a subject of lively, interesting and healthy debate. But I don't consider it a viable process, at least not in its purest form, as exemplified by Spec Kit.
For now, the fastest path is still iterative prompting and review, not industrialised specification pipelines.
One of the key responsibilities of a software architect is communicating architecture effectively. Architecture never exists in a vacuum; it exists to align people, guide decisions, and help teams move toward the same goals. Whether you're sketching a new system or explaining how existing components fit together, effective communication means helping others understand the structure, purpose, and implications of the architecture.
While it's possible to describe a system using a wall of text, it's rarely the best way. Architecture is complex, and most of the time the fastest and clearest way to convey it is visually. Diagrams help people see relationships, boundaries, and flows at a glance.
But before drawing anything, it's important to pause and ask two fundamental questions:
Many of us have been in meetings gathered around a whiteboard, sketching out boxes and arrows to explore ideas. These ad-hoc diagrams are great for rapid ideation (they help teams align quickly) but they are rarely useful outside that moment. As systems grow in complexity, sketches alone aren't enough. We need a clearer understanding of the underlying structure we are trying to represent.
This brings us to an important distinction: modelling versus diagramming.
The model is the single, consistent description of the system: its elements and the relationships between them. A diagram is therefore a view onto the model. It highlights certain elements while omitting others, depending on the story we need to tell.
But different people care about different stories. In any organisation, systems may have multiple users, internal and external teams, logical components, and supporting infrastructure. The relationships between these layers quickly become complex.
Trying to show everything in one diagram would be overwhelming. A CTO may care about high-level system interactions, while a security officer needs low-level networking details. Each stakeholder has different concerns, or viewpoints, and no single view can satisfy them all.
This is why we create multiple views, each shaped by a specific viewpoint and tailored to its audience. The model remains the one source of truth; each view shows only the part that matters for a given concern.
As an analogy, think about the architectural plans of a house. A single structural model can be used to produce:
Each plan is a view of the same underlying structure, created from a different viewpoint. If an architect moves a wall but the electrical and plumbing plans arenāt updated to reflect it, the result would be chaos!
Software architecture works the same way. The model holds the truth about the system; diagrams are purposeful views that help different people understand and make decisions about it.
When deciding how to model and create views, there's no one-size-fits-all solution. Instead, there's a spectrum - ranging from highly structured, formal modelling approaches to informal, free-form sketches. Each approach has its place depending on the context, audience, and longevity of the diagram.
At one end of the spectrum, we have heavyweight and structured approaches such as UML and ArchiMate. These approaches enforce strict semantics and provide a rich modelling language. They are often used in enterprise-scale architecture where consistency, traceability, and alignment with frameworks like TOGAF are required. The trade-off is that they require significant effort to maintain, have steep learning curves, and may not be accessible to non-architects.
In the middle, we find lightweight but structured approaches such as the C4 Model, which emphasise maintaining a consistent underlying model, but with far less formality. This supports clarity and coherence without the prescriptiveness of a full modelling language, and makes producing and evolving views far more manageable.
Cloud diagrams that use AWS or Azure icon sets also sit broadly in this category. They offer a standardised visual vocabulary that improves clarity and consistency, but they stop short of providing a true modelling approach.
At the far end of the spectrum, we have lightweight and unstructured approaches - free-form diagrams created on whiteboards, both physical and virtual. These are ideal for exploring or conveying an idea quickly and for collaborative workshops. They are fast, intuitive, and unconstrained, but they lack an underlying model and can quickly become inconsistent as systems evolve.
Choosing the right approach is always a trade-off between consistency, governance, and ease of use. It depends on how long the diagram will live, who will maintain it, and how complex the system is.

Once you've decided how structured your approach needs to be, the next step is choosing the right tools. Broadly speaking, diagramming and modelling tools can be grouped under the following categories.
Tools like PlantUML, Mermaid, and Structurizr DSL allow you to define diagrams using text. These are ideal for teams who treat architecture like code, enabling version control, CI/CD integration, and automated documentation.
They work particularly well when architecture needs to evolve alongside code. Diagrams can live in the same repository, be reviewed like any other code change, and even be generated automatically as part of a pipeline. The trade-off is that layout control can be limited, and the output may lack the polish of a hand-crafted diagram.
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
Person(user, "End User", "Calls the public API")
System_Boundary(sys, "Serverless API System") {
    Container(apiGw, "API Gateway", "Amazon API Gateway", "Entry point for HTTPS clients")
    Container(lambdaFn, "Lambda Function", "AWS Lambda", "Executes business logic for incoming requests")
    ContainerDb(backend, "Backend Service", "DynamoDB", "Stores and retrieves application data")
}
Rel(user, apiGw, "Invokes API", "HTTPS/JSON")
Rel_R(apiGw, lambdaFn, "Triggers", "Lambda integration")
Rel_D(lambdaFn, backend, "Reads/Writes data", "AWS SDK")
@enduml

Enterprise tools such as Archi, Sparx Enterprise Architect, and Visual Paradigm focus on maintaining a central model and generating views from it. This ensures consistency across diagrams and supports traceability, linking requirements to architecture and even to implementation.
These tools are powerful but require discipline and effort to keep up to date. They are best suited for large organisations with formal architecture governance or regulated environments where long-lived models are essential.

Tools like Lucidchart, Miro and draw.io prioritise collaboration and simplicity. They mimic the experience of sketching on a whiteboard but add features like templates, real-time collaboration, and cloud storage.
These tools are great for workshops and stakeholder engagement, but they lack an underlying model. As a result, they can become inconsistent and hard to maintain as systems grow.

Tools like Cloudcraft, Hava, and AWS Workload Discovery integrate with live cloud environments to automatically generate diagrams. These tools reflect the actual state of deployed systems, which is invaluable for audits, onboarding, troubleshooting, and operational visibility. Many can also ingest Infrastructure as Code (e.g., Terraform) to visualise deployments directly from source.
Although automation makes these diagrams quick to produce, it also constrains them. Because they mirror the raw cloud resources exactly as they exist, there is very little scope for layout, grouping, or abstraction. As a result, they are not well suited to future-state design, architectural storytelling, or conveying logical intent rather than physical infrastructure.

LLMs introduce a new angle: because diagrams can be defined as code, we can now generate formats such as Mermaid or PlantUML directly from natural-language descriptions. This makes it much faster to produce early drafts and explore different ways of expressing a model.
But this approach has a fundamental limitation: diagrams are spatial and visual, while LLMs predict text. An LLM can create valid syntax, but it cannot reliably judge whether a diagram will be readable, balanced, or visually coherent.
To address this gap, AI features are emerging inside visual diagramming tools themselves, for example in Miro, Lucidchart, and dedicated tools like Eraser. These may be able to integrate more intelligently with layout engines and can prompt a user to clarify their intent, producing more coherent visuals while still keeping a human in the loop. AI tools are also being integrated directly into codebases to generate documentation, including diagrams - such as Google's CodeWiki or Devin's DeepWiki.
LLMs also have the potential to support the modelling process more directly. By connecting to codebases or live infrastructure, they can answer natural-language questions ("Which services call this API?"), help infer relationships, and assist in keeping architectural models aligned with the real system.
AI-assisted diagramming is most effective as augmentation rather than automation. By combining automated insights with natural-language interaction, these tools have the potential to reduce the effort of creating and maintaining diagrams, while architects still provide the intent, abstraction, and viewpoint needed for effective communication.
Effective architecture communication is less about the tool and more about the thinking behind it. A clear model provides the structure; viewpoints help us understand what different audiences care about; and diagrams turn those ideas into views that tell the right story at the right level of abstraction.
There is a wide spectrum of approaches available, from formal modelling languages like UML and ArchiMate, to lightweight frameworks such as C4, to completely free-form sketches on a whiteboard. Each has strengths depending on the context, longevity, and the decisions it needs to support.
But it isn't the tooling that communicates architecture effectively; architects do. What matters most is clarity of intent, and ensuring that whatever we produce, with whichever tool we choose, genuinely reflects that intent.
Andrew Oliver (via Hacker News):
Applets are officially, completely removed from Java 26, coming in March of 2026. This brings to an official end the era of applets, which began in 1996. However, for years it has been possible to build modern, interactive web pages in Java without needing applets or plugins. TeaVM provides fast, performant, and lightweight tooling to transpile Java to run natively in the browser.
Previously: