Introduction
AI coding assistants behave like overly enthusiastic junior developers: fast, broadly knowledgeable, and responsive, but lacking deep understanding of the specific project they are working on. They generate plausible code quickly, but without clear constraints they are likely to introduce bugs, make unnecessary changes, overlook edge cases, and misinterpret requirements. Their speed is an advantage only if it is controlled.
This guide outlines one method for using AI assistants effectively: the Explore – Plan – Code – Verify workflow. The workflow ensures that speed does not come at the cost of quality by breaking work into atomic tasks, each with clear acceptance criteria, a definition of done, and verification steps. The core principle is simple: all changes must be reviewed and tested by a human.
Failure modes: why a workflow is required
AI coding assistants produce plausible-looking code quickly, but their output often contains subtle flaws. These include hallucinated or outdated APIs, incorrect configuration flags, or framework behaviour that no longer applies. They may also introduce scope creep (unrelated refactors, formatting changes, or dependency additions) that distract from the task’s core objective. Even when the code appears correct, it may overlook edge cases, non-functional requirements, or integration boundaries, leading to failures in production.
Tests generated by AI can be equally problematic. They may pass while asserting the wrong behaviour or cover only the happy path, leaving critical failure modes untested. Another common issue is the bug-fixing loop: repeated iterations of small patches without new information or a change in approach, resulting in wasted effort and increasingly convoluted code. These failures are not random; they follow predictable patterns.
A workflow is required to mitigate these risks. Without one, the assistant’s speed becomes a liability, introducing errors faster than they can be caught. By imposing structure, such as breaking work into verifiable tasks and enforcing explicit checks, developers can harness the assistant’s speed while maintaining control over quality. The goal is not to eliminate errors entirely, but to catch them early, before they propagate into the codebase.
The Explore – Plan – Code – Verify workflow
The Explore – Plan – Code – Verify loop provides a structured approach to AI-assisted development, ensuring that speed and quality are balanced. Each phase serves a distinct purpose and produces specific outputs, creating a repeatable process for delivering reliable changes.
```mermaid
flowchart TB
A(["Bug, feature or project"]) --> B["**Explore**"]
B --> C["Accepted solution"]
C --> D["**Plan**"]
D --> E1["Phase breakdown
(for larger pieces of work)"]
E1 --> E2["Task specification
(for current phase)"]
E2 --> F["**Code**"]
F --> G["Code changes"]
G --> H["**Verify**"]
H --> I{"Acceptance
criteria met?"}
I -- No --> F
I -- Yes --> J["Commit changes"]
J --> K{"Are there
more tasks?"}
K -- No --> L{"Are there
more phases?"}
K -- Yes --> F
L -- No --> M(["Complete"])
L -- Yes --> D
C@{ shape: lean-r}
E1@{ shape: lean-r}
E2@{ shape: lean-r}
G@{ shape: lean-r}
J@{ shape: trap-t}
```
Explore: reduce ambiguity and confirm constraints
The Explore phase is about eliminating uncertainty before writing code. It begins with gathering inputs: the problem statement, expected behaviour, constraints (such as versions, environments, or deployment requirements), and relevant code paths. These inputs form the foundation for clarifying the task and identifying potential risks. The goal is to answer critical questions upfront: What does success look like? What are the non-negotiable constraints? Which parts of the codebase are affected?
The output of this phase is a set of clarified requirements, options, and trade-offs. It should also highlight unknowns that require confirmation, such as unclear specifications or dependencies. The aim is minimal-change thinking: prefer the smallest correct intervention that achieves the goal. This reduces complexity and limits the surface area for errors.
Explore is not about designing the solution in detail. Instead, it ensures that the problem is well-defined and that the constraints are understood. This phase prevents wasted effort by catching ambiguities early, before they lead to incorrect or overly broad implementations. The result is a focused, actionable task that can be planned and executed with confidence.
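As an illustration, the output of Explore can be captured in a short note before moving to Plan. The headings below are one possible layout, not a prescribed format:

```markdown
## Explore notes: <task name>

- **Problem:** One or two sentences describing the observed issue or goal.
- **Constraints:** Versions, environments, deployment requirements.
- **Affected code paths:** Files and modules likely to change.
- **Options and trade-offs:** Candidate approaches, each with costs and risks.
- **Unknowns to confirm:** Open questions that block planning.
- **Preferred minimal change:** The smallest intervention that achieves the goal.
```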
Plan: atomic tasks with acceptance criteria and definition of done
Planning in AI-assisted development means breaking work into atomic tasks. An atomic task is operationally defined by three properties: it is small enough to review confidently, independently verifiable, and revertible without disrupting other work. This granularity ensures that the assistant’s output remains focused and manageable.
Each task should follow a structured format to eliminate ambiguity:
- Scope and non-goals: What the task includes and, critically, what it excludes. This prevents scope creep by explicitly ruling out unrelated changes.
- Acceptance criteria: Clear, testable conditions that define success. These should cover functional requirements, edge cases, and non-functional constraints.
- Definition of done: A checklist of completion requirements, such as code reviews, test coverage, and documentation updates.
- Automated tests: Specific tests to add or update, including unit, integration, or regression tests.
- Manual test steps: Step-by-step instructions for validating behaviour that cannot be automated, such as UI interactions or cross-browser compatibility.
- Rollback note: How to revert the change if something goes wrong, including any database migrations or configuration updates.
- Risk level: An assessment of the task’s complexity and potential impact, which determines the rigor of verification.
Planning depth should match the task’s proximity to execution. The next milestone should be fully detailed, while later milestones can remain coarse, aligning with agile principles. Guardrails are essential: maintain a "do not touch" list to avoid drive-by refactors, limit dependency additions, and explicitly avoid unrelated changes. This discipline ensures that the assistant’s speed is directed toward meaningful work, not distractions. The result is a clear, actionable plan that minimizes risk while maximizing efficiency.
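Putting the format above together, a task specification for a hypothetical bugfix might look like this (all names and details are illustrative):

```markdown
## Task: Fix duplicate submission on the signup form

- **Scope:** Disable the submit button while a request is in flight.
- **Non-goals:** No form redesign; no changes to validation logic.
- **Acceptance criteria:** Double-click produces exactly one request;
  the button re-enables on error; keyboard submission behaves the same.
- **Definition of done:** Tests pass, code reviewed, changelog updated.
- **Automated tests:** Unit test for the in-flight guard; regression
  test for the error path.
- **Manual test steps:** Throttle the network, double-click submit,
  confirm a single request in the network panel.
- **Rollback note:** Single commit; revert restores previous behaviour.
  No migrations or configuration changes.
- **Risk level:** Low.
```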
Code: implement the task, but supervise the assistant
The Code phase is where the planned task is implemented, but the assistant’s output must be actively supervised. The goal is to execute the task as defined, without introducing unnecessary changes or deviations. Start by providing the assistant with the task brief, including scope, constraints, and relevant code paths. This context ensures that the generated code aligns with the plan and avoids common pitfalls like hallucinated APIs or incorrect assumptions.
Supervise the output for scope creep. The assistant may suggest unrelated refactors, broad renames, or style changes that are not part of the task. Reject these additions unless they are explicitly required. Similarly, watch for bug-fixing loops: if the assistant produces repeated patches without progress, return to the Explore phase. Gather new evidence, refine the hypothesis, and adjust the plan before continuing.
Keep changes reviewable. The diff should be minimal, focused, and accompanied by tests. Commits should be logical and self-contained, making it easy to verify that the implementation matches the plan. If the task involves complex logic, break it into smaller steps and validate each one before proceeding. The assistant’s role is to accelerate implementation, not to replace human oversight. The developer remains responsible for ensuring that the code meets the task’s requirements and adheres to project standards.
Verify: layered checks that earn trust
Verification is the final gate in the workflow, ensuring that the assistant’s output meets all requirements and does not introduce hidden risks. This phase applies multiple layers of checks, each designed to catch different classes of errors and build confidence in the change.
Automated checks form the first layer. These include unit and integration tests, linting, type checks, and build steps. They confirm that the code behaves as expected in controlled scenarios and adheres to technical standards. Behavioural checks follow, using manual test scripts to validate real-world interactions, edge cases, and failure paths. For web applications, this might involve cross-browser testing or device compatibility checks.
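The layering can be made concrete with a small runner that executes each check in order and stops at the first failure, so cheap checks catch problems before expensive ones run. The commands below are placeholders for a project's own tools; this is a sketch, not a prescribed toolchain:

```python
import subprocess

# Ordered layers: cheap, fast checks first, so failures surface early.
# The commands are illustrative placeholders; substitute your own tools.
LAYERS = [
    ("lint", ["ruff", "check", "."]),
    ("types", ["mypy", "src"]),
    ("unit tests", ["pytest", "tests/unit"]),
    ("integration tests", ["pytest", "tests/integration"]),
]

def run_layers(layers, runner=subprocess.call):
    """Run each layer in order; return the name of the first failing
    layer, or None if every layer passes."""
    for name, cmd in layers:
        if runner(cmd) != 0:
            return name
    return None
```

Injecting the `runner` callable keeps the fail-fast logic testable without invoking real tools.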
Operational checks ensure the change is production-ready. Review error handling, logging, metrics, configuration, and any required migrations or rollback procedures. Security checks address input validation, output encoding, authorisation, secrets management, logging hygiene, and dependency security. These checks are non-negotiable, regardless of task size.
To streamline verification, use a compact checklist tied to the task’s risk level. Low-risk changes may require only basic checks, while high-risk changes demand thorough validation. The key principle is this: verification effort scales with risk, but it is never optional. By applying these layers systematically, the workflow ensures that the assistant’s speed does not compromise reliability. The result is code that is not only fast to produce but also trustworthy in production.
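One way to tie the checklist to risk is a cumulative mapping: each level requires everything the levels below it require, so verification effort grows monotonically with risk. A minimal sketch, with illustrative check names:

```python
# Illustrative check names; the cumulative structure is the point:
# each risk level inherits every check from the levels below it.
CHECKS_BY_RISK = {
    "low": {"lint", "unit tests"},
    "medium": {"integration tests", "manual test script"},
    "high": {"security review", "rollback rehearsal"},
}
ORDER = ["low", "medium", "high"]

def required_checks(risk):
    """Return the cumulative set of checks for a given risk level."""
    if risk not in ORDER:
        raise ValueError(f"unknown risk level: {risk}")
    idx = ORDER.index(risk)
    checks = set()
    for level in ORDER[: idx + 1]:
        checks |= CHECKS_BY_RISK[level]
    return checks
```

The subset relation between levels encodes the principle that verification scales with risk but never drops below the baseline.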
Context management: control what the assistant sees and assumes
Context is the information the assistant uses to generate code, and its quality directly determines the reliability of the output. Low-quality context produces plausible but incorrect results, leading to wasted effort and technical debt. Context includes the problem statement, relevant code paths, constraints, and assumptions about the system’s behaviour. Without explicit guidance, the assistant fills gaps with generic or incorrect assumptions, which can introduce subtle bugs or unnecessary complexity.
To manage context effectively, create a task brief for each atomic task. This brief should include:
- Scope and non-goals: What the task includes and excludes.
- Constraints: Versions, environments, dependencies, and deployment requirements.
- Relevant files and commands: Specific paths, build commands, or test scripts to use.
- Assumptions: Explicit notes about API behaviour, user permissions, or other system-specific details.
Repository guidance files, such as AGENTS.md or CLAUDE.md, can standardise context across tasks. These files define coding standards, boundaries, review expectations, and common commands. By referencing them in task briefs, you ensure consistency and reduce the risk of misaligned output.
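A minimal guidance file might look like the sketch below. Neither AGENTS.md nor CLAUDE.md mandates a particular structure, so the section names and commands here are illustrative:

```markdown
# AGENTS.md (illustrative sketch)

## Commands
- Build: `make build`
- Test: `make test`

## Boundaries
- Do not modify files under `vendor/` or `migrations/`.
- Do not add dependencies without prior approval.

## Standards
- Follow the existing lint configuration; no drive-by reformatting.
- Every change ships with tests and a focused, reviewable diff.
```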
The goal is to provide the assistant with the minimal, precise context required for the task. Too much context leads to confusion or scope creep; too little results in incorrect assumptions. Striking this balance ensures that the assistant’s speed is harnessed productively, not spent on rework or debugging.
Scaling the workflow: bugfix vs feature vs project
The Explore – Plan – Code – Verify loop scales to different types of work, but its core principles remain unchanged. The difference lies in the depth of each phase and the rigour of the gates.
For bugfixes, the workflow is streamlined. Explore focuses on reproducing the issue and identifying the root cause, while Plan defines a minimal, targeted fix. Verification is strict but narrow, ensuring the bug is resolved without introducing regressions. Pull requests should be tiny, avoiding broad refactors or unrelated changes. The goal is speed, but only if the fix is correct and reversible.
For features, the workflow expands. Explore and Plan involve more upfront work: gathering requirements, evaluating trade-offs, and breaking the feature into staged deliverables. Feature flags may be used to decouple deployment from release, allowing for incremental validation. Verification includes more manual testing and stakeholder checks to confirm the feature meets user needs. The focus shifts from speed to correctness and alignment with long-term goals.
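Decoupling deployment from release can be as simple as guarding the new code path behind a flag read from configuration. A minimal sketch, with a hypothetical flag name:

```python
# Minimal feature-flag gate: the new code path ships dark and is
# enabled per environment via configuration, not a code change.
# "new_checkout_flow" is a hypothetical flag name.
FLAGS = {"new_checkout_flow": False}  # loaded from config in practice

def is_enabled(flag, flags=FLAGS):
    """Unknown flags default to off, so a missing entry is safe."""
    return flags.get(flag, False)

def checkout(cart, flags=FLAGS):
    if is_enabled("new_checkout_flow", flags):
        return "new flow"
    return "legacy flow"
```

Because the flag defaults to off, the feature can be deployed, validated incrementally, and rolled back by flipping configuration rather than reverting code.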
For projects, the workflow becomes comprehensive. Discovery spikes and architectural decision records (ADRs) precede implementation, ensuring the project’s direction is sound. Migration and rollout planning are explicit, with stronger review and verification gates to manage risk. The loop runs at multiple levels: high-level milestones are broken into atomic tasks, each with its own Explore, Plan, Code, and Verify cycle. The invariants remain non-negotiable, but the scale demands more coordination and documentation.
In all cases, the workflow ensures that AI-assisted development remains controlled, predictable, and aligned with project goals. The assistant’s speed is harnessed productively, but human judgement remains the final authority.
Conclusion: speed with a human in the loop
AI coding assistants increase throughput, but they do not change accountability. They can generate code quickly, suggest solutions, and automate repetitive tasks, but the responsibility for correctness, security, and alignment with project goals remains with the human developer. Speed is only valuable if it serves these priorities.
The practical recipe for safe AI-assisted development is straightforward: atomic tasks, explicit verification, and rigorous review. Break work into small, independently verifiable units. Define acceptance criteria and a definition of done for each task. Apply layered checks—automated tests, manual validation, operational reviews—to ensure the output meets requirements. Never bypass human review, especially for critical changes.
The non-negotiable principle is this: a human must remain in the loop for requirements, boundaries, code review, and final verification. The assistant accelerates the process, but it does not replace judgement. Used well, it can deliver routine engineering work faster and with fewer errors. Used poorly, it can introduce technical debt just as quickly. The difference lies in the workflow, and in the discipline to enforce it.
Links
- [[2026-W05]]