How I use AI in my Dev Workflow (without losing control)
How I Built Complex Systems with AI Without Becoming Dependent On It

While building Poseidon, a custom CTF-focused container orchestrator, and deploying OrcaCTF on AWS, Sumit asked how I incorporate AI into my development workflow. As I explained my process, I realized I’ve subconsciously developed a systematic framework that keeps me in control while making the most of AI’s strengths.
This post isn’t about prompt engineering or which model is best. It’s about having a methodology that lets AI accelerate your workflow without making you dependent on it or producing unmaintainable code.
My 4-phase development framework
Phase 1: Solo Architecture Thinking ( No AI yet )
Before touching any AI tool, I force a period of clean room thinking.
Take time and think through the requirements
Research options using primary documentation
Outline a couple of architectural approaches
Document trade-offs for each architecture
Example from Poseidon:
“Do I use AWS Lambda ( serverless, but a 15-minute execution limit ), Fargate ( managed containers, but complex per-container routing ), or build a custom orchestrator?”
Why this matters: AI defaults to the "average" solution found in its training data. It lacks the specific context of your constraints (budget, timeline, team expertise). Only you can critically evaluate these trade-offs. Skipping this step leads to generic, sub-optimal architectures.
Phase 2: Architectural Validation & Refinement ( Enter AI )
Once I have a satisfying outline, I treat the AI (Claude is my preference here) as a "Red Team" or a Critical Reviewer. The goal isn't "tell me how to build," but "tell me where this breaks."
My validation checklist:
Edge case detection: I am choosing X over Y because of Z. What failure modes am I not considering?
Scale Analysis: Here’s my service mesh design. What breaks first at 10k concurrent users?
Operational blindspots: I’m planning to use Consul for service discovery. What are the known operational headaches?
The Goal: A theoretically stress-tested tech stack and infrastructure approach I am confident in, validated against patterns I might have missed.
Phase 3: Top-down code skeleton ( my core method )
This is where my approach diverges from “just start coding” or “ask AI to build it”.
The Process:
Define the high-level interface: Start at the highest level of abstraction using strict typing
```python
async def request_instance(user_id: str, challenge_id: str) -> Instance:
    pass
```

Think through what this method needs:
Check user’s rate limits
Ensure there is no existing container associated with the user
Select the least loaded worker for deploying the container on
Register service to Consul
Save state to redis
Set up routing to specific container
Return a well-defined Instance object
Create “contract” stubs:
```python
async def select_best_worker() -> WorkerNode:
    return WorkerNode(node_id="stub", address="stub")

async def request_spawn_container_on_worker(
    worker: WorkerNode, challenge_image: str
) -> Container:
    return Container(id="stub", ip="stub", port=8080)

async def register_service_in_consul(
    container: Container, user_request: RequestChallenge
) -> bool:
    return True
```

Go deeper recursively: Each stub function is broken down recursively into its own sub-functions until I hit the system boundaries ( external APIs like the Docker SDK, Consul client, Redis, etc. )
Return dummy data in the correct shape: This is critical. Each stub returns properly typed data so the top-level functions can “run” ( even if they do nothing real ).
Why this works:
Control: I define the flow of execution and data structures.
Isolation: Each function has a clear, single responsibility before implementation details muddy the waters.
Debuggability: I can "run" the system with stubs to verify the logic flow before writing a single line of real infrastructure code.
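That “run the system with stubs” step is literally executable. Here’s a minimal, self-contained sketch of the idea — the models and helper names are simplified stand-ins, not Poseidon’s actual code:

```python
import asyncio
from dataclasses import dataclass

# Simplified stand-ins for the real models (illustrative only)
@dataclass
class WorkerNode:
    node_id: str
    address: str

@dataclass
class Instance:
    id: str
    hostname: str

# Contract stubs: correctly shaped dummy data, zero real infrastructure
async def select_best_worker() -> WorkerNode:
    return WorkerNode(node_id="stub", address="stub")

async def spawn_container(worker: WorkerNode, challenge_id: str) -> str:
    return "container-stub"

async def request_instance(user_id: str, challenge_id: str) -> Instance:
    worker = await select_best_worker()
    container_id = await spawn_container(worker, challenge_id)
    return Instance(id=container_id, hostname=f"{challenge_id}.stub.local")

# The whole flow "runs" before a single real implementation exists
print(asyncio.run(request_instance("u1", "web-pwn")))
# → Instance(id='container-stub', hostname='web-pwn.stub.local')
```

Because the stubs honor the contracts, swapping any one of them for a real implementation later changes nothing about the control flow.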
Phase 4: Bottom-up implementation ( AI as pair programmer )
With the interfaces defined, I switch to implementation. I work from the bottom up, starting with the functions that touch external APIs.
This is where AI shines. Since I have isolated the logic into a single stub, I can ask the AI to "Implement this specific function using the Docker SDK." The context is contained, preventing hallucinations. It is still important, though, to review all logic in the AI-generated code to prevent security compromises and to catch any hidden assumptions the AI might have made.
What changes during implementation:
After each sprint ( implementing a single layer ), the dummy return values in the top-level function are replaced with real data. The function signature usually stays the same, but I occasionally realize that I need additional data fields.
Example
```python
# Initial stub
async def provision_instance(challenge_id: str, user_id: str) -> Instance:
    worker = await select_best_worker()  # Dummy at first
    container = await spawn_container(worker, challenge_id)  # Dummy
    await register_service(container)  # Dummy
    return Instance(id="stub", hostname="stub")  # Dummy

# After implementing select_best_worker()
async def provision_instance(challenge_id: str, user_id: str) -> Instance:
    worker = await select_best_worker()  # Now returns a real WorkerNode
    container = await spawn_container(worker, challenge_id)  # Still dummy
    await register_service(container)  # Still dummy
    return Instance(id="stub", hostname="stub", worker=worker)  # Partially real
```

The Challenge: Data Shape Consistency
A major risk in distributed systems is Schema Drift. In Poseidon, a "Worker" appears in multiple forms:
Redis: A JSON string.
Internal Logic: A Python Object.
API Response: A Pydantic model.
gRPC: A Protobuf message.
If you’re not careful, you end up with ad-hoc dictionaries {"id": ...} scattered everywhere. Debugging becomes a nightmare of key errors.
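A contrived illustration of that failure mode — the same worker entity serialized as an ad-hoc dict in one place and read back with a key the writer never used:

```python
import json

# One code path serializes the worker with the key "id"...
cached = json.dumps({"id": "worker-1", "addr": "10.0.0.5"})

# ...another path, written weeks later, expects "node_id".
worker = json.loads(cached)
try:
    node = worker["node_id"]
except KeyError as err:
    # The bug surfaces at runtime, far from where the dict was written.
    print(f"schema drift: missing key {err}")
```

Neither side is “wrong” in isolation; the drift only shows up where the two ad-hoc shapes meet.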
My current solution: Explicit return type annotations everywhere.

```python
async def select_best_worker() -> WorkerNode:
    # Forces me to return a WorkerNode.
    # Can't accidentally return a dict or a string.
    ...
```
This doesn’t solve the problem entirely, but it forces me to think about data consistency upfront rather than during debugging.
What I need to add:
Canonical Models & Conversion Boundaries
Canonical Form: A strict Pydantic model represents the entity within the application logic.
Boundaries: Data is immediately converted to the Canonical Form when it enters the system (e.g., from Redis or API) and only converted out at the last moment.
```python
# The Canonical Model (the source of truth)
class WorkerNode(BaseModel):
    node_id: str
    address: IPv4Address
    load: int

# Boundary: Redis -> Canonical
def get_worker(id: str) -> WorkerNode:
    data = redis.get(id)
    return WorkerNode(**json.loads(data))  # Validation happens here
This ensures that my AI-assisted implementation code never has to guess the shape of the data. It always receives and returns the Canonical Model.
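The outbound boundary mirrors the inbound one. A dependency-free sketch of the round trip, using a frozen dataclass in place of the Pydantic model so it runs standalone — `to_redis_payload` and `from_redis_payload` are hypothetical names, not Poseidon’s API:

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class WorkerNode:
    """Dataclass stand-in for the Pydantic canonical model."""
    node_id: str
    address: str
    load: int

def to_redis_payload(worker: WorkerNode) -> str:
    # Boundary: Canonical -> Redis. Serialization happens only here.
    return json.dumps(asdict(worker))

def from_redis_payload(raw: str) -> WorkerNode:
    # Boundary: Redis -> Canonical. Parsing happens only here.
    return WorkerNode(**json.loads(raw))

# Everything between the boundaries touches only WorkerNode, never raw JSON.
original = WorkerNode("w1", "10.0.0.5", 3)
assert from_redis_payload(to_redis_payload(original)) == original
```

With Pydantic, `from_redis_payload` would additionally validate types and reject malformed payloads at the boundary instead of deep inside the logic.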
The Missing Piece: Robust Testing
I’ll be honest: I don't write enough tests. Like many solo projects, I rely heavily on manual verification, which works until it doesn't.
However, the "Top-Down" framework actually lays the perfect groundwork for a testing strategy I should be implementing. Because the system is built on isolated stubs and contracts, the path to robustness is clear, even if I haven't walked it yet:
The Plan for Unit Tests: Since select_best_worker() is an isolated stub, I can easily write a test that forces it to raise a NoResourcesAvailable error to see whether the parent function handles it gracefully.
The Plan for Integration: I can mock the "System Boundary" functions (Docker/Consul) to test the orchestration logic without spinning up real infrastructure.
Right now, I am testing manually. But because the architecture is decoupled, adding these tests later won't require a rewrite; just discipline.
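The unit-test plan above is a few lines with stdlib tooling. A hedged sketch — `NoResourcesAvailable` and the helper names echo this post’s stubs, not actual Poseidon code:

```python
import asyncio
import sys
from unittest import mock

class NoResourcesAvailable(Exception):
    """Raised when no worker has capacity (illustrative name)."""

async def select_best_worker() -> str:
    return "worker-1"  # stub; the real version would query worker load

async def provision_instance(challenge_id: str) -> str:
    try:
        worker = await select_best_worker()
    except NoResourcesAvailable:
        return "queued"  # degrade gracefully instead of crashing
    return f"{challenge_id}@{worker}"

async def test_no_resources_is_handled() -> None:
    # Force the isolated stub to fail and check the parent copes.
    with mock.patch.object(sys.modules[__name__], "select_best_worker",
                           side_effect=NoResourcesAvailable()):
        assert await provision_instance("web-1") == "queued"

asyncio.run(test_no_resources_is_handled())
print("graceful-degradation test passed")
```

Because `select_best_worker` is a boundary function with a typed contract, the test never needs Docker, Consul, or Redis to exercise the failure path.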
Why this Framework Works
Architectural clarity: I understand the system because I designed it, not the AI.
AI as accelerator, not crutch: AI fills in implementation details, not architectural decisions.
Debuggability: Top-down structure + explicit types make it easier to trace failures.
Incremental progress: Each sprint adds real functionality without breaking the overall structure.
When this doesn’t work
This framework isn’t universal.
While this framework is excellent for architecting complex distributed systems, infrastructure, or anything with too many moving parts, it’s unnecessary for exploratory work.
When you’re simply trying to see whether an idea is viable, forcing a top-down process is self-inflicted pain. Build the crude prototype first; confirm the thing even deserves oxygen. Likewise, in well-trodden domains, this methodology adds little value. It wastes time that could be spent actually shipping something. You don’t need a grand design philosophy to churn out yet another CRUD app.
| | Simple Domain | Complex Domain |
| --- | --- | --- |
| High Stakes | Maybe overkill ( but safe ) | Use the framework |
| Low Stakes ( Learning ) | Overkill; just hack it | Use the framework ( learn deeply ) |
TL;DR: If the cost of failure is high, plan the interface. If the cost of failure is low, just build it.
Lessons I learnt:
Start architectural conversations with AI, not direct implementation:
“I’m considering X vs Y for Z reason. What am I missing?”
Not: “Write me a container orchestrator”
Spend time designing the system before churning out actual code:
What are the 3-5 functions I need at the top level of abstraction?
What do they return and expect as parameters?
What do they need from each other?
Use type annotations religiously:
Forces consistent data shapes
Makes AI suggestions more accurate
Catches potential bugs at design time
Design top-down, implement bottom-up:
Start designing with your functions acting as providers, working your way down to the functions that consume external APIs.
Implement the functions talking to external APIs first, then work your way up, replacing stubs with real functions incrementally.
Review your data models before implementing:
How many ways are you representing the same entity?
Can you reduce it to one canonical form?
Conclusion
AI is an incredible accelerator for building complex systems, but only if you stay in the driver's seat.
My framework keeps architectural decisions and system invariants in my control, while delegating the implementation details to the AI. The result is a system I can operate, debug, and evolve without guessing.


