Building Poseidon #1: Why we're not using Kubernetes
Part of a series on building Poseidon, a custom-built container orchestration engine for ephemeral, on-demand workloads

The Problem: CTF Challenges Need Dynamic Infrastructure
Capture The Flag (CTF) competitions have evolved significantly. Modern challenges often require isolated, interactive environments where participants can exploit vulnerabilities, reverse engineer binaries, or manipulate web applications in real time. These aren’t static challenges; they are live systems that need to be spun up on demand, made accessible over the internet, and then torn down after use.
For my Cloud Computing course project, my team is building OrcaCTF. The vision was simple: a student clicks a “Start Challenge” button, and within seconds they get a dedicated Docker container with a unique URL to connect to. The challenge runs for a set duration (with extensions possible based on server load), and then automatically cleans itself up.
We’re targeting 15-20 concurrent users initially, with an architecture that can scale far beyond that, so we needed fine-grained control over container lifecycle, resource limits, and networking. And we needed to do all of it on the AWS free tier.
The question became: what’s the best way to orchestrate ephemeral Docker containers in the cloud?
Evaluating the Serverless Landscape
My first instinct was to leverage serverless services. After all, for a Cloud Computing course, surely AWS or Azure would have pre-built solutions for this exact use case, right?
AWS Lambda + API Gateway
The Promise: Serverless functions (with Docker container support) that scale to zero, charge only for what you use, and handle thousands of concurrent requests.
The Reality: Lambda has a maximum runtime of 15 minutes, while CTF challenges can run anywhere from 30 minutes to several hours. Participants also usually need to step away and return to their environment later. Lambda’s ephemeral nature and strict time limits made it a non-starter.
AWS Fargate
The Promise: Serverless containers; just specify your Docker image, and the cloud handles the rest.
The Reality: These services, while really convenient, are designed for microservice architectures where instances are interchangeable and load-balanced. They are meant for instance-agnostic workloads.
We needed the opposite: instance-specific routing. Each user needs their own container accessible via a unique subdomain like a3f4b92c8d…orcactf.app. Fargate abstracts containers away behind load balancers, so getting traffic to a specific container would require complex workarounds, and the solution would be tightly coupled to AWS-specific networking constructs.
AWS App Runner
Similar story: great for deploying web services, but not so much for orchestrating user-specific ephemeral environments with custom networking requirements.
Why not Kubernetes or Docker Swarm?
The elephant in the room: why not use battle-tested orchestration platforms?
Kubernetes is incredibly powerful, but it’s also incredibly complex. Setting up a cluster, managing nodes, configuring ingress controllers, wrangling pods vs deployments vs services; it’s a steep learning curve. For a project that has to go from zero to prototype in a month, K8s felt like bringing a cargo ship to a river rafting trip.
Moreover, Kubernetes is general-purpose by design. It is meant to manage long-lived services, rolling deployments and complex distributed systems. Our use case is much simpler: spin up a container, keep it alive for a few hours at maximum, and then clean it up. We don’t need auto-healing deployments or blue-green rollouts. We need ephemeral, user-scoped container lifecycle management.
For our timeline and expertise level, Kubernetes would be overkill at the project’s current projected scale.
Docker Swarm has a gentler learning curve, but it’s still designed for orchestrating services across clusters, not managing per-user container instances with custom routing.
Both options also add operational overhead: cluster management, control plane High Availability (HA), storage orchestration, network policies. I’d be spending more time fighting the orchestrator than actually building the platform.
The Case for Building Custom: Enter Poseidon
After evaluating the existing landscape, my assumption was reaffirmed: our use case is very niche, and the existing tools are not optimized for it. They’re designed for microservices (many identical deployments behind a load balancer) or long-lived services (always running, scaled horizontally). CTF challenges are neither.
So I decided to build Poseidon. Keeping with the marine theme of OrcaCTF, Poseidon is our ‘orchestrator of the deep’, a purpose-built engine designed to be simple and performant.
Core Requirements:
Absolute Control Over Container Lifecycle
Spin up containers on-demand
Enforce resource limits (CPU, memory, disk)
Support custom timeouts with user-requested extensions
Clean shutdown and cleanup
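To make the lifecycle requirement concrete, here is a minimal sketch of the kind of launch configuration Poseidon’s worker could hand to the Docker SDK (docker-py). The image name, limit values, and label key are illustrative assumptions, not Poseidon’s final settings.

```python
# Sketch of a container launch config for docker-py's containers.run().
# Image name, limits, and label key are hypothetical placeholders.
def challenge_run_config(image: str, instance_id: str) -> dict:
    return {
        "image": image,
        "detach": True,                 # return immediately, run in background
        "mem_limit": "256m",            # hard memory cap
        "nano_cpus": 500_000_000,       # 0.5 CPU, in units of 1e-9 CPUs
        "pids_limit": 128,              # guard against fork bombs
        "labels": {"poseidon.instance": instance_id},  # used later for routing
        "auto_remove": True,            # container is deleted once it exits
    }

config = challenge_run_config("orcactf/challenge-web:latest", "a3f4b92c8d")
# The worker would then call: docker.from_env().containers.run(**config)
```

Keeping the config as plain data like this also makes it easy to log and test without a Docker daemon in the loop.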
Instance-Specific Routing
Each container gets a unique subdomain:
<SHA256(instance_id)>.orcactf.app
Traffic must route to the specific container, not a pool
SSL/TLS termination for all subdomains
Support for both SSH and HTTP connections
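As a sketch, the subdomain derivation could look like the following. One detail worth noting: a full SHA-256 hex digest is 64 characters, which exceeds the 63-character DNS label limit, so the digest has to be truncated; the 16-character cut below is an assumption.

```python
import hashlib

def instance_subdomain(instance_id: str, base: str = "orcactf.app") -> str:
    # Hash the instance ID so the subdomain is hard to guess and uniform.
    digest = hashlib.sha256(instance_id.encode("utf-8")).hexdigest()
    # Truncate: a DNS label may be at most 63 characters, and a full
    # SHA-256 hex digest is 64. The cut at 16 characters is illustrative.
    return f"{digest[:16]}.{base}"

print(instance_subdomain("student-17:challenge-web"))
```

Because the derivation is deterministic, master and worker can both compute the same subdomain from the instance ID without coordinating.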
Cloud-Agnostic Architecture
Should work on AWS, Azure, GCP, or bare metal
No vendor lock-in via proprietary services
Portable enough to open-source and let others deploy
Observable & Debuggable
Comprehensive logging and tracing
Real-time metrics (Prometheus + Grafana)
Easy visibility into what's happening under the hood
Cost-Effective
Run on EC2 instances within AWS free tier
Efficient resource utilization
No per-container pricing overheads
Quick note: I'm building this in parallel with writing about it. Some details will evolve as we discover what works.
Architecture Philosophy: Simplicity over features
Poseidon follows a two-process model:
Master process: Runs alongside OrcaCTF’s other backend services. Handles API requests, maintains state in Redis, schedules containers across workers and monitors health.
Worker process: Runs on one or more EC2 instances. Receives commands from the master, spins up Docker containers, attaches labels used for routing, and reports status back.
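A rough sketch of what the master-to-worker work request might look like; the field names and serialization here are assumptions, not Poseidon’s final wire format.

```python
from dataclasses import dataclass, asdict

# Hypothetical shape of a master -> worker work request.
@dataclass
class WorkRequest:
    instance_id: str       # unique per user + challenge launch
    user_id: str
    challenge_image: str   # Docker image the worker should run
    timeout_seconds: int   # base lifetime before automatic cleanup

req = WorkRequest(
    instance_id="a3f4b92c8d",
    user_id="student-17",
    challenge_image="orcactf/challenge-web:latest",
    timeout_seconds=3600,
)
payload = asdict(req)  # e.g. JSON-encoded onto the master -> worker channel
```

Keeping the message a flat, serializable record keeps the two processes loosely coupled: the worker needs nothing from the master beyond this payload.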
The design is intentionally minimal. We’re not trying to compete with Kubernetes. We’re solving a specific problem with the simplest architecture that works.

The User Experience:
Let’s walk through the expected UX when a student wants to run a challenge:
- Phase 1: Request (User → Backend)
User clicks "Start Challenge" on the OrcaCTF frontend
Frontend calls backend API, which validates the user's token and permissions
Backend returns "pending", triggering a loading spinner
- Phase 2: Orchestration (Backend → Poseidon)
Backend calls Poseidon Master, passing challenge details and user ID
Master verifies the user doesn't have an active instance running
Master selects a worker (load balancing strategy TBD) and sends a work request
Worker spins up the Docker container, setting labels for routing
Worker performs health check, then reports success to Master
Master updates Redis state with the container's unique subdomain
- Phase 3: Connection and cleanup (User → Container)
Frontend polls backend until status changes to "ready"
User receives their unique URL: a3f4b92c8d...orcactf.app
Traffic is routed to the specific container based on subdomain
After timeout expires (or user terminates early), container is cleaned up
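The three phases above can be sketched as a toy, in-memory state machine. Real Poseidon keeps this state in Redis and delegates the launch to a worker; the names and the one-hour timeout below are illustrative.

```python
import hashlib
import time

instances: dict[str, dict] = {}  # stand-in for the Redis state store

def start_challenge(user_id: str, challenge: str) -> str:
    # Phase 1-2: validate, record a pending instance, dispatch to a worker.
    instance_id = hashlib.sha256(
        f"{user_id}:{challenge}:{time.time()}".encode()
    ).hexdigest()[:16]
    instances[instance_id] = {"status": "pending", "user": user_id}
    # ... master picks a worker; worker launches and health-checks the container ...
    instances[instance_id].update(
        status="ready",
        url=f"{instance_id}.orcactf.app",
        expires_at=time.time() + 3600,  # cleanup deadline
    )
    return instance_id

def poll(instance_id: str) -> dict:
    # Phase 3: the frontend polls until status flips to "ready".
    return instances[instance_id]

iid = start_challenge("student-17", "challenge-web")
print(poll(iid)["url"])
```

In the real system the `status` transition happens asynchronously, after the worker’s health check succeeds, which is exactly why the frontend polls instead of blocking on the initial request.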
All of this happens quickly, with zero manual intervention.
Why Open-Source?
Poseidon isn’t just a course project; it’s designed to be a reusable module. We will be open-sourcing it, because every university, club, or small company that wants to host an interactive lab shouldn’t have to reinvent the wheel.
What about existing platforms like CTFd?
If you’ve been in the CTF space, you’re likely thinking of platforms like CTFd and its ctfd-whale plugin. These are fantastic, all-in-one solutions for running a complete competition.
However, the dynamic container components are often just that: plugins, tightly coupled to the main platform. They solve the problem for CTFd specifically.
My goal for this project is different; I wanted to build a decoupled, general-purpose orchestration engine (Poseidon) that could be used with any platform, whether it’s our OrcaCTF, a custom-built site, or even non-CTF uses like educational sandboxes or on-demand coding labs.
Poseidon is designed to be the engine, not the entire car. This separation of concerns is a core part of our design philosophy.
Follow along as I dive into:
The master process architecture and API design.
Dynamic subdomain routing.
Worker process and Docker SDK integration
Observability with Prometheus and Grafana
Deployment, auto-scaling and lessons learned
Security details like container isolation strategies
By the end, we’ll have a fully functional orchestration engine, a CTF platform and a deep understanding of how distributed systems work under the hood.
The ocean is deep. Let’s see how deep Poseidon can go.
This is Part 1 of the “Building Poseidon” series. Follow along as we build a custom container orchestration engine from scratch.
A Note on Terminology
Throughout this series, I use "I" when discussing Poseidon's architecture and implementation because I'm building this orchestration engine independently as my contribution to our team's larger OrcaCTF platform.
My teammates are handling other critical pieces: Piyush is designing the CTF challenges themselves, while Lakshya and Samanyu are building the frontend and integrating scoring systems. Poseidon is the infrastructure layer that makes those challenges accessible in isolated, on-demand environments.
This series documents my specific journey building that infrastructure: the decisions I made, the problems I hit, and the solutions I found.


