Coliseum

Prediction markets are efficient but not perfectly so. Sometimes prices are wrong. Sometimes provably wrong.

On any peer-to-peer market, there is always a delay between the true odds changing and the price correcting to match. That lag is free money.

Coliseum is a five-agent autonomous trading system that scans thousands of Kalshi markets, identifies opportunities, and executes trades while I'm asleep.

The Agents

There are five agents. Each has exactly one job.

Scout runs every hour and scans the entire Kalshi market catalog using GPT-5.2. It filters by volume, spread, and closing window, then researches each candidate with targeted queries. If a conference leader just lost their key vote but the market still prices them at 70% to pass, Scout finds it. It outputs opportunity files with rationale, confidence levels, and source URLs.
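The cheap part of Scout's job, the volume/spread/closing-window screen, can run before any LLM call so that research is spent only on survivors. A minimal sketch, assuming a simplified hypothetical `Market` record (not Kalshi's actual API schema) and illustrative thresholds, since the real cutoffs aren't published:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Market:
    # Hypothetical, simplified market record; field names are illustrative.
    ticker: str
    volume: int            # contracts traded
    yes_bid: int           # cents
    yes_ask: int           # cents
    close_time: datetime

# Illustrative thresholds, not Coliseum's real values.
MIN_VOLUME = 1_000
MAX_SPREAD_CENTS = 5
MIN_TIME_TO_CLOSE = timedelta(hours=1)
MAX_TIME_TO_CLOSE = timedelta(days=10)   # longest window used by either strategy

def passes_prefilter(m: Market, now: datetime) -> bool:
    """Volume, spread, and closing-window screen applied before any research."""
    spread = m.yes_ask - m.yes_bid
    time_left = m.close_time - now
    return (
        m.volume >= MIN_VOLUME
        and 0 <= spread <= MAX_SPREAD_CENTS
        and MIN_TIME_TO_CLOSE <= time_left <= MAX_TIME_TO_CLOSE
    )

def prefilter(markets: list[Market], now: datetime) -> list[Market]:
    """Keep only the markets worth spending LLM research on."""
    return [m for m in markets if passes_prefilter(m, now)]
```

Only markets that clear all three checks move on to targeted research.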

Researcher takes those opportunities and goes deeper. It's not allowed to make a trade recommendation. It formulates 2-4 highly targeted research questions per opportunity, runs separate web searches, synthesizes everything with embedded citations, and appends findings to the opportunity file.
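The fan-out and citation bookkeeping can be sketched without the LLM synthesis step. A minimal sketch, assuming a hypothetical `search` callable standing in for whatever web-search tool Researcher actually uses; it formats one section per question with numbered citations and appends it to the opportunity file, and deliberately contains no buy/sell language:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

@dataclass
class SearchResult:
    snippet: str
    url: str

# Hypothetical stand-in for the real web-search tool.
SearchFn = Callable[[str], list[SearchResult]]

def research(questions: list[str], search: SearchFn) -> str:
    """Run one search per question and build a markdown section with citations."""
    if not 2 <= len(questions) <= 4:
        raise ValueError("expected 2-4 research questions")
    lines = ["## Research findings", ""]
    sources: list[str] = []
    for q in questions:
        lines.append(f"### {q}")
        for r in search(q):
            if r.url not in sources:
                sources.append(r.url)          # each URL gets one citation number
            n = sources.index(r.url) + 1
            lines.append(f"- {r.snippet} [{n}]")
        lines.append("")
    lines.append("### Sources")
    lines += [f"[{i}] {url}" for i, url in enumerate(sources, start=1)]
    return "\n".join(lines) + "\n"

def append_findings(opportunity_file: Path, section: str) -> None:
    """Append, never overwrite: Scout's original rationale stays intact."""
    with opportunity_file.open("a", encoding="utf-8") as f:
        f.write("\n" + section)
```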

Recommender reads the research and runs the numbers through a fixed set of tool calls: structured inputs in, a deterministic output out. Either the edge is there or it isn't. BUY YES, BUY NO, or ABSTAIN.
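The deterministic step can be as small as comparing a fair probability against the price of each side. A minimal sketch, assuming the research yields a single fair YES probability (how Coliseum derives it isn't specified here) and that the minimum edge is 5 percentage points, borrowed from the EDGE strategy below:

```python
from enum import Enum

class Decision(Enum):
    BUY_YES = "BUY YES"
    BUY_NO = "BUY NO"
    ABSTAIN = "ABSTAIN"

MIN_EDGE = 0.05  # assumed: 5 percentage points

def recommend(fair_yes: float, yes_ask: float, no_ask: float,
              min_edge: float = MIN_EDGE) -> tuple[Decision, float]:
    """fair_yes: estimated probability YES resolves (0-1).
    yes_ask / no_ask: cost to buy each side, in dollars per $1 contract."""
    edge_yes = fair_yes - yes_ask            # expected value per YES contract
    edge_no = (1.0 - fair_yes) - no_ask      # expected value per NO contract
    best, edge = max((Decision.BUY_YES, edge_yes),
                     (Decision.BUY_NO, edge_no), key=lambda t: t[1])
    if edge >= min_edge:
        return best, edge
    return Decision.ABSTAIN, edge
```

For example, with a fair YES probability of 0.55 and NO asking 0.32, NO is worth 0.45 and costs 0.32, an edge of 13 points, so the output is BUY NO.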

Trader is the final step. It checks portfolio cash, fetches the live Kalshi orderbook to verify the price hasn't moved since the recommendation, and calculates slippage. If everything clears: limit order. No market orders, ever. The execution strategy is: place → wait → reprice more aggressively → repeat up to 3 times → cancel. It sends a Telegram alert on every decision.
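The place, wait, reprice, cancel loop is easy to get subtly wrong, so here is a minimal sketch. The `Broker` interface, slippage tolerance, reprice step, and wait time are all hypothetical (these are not Kalshi API method names or Coliseum's real settings), and `notify` stands in for the Telegram alert:

```python
import time
from typing import Callable, Optional, Protocol

class Broker(Protocol):
    """Hypothetical broker interface; not Kalshi's actual API."""
    def best_ask(self, ticker: str) -> int: ...
    def place_limit(self, ticker: str, price_cents: int, count: int) -> str: ...
    def is_filled(self, order_id: str) -> bool: ...
    def cancel(self, order_id: str) -> None: ...

MAX_SLIPPAGE_CENTS = 2   # illustrative tolerance vs. the recommended price
MAX_ATTEMPTS = 3         # "repeat up to 3 times"
REPRICE_STEP_CENTS = 1   # how much more aggressive each retry gets

def execute(broker: Broker, ticker: str, rec_price: int, count: int,
            cash_cents: int, wait_s: float = 30.0,
            sleep: Callable[[float], None] = time.sleep,
            notify: Callable[[str], None] = print) -> Optional[int]:
    """Limit orders only. Returns the fill price in cents, or None."""
    cap = rec_price + MAX_SLIPPAGE_CENTS         # never pay more than this
    if cap * count > cash_cents:
        notify(f"{ticker}: skipped, insufficient cash")
        return None
    live_ask = broker.best_ask(ticker)
    if live_ask > cap:                           # price moved since the recommendation
        notify(f"{ticker}: skipped, ask {live_ask}c is past cap {cap}c")
        return None
    price = rec_price
    for attempt in range(1, MAX_ATTEMPTS + 1):
        order_id = broker.place_limit(ticker, price, count)
        sleep(wait_s)
        if broker.is_filled(order_id):
            notify(f"{ticker}: filled {count} @ {price}c on attempt {attempt}")
            return price
        broker.cancel(order_id)   # sketch ignores partial fills and cancel/fill races
        price = min(price + REPRICE_STEP_CENTS, cap)
    notify(f"{ticker}: no fill after {MAX_ATTEMPTS} attempts, cancelled")
    return None
```

Capping every reprice at the slippage limit means the loop can only get more aggressive up to the price the recommendation was built on, never past it.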

Guardian runs every 15 minutes in the background. No LLM. Pure Python. It pulls the live portfolio from Kalshi, reconciles open positions against a memory log, and marks closed trades with their exit prices and P&L.
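Guardian's reconciliation is plain bookkeeping. A minimal sketch, assuming a hypothetical `MemoryEntry` record and that exit prices (for example 100 or 0 cents at settlement) have already been fetched; fees are ignored:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemoryEntry:
    # Hypothetical shape of one memory-log record.
    ticker: str
    side: str                    # "yes" or "no"
    contracts: int
    entry_cents: int             # average fill price per contract
    status: str = "EXECUTED"
    exit_cents: Optional[int] = None
    pnl_cents: Optional[int] = None

def reconcile(memory: list[MemoryEntry], live_tickers: set[str],
              exit_prices: dict[str, int]) -> list[MemoryEntry]:
    """Mark EXECUTED entries missing from the live portfolio as CLOSED."""
    closed = []
    for e in memory:
        if e.status != "EXECUTED" or e.ticker in live_tickers:
            continue                     # still open, or already reconciled
        exit_px = exit_prices.get(e.ticker)
        if exit_px is None:
            continue                     # no exit data yet; retry next cycle
        e.status = "CLOSED"
        e.exit_cents = exit_px
        # Prices are for the side held, so the same formula works for YES and NO.
        e.pnl_cents = (exit_px - e.entry_cents) * e.contracts
        closed.append(e)
    return closed
```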

Memory

Every agent writes to the same data layer. Scout saves a markdown file per opportunity. Researcher appends its synthesis. Recommender appends the edge calculation. Trader logs the execution result.

There's also a memory.md file that tracks every market the system has touched: status (PENDING → EXECUTED → CLOSED), trade details, P&L. Scout reads this before each scan and skips anything already in memory.
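The lifecycle and the skip check can be sketched as follows. The one-line-per-market layout of memory.md used here (`- TICKER | STATUS | ...`) is a hypothetical format chosen for illustration, not the file's actual layout:

```python
import re

# Lifecycle as described: PENDING -> EXECUTED -> CLOSED, no going back.
TRANSITIONS = {"PENDING": {"EXECUTED"}, "EXECUTED": {"CLOSED"}, "CLOSED": set()}

def advance(status: str, new_status: str) -> str:
    """Reject any status change that isn't the next step in the lifecycle."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status

# Hypothetical memory.md line format, e.g. "- KXEXAMPLE-25 | EXECUTED | 10 @ 60c"
LINE = re.compile(
    r"^-\s+(?P<ticker>[A-Z0-9.\-]+)\s*\|\s*(?P<status>PENDING|EXECUTED|CLOSED)\b")

def known_tickers(memory_md: str) -> set[str]:
    """Every market the system has touched, whatever its status."""
    return {m["ticker"] for line in memory_md.splitlines()
            if (m := LINE.match(line))}

def skip_known(candidates: list[str], memory_md: str) -> list[str]:
    """What Scout does before each scan: drop anything already in memory."""
    seen = known_tickers(memory_md)
    return [t for t in candidates if t not in seen]
```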

Two Strategies

EDGE: Markets with 4-10 days until close where the price appears to be wrong by at least 5%. The system enters early and exits when the market self-corrects. It's not betting on the outcome; it's betting that the market will reprice.
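The entry test and one possible exit rule can be sketched under two assumptions: "wrong by at least 5%" means a gap of 5 percentage points between a research-based fair price and the market price, and "self-corrects" means the price has moved to within a hypothetical 1-point tolerance of that fair price. The post doesn't say exactly how either is measured:

```python
from dataclasses import dataclass

@dataclass
class EdgeCandidate:
    days_to_close: float
    market_yes: float   # current YES price, 0-1
    fair_yes: float     # research-based estimate, 0-1

def is_edge_trade(c: EdgeCandidate, min_gap: float = 0.05) -> bool:
    """4-10 days to close and a mispricing of at least min_gap (assumed points)."""
    return 4 <= c.days_to_close <= 10 and abs(c.fair_yes - c.market_yes) >= min_gap

def should_exit(side: str, market_yes: float, fair_yes: float,
                tol: float = 0.01) -> bool:
    """Hypothetical exit rule: the market has repriced to (or past) fair value."""
    if side == "yes":                      # bought YES because it was underpriced
        return market_yes >= fair_yes - tol
    return market_yes <= fair_yes + tol    # bought NO because YES was overpriced
```

Because the exit fires on repricing rather than resolution, the trade can close days before the market does.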

SURE THING: Markets where one side is priced at 92-96% and the market closes within 24 hours. Hold to resolution and capture the remaining 4-8%. Before selecting any of these, Scout runs a full Risk Assessment Checklist confirming that there are no formal appeals, the resolution source is clear, and the inputs are stable. Only markets rated negligible or low reversal risk make the cut.
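The "remaining 4-8%" is 4-8 cents per contract; as a return on capital that is (1 - p) / p, about 4.2% at 96¢ and about 8.7% at 92¢, ignoring fees. The flip side is that a reversal loses the full 92-96¢, so the break-even win rate equals the price paid, which is why the checklist matters. A minimal sketch of the filter and the arithmetic, with the risk rating assumed to come out of the checklist as a plain string:

```python
RISK_OK = {"negligible", "low"}

def is_sure_thing(price: float, hours_to_close: float, reversal_risk: str) -> bool:
    """price: cost of the favored side (0-1); reversal_risk from the checklist."""
    return (0.92 <= price <= 0.96
            and 0 < hours_to_close <= 24
            and reversal_risk in RISK_OK)

def hold_to_resolution_return(price: float) -> float:
    """Return on capital if the favored side settles at $1.00 (fees ignored)."""
    return (1.0 - price) / price

def breakeven_win_rate(price: float) -> float:
    """A $1 contract bought at `price` breaks even only if it wins this often."""
    return price
```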