
AlphaForge

Status: experimental

Evolutionary optimization over a typed prompt-bundle surface, scored against point-in-time equity cohorts with deterministic regime-conditional screening and tax-aware exit policies. The system mutates prompt bundles, screener configurations, and ranking weights, backtests each candidate, and keeps only the survivors in search of durable after-tax excess return.

Inspired by Karpathy's autoresearch (an autonomous experimentation loop) and Atlas-GIC (multi-agent prompt optimization scored by Sharpe ratio). AlphaForge diverges from both: it is a governed IC workflow in which structural risk gates, not prompt mutation alone, are the discipline mechanism.

Stack: Codex, MCP, Python, SEC EDGAR, FRED, yfinance

The Forge

AlphaForge treats investment research prompts as parameters to be optimized. Each "prompt bundle" is a typed configuration: persona overlays, ranking weights, screener template, guidance style, and tool budgets. The system mutates these variables and scores the output against historical equity cohorts — keeping only the survivors.
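A minimal sketch of what a typed bundle could look like, assuming a frozen Python dataclass. The field names borrow from the Mutation Surface table on this page; `FACTORS`, `max_tool_calls`, the default values, and the rule that weights sum to 1.0 are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Dict

# Factor identifiers are assumed; they mirror the five screener factors
# named on this page.
FACTORS = (
    "quality_resilience",
    "balance_sheet_strength",
    "valuation",
    "capital_allocation",
    "price_confirmation",
)


@dataclass(frozen=True)
class PromptBundle:
    """Hypothetical typed configuration for one candidate bundle."""

    pm_prompt_variant: str
    ra_prompt_variant: str
    guidance_style: str
    screen_template_id: str
    ranking_weights: Dict[str, float]
    shortlist_n: int = 12
    portfolio_k: int = 5
    max_tool_calls: int = 40  # assumed shape of a "tool budget"

    def __post_init__(self) -> None:
        # Reject malformed bundles at construction time.
        if set(self.ranking_weights) != set(FACTORS):
            raise ValueError("ranking_weights must cover exactly the five factors")
        if abs(sum(self.ranking_weights.values()) - 1.0) > 1e-9:
            raise ValueError("ranking_weights must sum to 1.0 (assumed convention)")
        if self.portfolio_k > self.shortlist_n:
            raise ValueError("portfolio_k cannot exceed shortlist_n")


seed = PromptBundle(
    pm_prompt_variant="quality compounder",
    ra_prompt_variant="quality first",
    guidance_style="watchlist first",
    screen_template_id="balanced",
    ranking_weights={name: 0.2 for name in FACTORS},
)
```

Validating in `__post_init__` means a mutation that produces an invalid bundle fails immediately rather than after a research run.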

# The evolutionary loop (pseudocode, one generation per pass)
bundles    = load_prompt_bundles()           # 3 seed configurations
universe   = build_universe_as_of(date)      # PIT eligible equities
shortlist  = screen_equities(universe)       # deterministic regime screener
research   = run_research(shortlist, bundle) # LLM thesis + filing analysis, once per bundle
scored     = score_cohorts(research, SPY)    # after-tax excess vs benchmark
survivors  = gate(scored, primary_36m > 0)   # hard search gate
bundles    = mutate(survivors)               # weight shifts, template swaps
# repeat until validation clears on untouched dates
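The control flow of the loop above can be sketched as a small runnable skeleton. This is a toy under stated assumptions: `evolve`, its parameters, and the injected `score`, `validate`, and `mutate` callables are hypothetical stand-ins for the research, scoring, and validation stages, and the generation cap is added only so the sketch terminates.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

B = TypeVar("B")  # whatever type represents a bundle


def evolve(
    seeds: Iterable[B],
    score: Callable[[B], float],       # stand-in for 36m after-tax excess return
    validate: Callable[[B], bool],     # stand-in for validation on untouched dates
    mutate: Callable[[B], List[B]],    # stand-in for the mutation step
    max_generations: int = 10,
) -> Optional[B]:
    """Return the first bundle that clears both gates, or None."""
    population = list(seeds)
    for _ in range(max_generations):
        # Hard search gate: only strictly positive scores survive.
        survivors = [b for b in population if score(b) > 0.0]
        for bundle in survivors:
            if validate(bundle):
                return bundle
        if not survivors:
            return None  # the whole generation failed the gate
        # Next generation consists of mutated survivors only.
        population = [child for b in survivors for child in mutate(b)]
    return None


# Toy run with integers standing in for bundles.
winner = evolve(
    seeds=[1, 4],
    score=lambda x: x - 3,
    validate=lambda x: x >= 5,
    mutate=lambda x: [x + 1],
)
```

Returning `None` is an expected outcome here, consistent with a search in which most candidates fail.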

Three Layers

Deterministic Screener

Five regime-conditional templates (balanced, recovery, defensive, inflation, auto-cycle). Inputs are SEC company facts, yfinance prices, and FRED macro series. Factor scoring covers quality resilience, balance sheet strength, valuation, capital allocation, and price confirmation. The screener is part of the mutation surface: templates and thresholds are tunable.
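As an illustration of deterministic factor scoring, the sketch below combines per-factor scores into a weighted composite and applies threshold and cap parameters like those in the Mutation Surface table. It assumes factor scores are already normalized to [0, 1]. The function names, tickers, and default values are hypothetical.

```python
from typing import Dict, List, Tuple


def composite_score(factors: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of factor scores (scores assumed to lie in [0, 1])."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(weights[name] * factors[name] for name in weights) / total_weight


def screen(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    min_composite_score: float = 0.60,
    max_candidates: int = 16,
) -> List[Tuple[str, float]]:
    """Return up to max_candidates (ticker, score) pairs, best first."""
    scored = [(ticker, composite_score(f, weights)) for ticker, f in candidates.items()]
    passed = [(t, s) for t, s in scored if s >= min_composite_score]
    # Sort by score descending, then ticker, so ties break the same way every run.
    passed.sort(key=lambda pair: (-pair[1], pair[0]))
    return passed[:max_candidates]


weights = {
    "quality_resilience": 0.2,
    "balance_sheet_strength": 0.2,
    "valuation": 0.2,
    "capital_allocation": 0.2,
    "price_confirmation": 0.2,
}
candidates = {
    "AAA": {name: 0.8 for name in weights},  # made-up tickers and scores
    "BBB": {name: 0.5 for name in weights},
    "CCC": {name: 0.7 for name in weights},
}
shortlist = screen(candidates, weights)
```

The explicit tie-break on ticker is what keeps the ordering reproducible when two candidates score identically.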

LLM Research Worker

An isolated Codex workspace that exposes only the backtest MCP surface. The worker reads SEC filings, builds regime context from FRED and market data, ranks candidates, selects a portfolio with sell policies, and produces live-guidance actions. Every tool call is taped for reproducibility and audit.
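One way to picture "taped" tool calls is a thin wrapper that records each call's name, arguments, and result, then hashes the record list so two runs can be compared. This is a sketch only: `ToolTape` and its methods are hypothetical, and it assumes arguments and results are JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class ToolTape:
    """Hypothetical recorder for tool calls made during one research run."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def call(self, name: str, fn: Callable[..., Any], /, **kwargs: Any) -> Any:
        """Invoke fn(**kwargs), append the call to the tape, return the result."""
        result = fn(**kwargs)
        self.records.append(
            {"seq": len(self.records), "tool": name, "args": kwargs, "result": result}
        )
        return result

    def to_jsonl(self) -> str:
        """Serialize the tape as one JSON object per line."""
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.records)

    def digest(self) -> str:
        """SHA-256 over the canonical serialization, for comparing runs."""
        return hashlib.sha256(self.to_jsonl().encode("utf-8")).hexdigest()


tape = ToolTape()
# A stand-in tool; a real worker would wrap its MCP tool calls here.
total = tape.call("add", lambda a, b: a + b, a=1, b=2)
```

Two runs that make the same calls with the same results produce the same digest, which gives a cheap reproducibility check.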

Mechanical Scorer

Backtests each portfolio against SPY at 12m/36m/60m horizons under 1x/2x/3x transaction cost assumptions. Tax-aware: a 20% long-term rate applies after a 366-day hold. Deterministic sell policies: trailing stop, 200dma break, or hold-to-horizon. To survive the search stage, a bundle must show positive 36m after-tax excess return.
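The scoring arithmetic can be sketched as follows. Only the 20% long-term rate, the 366-day threshold, the cost multipliers, and the three sell policies come from this page. The short-term rate, the base cost per side, the 20% trailing-stop width, the lack of any tax credit on losses, and taxing the benchmark by the same rule are all assumptions made for illustration. All function names are hypothetical.

```python
from typing import Sequence


def exit_index(prices: Sequence[float], policy: str,
               trailing_stop_pct: float = 0.20, ma_window: int = 200) -> int:
    """Index of the bar where a deterministic sell policy exits."""
    if not prices:
        raise ValueError("prices must be non-empty")
    last = len(prices) - 1
    if policy == "hold_to_horizon":
        return last
    if policy == "trailing_stop":
        peak = prices[0]
        for i, price in enumerate(prices):
            peak = max(peak, price)
            if price <= peak * (1.0 - trailing_stop_pct):
                return i
        return last
    if policy == "ma_break":
        # Exit on the first close below its trailing moving average.
        for i in range(ma_window - 1, len(prices)):
            window = prices[i - ma_window + 1 : i + 1]
            if prices[i] < sum(window) / ma_window:
                return i
        return last
    raise ValueError(f"unknown policy: {policy}")


def after_tax_return(gross_return: float, holding_days: int,
                     cost_multiplier: int = 1, base_cost: float = 0.001,
                     lt_rate: float = 0.20, st_rate: float = 0.37) -> float:
    """Net return after round-trip costs and tax on gains."""
    net = gross_return - 2 * base_cost * cost_multiplier  # buy + sell
    rate = lt_rate if holding_days >= 366 else st_rate
    tax = rate * max(net, 0.0)  # assumption: losses earn no tax credit
    return net - tax


def after_tax_excess(portfolio_gross: float, benchmark_gross: float,
                     holding_days: int, cost_multiplier: int = 1) -> float:
    """Portfolio minus benchmark, both treated by the same cost and tax rule."""
    return (after_tax_return(portfolio_gross, holding_days, cost_multiplier)
            - after_tax_return(benchmark_gross, holding_days, cost_multiplier))


def passes_search_gate(excess_36m: float) -> bool:
    """Hard gate: strictly positive 36m after-tax excess return."""
    return excess_36m > 0.0
```

For example, a 50.2% gross gain held for 1,095 days at 1x costs nets 50.0% after costs and 40.0% after the 20% rate under these assumptions.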

Current State

AlphaForge is in active development. The infrastructure is built and validated, but the system has not yet produced a promotable prompt bundle. This is an honest accounting of where things stand.

DONE Deterministic screener with 5 regime-conditional templates and bounded mutation surface

DONE Best-effort PIT universe builder from SEC ticker master + historical price/liquidity screens

DONE Tax-aware scorer with deterministic sell policies (trailing stop, 200dma, hold-to-horizon)

DONE Hard search gate requiring positive 36m after-tax excess return before validation

DONE Promotion contract: complete validation coverage required, staged artifacts for manual review

DONE Live per-worker transcripts, taped tool calls, and reproducible batch artifacts

WIP Multi-generation overnight campaign — infrastructure stable, no bundle has survived the full search + validation gate yet

WIP Seed bundle diversification — current seeds lean toward quality/recovery cyclicals that underperform at 36m

TODO Switch inner research worker from Codex to Claude Agent SDK for lower startup overhead

TODO Survivorship-clean historical universe (the current builder draws from the active SEC ticker master, which omits delisted securities)

Mutation Surface

Variable	Range
ranking_weights	5 factors, weight shifts of 0.10 between pairs
screen_template_id	5 regime templates (balanced, recovery, defensive, inflation, auto-cycle)
pm_prompt_variant	3 styles (deployment discipline, quality compounder, drawdown guard)
ra_prompt_variant	3 styles (quality first, filing delta first, regime aware)
guidance_style	3 styles (staged accumulator, watchlist first, high conviction only)
theta_screen_overrides	min_composite_score [0.55-0.70], max_candidates [12-20]
shortlist_n / portfolio_k	[8, 12, 16] / [3, 5]
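Two of the rows above, the 0.10 weight shift between a pair of factors and the template swap, can be sketched as a single-step mutation. This is illustrative only: bundles are plain dicts here, the template identifier strings are assumed, and the 50/50 choice between mutation types is an arbitrary placeholder.

```python
import random
from typing import Any, Dict

# Assumed identifier strings for the five regime templates.
TEMPLATES = ["balanced", "recovery", "defensive", "inflation", "auto-cycle"]
STEP = 0.10
EPS = 1e-12


def shift_weight(weights: Dict[str, float], donor: str, recipient: str,
                 step: float = STEP) -> Dict[str, float]:
    """Move `step` of weight from donor to recipient; the total is preserved."""
    if donor == recipient:
        raise ValueError("donor and recipient must differ")
    if weights[donor] < step - EPS:
        raise ValueError("donor weight is smaller than the step")
    out = dict(weights)
    # Round to keep repeated shifts from accumulating float noise.
    out[donor] = round(out[donor] - step, 10)
    out[recipient] = round(out[recipient] + step, 10)
    return out


def mutate(bundle: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Return a child bundle that differs from the parent in one variable."""
    child = dict(bundle)  # the parent dict is never modified in place
    weights = bundle["ranking_weights"]
    donors = sorted(k for k, w in weights.items() if w >= STEP - EPS)
    if donors and rng.random() < 0.5:
        donor = rng.choice(donors)
        recipient = rng.choice(sorted(k for k in weights if k != donor))
        child["ranking_weights"] = shift_weight(weights, donor, recipient)
    else:
        options = [t for t in TEMPLATES if t != bundle["screen_template_id"]]
        child["screen_template_id"] = rng.choice(options)
    return child


parent = {
    "screen_template_id": "balanced",
    "ranking_weights": {
        "quality_resilience": 0.2,
        "balance_sheet_strength": 0.2,
        "valuation": 0.2,
        "capital_allocation": 0.2,
        "price_confirmation": 0.2,
    },
}
child = mutate(parent, random.Random(7))
```

Passing an explicit `random.Random` instance keeps the mutation sequence reproducible for a given seed.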

Why "AlphaForge"

The name reflects the iterative, mechanical nature of the optimization loop. Prompt bundles are forged through repeated testing against historical equity cohorts — not generated by a single model pass or hand-tuned by a human analyst. Most bundles are expected to fail. The system is designed around attrition, not genius.

This is not autonomous trading. AlphaForge produces candidate prompt/screener configurations that are staged for manual review and forward shadowing before any live use. The human remains the decision-maker; the forge just searches a larger configuration space than manual tuning allows.