AlphaForge
experimental
Evolutionary optimization over a typed prompt-bundle surface, scored against point-in-time equity cohorts with deterministic regime-conditional screening and tax-aware exit policies. The system mutates prompt bundles, screener configurations, and ranking weights, then backtests the survivors to find durable after-tax excess return.
Inspired by Karpathy's autoresearch (autonomous experimentation loop) and Atlas-GIC (multi-agent prompt optimization via Sharpe ratio). Diverges from both: this is a governed IC workflow where structural risk gates, not prompt mutation alone, are the discipline mechanism.
The Forge
AlphaForge treats investment research prompts as parameters to be optimized. Each "prompt bundle" is a typed configuration: persona overlays, ranking weights, screener template, guidance style, and tool budgets. The system mutates these variables and scores the output against historical equity cohorts — keeping only the survivors.
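As a rough illustration of what a "typed configuration" could look like, the sketch below models a bundle as a frozen dataclass that rejects invalid values at construction time. The field names follow the Mutation Surface table; the factor keys, the `max_tool_calls` field, the default values, and the rule that weights are non-negative and sum to 1 are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass

SCREEN_TEMPLATES = ("balanced", "recovery", "defensive", "inflation", "auto-cycle")
# ASSUMPTION: snake_case keys for the five factors named in the screener description.
FACTORS = (
    "quality_resilience",
    "balance_sheet_strength",
    "valuation",
    "capital_allocation",
    "price_confirmation",
)


@dataclass(frozen=True)
class PromptBundle:
    screen_template_id: str
    pm_prompt_variant: str
    ra_prompt_variant: str
    guidance_style: str
    ranking_weights: dict  # factor name -> weight
    shortlist_n: int = 12
    portfolio_k: int = 5
    max_tool_calls: int = 50  # HYPOTHETICAL stand-in for a "tool budget"

    def __post_init__(self):
        if self.screen_template_id not in SCREEN_TEMPLATES:
            raise ValueError(f"unknown screen template: {self.screen_template_id!r}")
        if set(self.ranking_weights) != set(FACTORS):
            raise ValueError("ranking_weights must cover exactly the five factors")
        # ASSUMPTION: weights are non-negative and sum to 1.
        if any(w < 0 for w in self.ranking_weights.values()):
            raise ValueError("ranking weights must be non-negative")
        if abs(sum(self.ranking_weights.values()) - 1.0) > 1e-9:
            raise ValueError("ranking weights must sum to 1")
        if self.portfolio_k > self.shortlist_n:
            raise ValueError("portfolio_k cannot exceed shortlist_n")


# An equal-weight seed bundle using variant names from the Mutation Surface table.
seed = PromptBundle(
    screen_template_id="balanced",
    pm_prompt_variant="quality compounder",
    ra_prompt_variant="quality first",
    guidance_style="watchlist first",
    ranking_weights={f: 0.2 for f in FACTORS},
)
```

Validating in `__post_init__` means a mutation that produces an out-of-range bundle fails immediately instead of reaching the backtest.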
# The evolutionary loop
seeds = load_prompt_bundles() # 3 seed configurations
universe = build_universe_as_of(date) # PIT eligible equities
shortlist = screen_equities(universe) # deterministic regime screener
research = run_research(shortlist, bundle) # LLM thesis + filing analysis
scored = score_cohorts(research, SPY) # after-tax excess vs benchmark
survivors = gate(scored, primary_36m > 0) # hard search gate
mutations = mutate(survivors) # weight shifts, template swaps
# repeat until validation clears on untouched dates
Three Layers
Deterministic Screener
5 regime-conditional templates (balanced, recovery, defensive, inflation, auto-cycle). SEC company facts + yfinance prices + FRED macro series. Factor scoring: quality resilience, balance sheet strength, valuation, capital allocation, price confirmation. The screener is part of the mutation surface — templates and thresholds are tunable.
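The screening step can be pictured as a weighted composite over the five factor scores, followed by a threshold and a cap. The sketch below assumes each factor is already normalized to [0, 1]; the per-template weights in `TEMPLATE_WEIGHTS` are invented for illustration, and only the `min_composite_score` and `max_candidates` parameter names and their lower-bound defaults (0.55 and 12) come from the Mutation Surface table.

```python
FACTORS = (
    "quality_resilience",
    "balance_sheet_strength",
    "valuation",
    "capital_allocation",
    "price_confirmation",
)

# HYPOTHETICAL per-template factor weights; the real templates are not given in the text.
TEMPLATE_WEIGHTS = {
    "balanced": {f: 0.2 for f in FACTORS},
    "defensive": {
        "quality_resilience": 0.3,
        "balance_sheet_strength": 0.3,
        "valuation": 0.2,
        "capital_allocation": 0.1,
        "price_confirmation": 0.1,
    },
}


def composite_score(factor_scores, weights):
    """Weighted sum of factor scores, each assumed to lie in [0, 1]."""
    return sum(weights[f] * factor_scores[f] for f in FACTORS)


def screen_equities(universe, template_id="balanced",
                    min_composite_score=0.55, max_candidates=12):
    """Return up to max_candidates (ticker, score) pairs that clear the threshold."""
    weights = TEMPLATE_WEIGHTS[template_id]
    scored = [(ticker, composite_score(scores, weights))
              for ticker, scores in universe.items()]
    passed = [(t, s) for t, s in scored if s >= min_composite_score]
    # Sort by score descending, then by ticker, so ties resolve deterministically.
    passed.sort(key=lambda pair: (-pair[1], pair[0]))
    return passed[:max_candidates]


# Toy universe: every factor set to the same value per ticker.
toy_universe = {
    "AAA": {f: 0.8 for f in FACTORS},
    "BBB": {f: 0.5 for f in FACTORS},
    "CCC": {f: 0.6 for f in FACTORS},
}
shortlist = screen_equities(toy_universe)
```

The explicit tie-break on ticker is what keeps this toy version deterministic: the same inputs always produce the same shortlist in the same order.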
LLM Research Worker
Isolated Codex workspace with only the backtest MCP surface. Reads SEC filings, builds regime context from FRED/market data, ranks candidates, selects a portfolio with sell policies, and produces live-guidance actions. Every tool call is taped for reproducibility and audit.
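One simple way to "tape" tool calls is to route every call through a recorder that logs the tool name, arguments, and a hash of the result. The sketch below is a generic illustration of that idea; `ToolTape`, its record fields, and the `get_price` tool are hypothetical and do not describe the actual MCP surface or tape format.

```python
import hashlib
import json


class ToolTape:
    """Append-only record of tool calls (HYPOTHETICAL structure)."""

    def __init__(self):
        self.records = []

    def call(self, name, fn, **kwargs):
        """Run fn(**kwargs), record the call, and return the result unchanged."""
        result = fn(**kwargs)
        # Hash a canonical JSON form so identical results give identical digests.
        payload = json.dumps(result, sort_keys=True, default=str)
        self.records.append({
            "seq": len(self.records),
            "tool": name,
            "args": kwargs,
            "result_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        })
        return result

    def dump(self):
        """Serialize the tape as JSON Lines, one record per line."""
        return "\n".join(json.dumps(r, sort_keys=True, default=str)
                         for r in self.records)


def get_price(ticker):
    # Stand-in tool with a fixed return value, for illustration only.
    return {"ticker": ticker, "close": 1.0}


tape = ToolTape()
first = tape.call("get_price", get_price, ticker="AAA")
second = tape.call("get_price", get_price, ticker="AAA")
```

Comparing result hashes between two runs is a cheap way to detect that a replayed session diverged from the recorded one.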
Mechanical Scorer
Backtests each portfolio against SPY at 12m/36m/60m horizons with 1x/2x/3x transaction cost assumptions. Tax-aware (20% long-term after 366d hold). Deterministic sell policies: trailing stop, 200dma break, or hold-to-horizon. Search survivors must clear positive 36m after-tax excess return.
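The scoring arithmetic can be sketched as follows. Only the 20% long-term rate, the 366-day threshold, the 1x/2x/3x cost multiples, and the positive 36m gate come from the description above; the short-term rate, the base cost in basis points, the 20% trailing-stop width, the decision to leave losses untaxed, and all function names are assumptions for illustration.

```python
from datetime import date

LONG_TERM_RATE = 0.20    # from the text: 20% after a 366-day hold
SHORT_TERM_RATE = 0.37   # ASSUMPTION: not stated in the text
BASE_COST_BPS = 10.0     # ASSUMPTION: per-side cost at the 1x setting


def after_tax_return(entry_price, exit_price, entry_date, exit_date,
                     cost_multiple=1.0):
    """Net return after transaction costs on both sides and tax on gains."""
    cost = cost_multiple * BASE_COST_BPS / 10_000
    gross = (exit_price * (1 - cost)) / (entry_price * (1 + cost)) - 1
    if gross <= 0:
        return gross  # simplification: no tax adjustment on losses
    held_days = (exit_date - entry_date).days
    rate = LONG_TERM_RATE if held_days >= 366 else SHORT_TERM_RATE
    return gross * (1 - rate)


def trailing_stop_exit(prices, stop_pct=0.20):
    """Index of the first bar at or below (1 - stop_pct) of the running peak.

    Falls back to the last index, which corresponds to hold-to-horizon.
    """
    peak = prices[0]
    for i, price in enumerate(prices):
        peak = max(peak, price)
        if price <= peak * (1 - stop_pct):
            return i
    return len(prices) - 1


def passes_search_gate(excess_by_horizon):
    """Hard gate: after-tax excess return at 36m must be strictly positive."""
    return excess_by_horizon["36m"] > 0


# Excess return = portfolio minus benchmark, both measured after tax.
# ASSUMPTION: the SPY leg is taxed the same way as the portfolio leg.
portfolio = after_tax_return(100, 150, date(2020, 1, 1), date(2023, 1, 1))
benchmark = after_tax_return(100, 130, date(2020, 1, 1), date(2023, 1, 1))
excess_36m = portfolio - benchmark
```

Rerunning `after_tax_return` with `cost_multiple` set to 2 or 3 shows how much of the excess return survives harsher cost assumptions.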
Current State
AlphaForge is in active development. The infrastructure is built and validated, but the system has not yet produced a promotable prompt bundle. This is an honest accounting of where things stand.
Mutation Surface
| Variable | Range |
|---|---|
| ranking_weights | 5 factors, weight shifts of 0.10 between pairs |
| screen_template_id | 5 regime templates (balanced, recovery, defensive, inflation, auto-cycle) |
| pm_prompt_variant | 3 styles (deployment discipline, quality compounder, drawdown guard) |
| ra_prompt_variant | 3 styles (quality first, filing delta first, regime aware) |
| guidance_style | 3 styles (staged accumulator, watchlist first, high conviction only) |
| theta_screen_overrides | min_composite_score [0.55-0.70], max_candidates [12-20] |
| shortlist_n / portfolio_k | [8, 12, 16] / [3, 5] |
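To make the table concrete, the sketch below enumerates single-step mutations of a bundle: shifting 0.10 of weight between one pair of factors, or changing one discrete field to another allowed value. It is a minimal illustration that represents a bundle as a plain dict; the function names, the factor keys, and the rule that a donor weight may not go negative are assumptions, and only three of the discrete fields are shown.

```python
import itertools

WEIGHT_STEP = 0.10  # from the table: weight shifts of 0.10 between pairs

# Discrete options taken from the table (subset shown for brevity).
CHOICES = {
    "screen_template_id": ["balanced", "recovery", "defensive", "inflation", "auto-cycle"],
    "shortlist_n": [8, 12, 16],
    "portfolio_k": [3, 5],
}


def shift_weight(weights, donor, receiver, step=WEIGHT_STEP):
    """Move `step` of weight from donor to receiver.

    Returns None if the donor cannot give that much (ASSUMPTION: no negative weights).
    """
    if donor == receiver or weights[donor] < step - 1e-12:
        return None
    mutated = dict(weights)
    # Round to suppress floating-point drift over repeated shifts.
    mutated[donor] = round(mutated[donor] - step, 10)
    mutated[receiver] = round(mutated[receiver] + step, 10)
    return mutated


def weight_mutations(weights):
    """All valid one-step shifts across ordered (donor, receiver) pairs."""
    out = []
    for donor, receiver in itertools.permutations(weights, 2):
        mutated = shift_weight(weights, donor, receiver)
        if mutated is not None:
            out.append(mutated)
    return out


def categorical_mutations(bundle):
    """Every bundle that differs from `bundle` in exactly one discrete field."""
    out = []
    for field, options in CHOICES.items():
        for option in options:
            if option != bundle[field]:
                out.append({**bundle, field: option})
    return out


equal_weights = {
    "quality_resilience": 0.2,
    "balance_sheet_strength": 0.2,
    "valuation": 0.2,
    "capital_allocation": 0.2,
    "price_confirmation": 0.2,
}
bundle = {"screen_template_id": "balanced", "shortlist_n": 12, "portfolio_k": 5}
weight_children = weight_mutations(equal_weights)
categorical_children = categorical_mutations(bundle)
```

Because each shift takes from one factor exactly what it gives to another, every child's weights still sum to 1, so mutated bundles remain directly comparable.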
Why "AlphaForge"
The name reflects the iterative, mechanical nature of the optimization loop. Prompt bundles are forged through repeated testing against historical equity cohorts — not generated by a single model pass or hand-tuned by a human analyst. Most bundles are expected to fail. The system is designed around attrition, not genius.
This is not autonomous trading. AlphaForge produces candidate prompt/screener configurations that are staged for manual review and forward shadowing before any live use. The human remains the decision-maker; the forge just searches a larger configuration space than manual tuning allows.