APE builds APE.

After two months of using APE to build APE, the lore collapsed, the agents sharpened, and the framework improved itself one cycle at a time. This is what that looks like in practice — and what's still missing before it becomes a paper.

Four stages of the bootstrap

ape_cli evolved from a mental model into a machine-verifiable contract. The versions map to a discrete progression — each stage earned the next by surviving real use.

  1. Stage 1 (pre-v0.0.1): Implicit APE

    The author directed a default AI coding agent stage-by-stage, manually enforcing the Analyze → Plan → Execute cycle through conversational discipline. No tooling existed. The methodology lived entirely in a human's mental model.

    Evidence: early commit history, unstructured conversations.
  2. Stage 2 (v0.0.1–v0.0.5): Prompt as methodology

    The mental model was codified into a prompt. ape.agent.md formalized states, transitions, and sub-agent roles. The prompt became the transition function — executable, if imperfectly.

    Evidence: first versions of ape.agent.md, commit diffs of prompt evolution.
  3. Stage 3 (v0.0.6–v0.0.10): Custom agent

    Deploy infrastructure (ape target get) stabilized as a single-target Copilot deployment. The cycle became self-enforcing — the agent refused to skip states, demanded issue numbers, required user gates. The system began constraining its own development.

    Evidence: every version from v0.0.7+ has docs/issues/NNN-slug/ artifacts.
  4. Stage 4 (v0.0.11–v0.0.14): CLI + contract

    Runtime infrastructure arrived: FSM transition contract (YAML), programmatic transitions with precondition validation (ape state transition), declarative effects, and evolution infrastructure (.ape/config.yaml, .ape/mutations.md). The contract says what's legal; tests prove the contract holds.

    Evidence: transition_contract.yaml, 131 passing tests, 12 GitHub releases, 69+ issues and PRs.
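The Stage 4 pattern of programmatic transitions gated by preconditions can be sketched in a few lines. This is a hypothetical illustration of the idea behind `ape state transition`: the state names, context keys, and precondition checks below are assumptions for the sketch, not ape_cli's actual API or contract.

```python
# Hypothetical sketch of an FSM transition contract with precondition
# validation. State names and context keys are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Transition:
    source: str
    target: str
    preconditions: list[Callable[[dict], bool]] = field(default_factory=list)

# Illustrative contract for the Analyze -> Plan -> Execute cycle.
CONTRACT = [
    Transition("ANALYZE", "PLAN", [lambda ctx: bool(ctx.get("issue_number"))]),
    Transition("PLAN", "EXECUTE", [lambda ctx: ctx.get("user_gate_approved", False)]),
    Transition("EXECUTE", "ANALYZE", [lambda ctx: ctx.get("tests_passing", False)]),
]

def transition(state: str, target: str, ctx: dict) -> str:
    """Return the new state, or raise if the move is not legal."""
    for t in CONTRACT:
        if t.source == state and t.target == target:
            failed = [p for p in t.preconditions if not p(ctx)]
            if failed:
                raise ValueError(
                    f"{state} -> {target}: {len(failed)} precondition(s) failed"
                )
            return target
    raise ValueError(f"No contract entry for {state} -> {target}")

print(transition("ANALYZE", "PLAN", {"issue_number": 42}))  # prints "PLAN"
```

The point of the shape, as the text puts it: the contract says what's legal, and a refused transition is an error, not a suggestion.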

Evidence today

Numbers after two months. Everything is in git history and the GitHub API — auditable by anyone who wants to check.

- 14 versions shipped, 12 GitHub releases
- 131 tests passing, cross-platform
- 9 CLI commands across 3 modules
- 4 active agents, down from 9 in lore
- 69+ issues & PRs, every change through the cycle
- v0.0.7+ with cycle artifacts (docs/issues/NNN-slug/)

What DARWIN has produced

When evolution.enabled is true, DARWIN reads the cycle's artifacts and files concrete mutation proposals as GitHub issues. The collapse from nine lore apes to four active ones was driven by these proposals — every deprecation is logged, with reasoning.

Example. Issue #54 proposes a change to how EXECUTE interacts with the test runner, citing three cycles where the same inefficiency appeared. The proposal is public, debatable, and subject to the same review process as any other issue. Nothing is filed silently.

The mutations aren't theoretical. The four active apes exist as they do because DARWIN proposed absorbing MARCOPOLO into SOCRATES, replacing SUNZI with DESCARTES's method, and deprecating ADA's TDD phase in favor of BASHŌ's techne — and the maintainer accepted those proposals based on cycle evidence.
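The mechanism described above, reading cycle artifacts and filing a proposal once a pattern recurs, can be sketched as follows. This is a hedged illustration of the pattern, not DARWIN's real logic: the artifact fields, the recurrence threshold, and the issue shape are all assumptions.

```python
# Illustrative sketch: count observations that recur across cycle artifacts
# and draft a mutation proposal (shaped like a GitHub issue) past a threshold.
# Field names and the threshold of 3 are assumptions for this sketch.
from collections import Counter

def draft_proposals(cycles: list[dict], threshold: int = 3) -> list[dict]:
    """One proposal per observation seen in at least `threshold` cycles."""
    seen = Counter()
    for cycle in cycles:
        # Count each observation at most once per cycle.
        for obs in set(cycle.get("observations", [])):
            seen[obs] += 1
    return [
        {
            "title": f"Mutation proposal: {obs}",
            "body": f"Observed in {count} cycles; see cycle artifacts for evidence.",
            "labels": ["darwin", "mutation"],
        }
        for obs, count in seen.items()
        if count >= threshold
    ]

cycles = [
    {"observations": ["EXECUTE re-runs full test suite"]},
    {"observations": ["EXECUTE re-runs full test suite"]},
    {"observations": ["EXECUTE re-runs full test suite", "plan drift"]},
]
print(draft_proposals(cycles)[0]["title"])
```

In this sketch, only the observation that recurred in three cycles produces a proposal; the one-off stays unfiled, mirroring the evidence-threshold behavior the issue #54 example describes.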

What's still missing

Honest accounting. The bootstrap is empirical but incomplete. Three gaps block the paper.

1. Structured per-cycle metrics
The cycle produces artifacts, but there's no machine-readable metrics.yaml capturing time-to-plan, plan completion rate, test pass delta, or reviewer overrides. Roadmap item #72.
2. Thirty clean cycles of data
Current reproducibility score: 2/10. Early cycles ran before the contract stabilized; only post-v0.0.11 cycles qualify as clean. Thirty is a minimum for statistical claims.
3. A test matrix across targets
Single-target MVP today (Copilot). Adapters exist for Claude Code, Codex, Crush, and Gemini per ADR D20, but aren't wired in. The methodology > model thesis demands comparison across targets — ideally including a local 7B — to be testable at all.
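A machine-readable per-cycle metrics file like the one gap 1 calls for might look like the sketch below. The field names come from the gap list itself; the schema and YAML layout are assumptions for illustration, not roadmap item #72's actual design.

```python
# Hypothetical shape for a per-cycle metrics.yaml. Field names are taken
# from the gap list above; everything else is an illustrative assumption.
from dataclasses import dataclass, asdict

@dataclass
class CycleMetrics:
    issue: int
    time_to_plan_minutes: float
    plan_completion_rate: float  # fraction of plan steps executed as written
    test_pass_delta: int         # tests passing after the cycle minus before
    reviewer_overrides: int      # times the human gate rejected or edited

def to_yaml(m: CycleMetrics) -> str:
    """Minimal YAML emission using only the stdlib (flat scalar fields)."""
    return "".join(f"{k}: {v}\n" for k, v in asdict(m).items())

print(to_yaml(CycleMetrics(72, 14.5, 0.9, 6, 1)), end="")
```

With a file like this in each docs/issues/NNN-slug/ directory, the statistical claims the paper needs could be computed directly from git history.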

Read the full plan

The complete research plan — thesis, data sources, metrics to graph, statistical methodology — lives in the repository. It's a working document; the shape is stable, the numbers are still accumulating.

docs/research/ape_builds_ape/bootstrap-validation.md

Go deeper