APE builds APE.

After two months of using APE to build APE, the lore collapsed, the agents sharpened, and the framework improved itself one cycle at a time. This is what that looks like in practice — and what's still missing before it becomes a paper.

Four stages of the bootstrap

ape_cli evolved from a mental model into a machine-verifiable contract. The versions map to a discrete progression — each stage earned the next by surviving real use.

  1. Stage 1 (pre-v0.0.1): Implicit APE

    The author directed a default AI coding agent stage-by-stage, manually enforcing the Analyze → Plan → Execute cycle through conversational discipline. No tooling existed. The methodology lived entirely in a human's mental model.

    Evidence: early commit history, unstructured conversations.
  2. Stage 2 (v0.0.1–v0.0.5): Prompt as methodology

    The mental model was codified into a prompt. ape.agent.md formalized states, transitions, and sub-agent roles. The prompt became the transition function — executable, if imperfectly.

    Evidence: first versions of ape.agent.md, commit diffs of prompt evolution.
  3. Stage 3 (v0.0.6–v0.0.10): Custom agent

    Deploy infrastructure (ape target get) stabilized as a single-target Copilot deployment. The cycle became self-enforcing — the agent refused to skip states, demanded issue numbers, required user gates. The system began constraining its own development.

    Evidence: every version from v0.0.7+ has docs/issues/NNN-slug/ artifacts.
  4. Stage 4 (v0.0.11–v0.0.14): CLI + contract

    Runtime infrastructure arrived: FSM transition contract (YAML), programmatic transitions with precondition validation (ape state transition), declarative effects, and evolution infrastructure (.ape/config.yaml, .ape/mutations.md). The contract says what's legal; tests prove the contract holds.

    Evidence: transition_contract.yaml, 131 passing tests, 12 GitHub releases, 69+ issues and PRs.
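The Stage 4 pattern of programmatic transitions gated by preconditions can be sketched in a few lines. This is a hypothetical illustration of the idea behind `ape state transition`: the state names, context keys, and precondition checks below are assumptions for the sketch, not ape_cli's actual API or contract.

```python
# Hypothetical sketch of an FSM transition contract with precondition
# validation. State names and context keys are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Transition:
    source: str
    target: str
    preconditions: list[Callable[[dict], bool]] = field(default_factory=list)

# Illustrative contract for the Analyze -> Plan -> Execute cycle.
CONTRACT = [
    Transition("ANALYZE", "PLAN", [lambda ctx: bool(ctx.get("issue_number"))]),
    Transition("PLAN", "EXECUTE", [lambda ctx: ctx.get("user_gate_approved", False)]),
    Transition("EXECUTE", "ANALYZE", [lambda ctx: ctx.get("tests_passing", False)]),
]

def transition(state: str, target: str, ctx: dict) -> str:
    """Return the new state, or raise if the move is not legal."""
    for t in CONTRACT:
        if t.source == state and t.target == target:
            failed = [p for p in t.preconditions if not p(ctx)]
            if failed:
                raise ValueError(
                    f"{state} -> {target}: {len(failed)} precondition(s) failed"
                )
            return target
    raise ValueError(f"No contract entry for {state} -> {target}")

print(transition("ANALYZE", "PLAN", {"issue_number": 42}))  # prints "PLAN"
```

The point of the shape, as the text puts it: the contract says what's legal, and a refused transition is an error, not a suggestion.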

Evidence today

Numbers after two months. Everything is in git history and the GitHub API — auditable by anyone who wants to check.

- 14 versions shipped, 12 GitHub releases
- 131 tests passing, cross-platform
- 9 CLI commands across 3 modules
- 4 active agents, down from 9 in lore
- 69+ issues & PRs, every change through the cycle
- v0.0.7+ with cycle artifacts (docs/issues/NNN-slug/)

What DARWIN has produced

When evolution.enabled is true, DARWIN reads the cycle's artifacts and files concrete mutation proposals as GitHub issues. The collapse from nine lore apes to four active ones was driven by these proposals — every deprecation is logged, with reasoning.

Example. Issue #54 proposes a change to how EXECUTE interacts with the test runner, citing three cycles where the same inefficiency appeared. The proposal is public, debatable, and subject to the same review process as any other issue. Nothing is filed silently.

The mutations aren't theoretical. The four active apes exist as they do because DARWIN proposed absorbing MARCOPOLO into SOCRATES, replacing SUNZI with DESCARTES's method, and deprecating ADA's TDD phase in favor of BASHŌ's techne — and the maintainer accepted those proposals based on cycle evidence.
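The mechanism described above, reading cycle artifacts and filing a proposal once a pattern recurs, can be sketched as follows. This is a hedged illustration of the pattern, not DARWIN's real logic: the artifact fields, the recurrence threshold, and the issue shape are all assumptions.

```python
# Illustrative sketch: count observations that recur across cycle artifacts
# and draft a mutation proposal (shaped like a GitHub issue) past a threshold.
# Field names and the threshold of 3 are assumptions for this sketch.
from collections import Counter

def draft_proposals(cycles: list[dict], threshold: int = 3) -> list[dict]:
    """One proposal per observation seen in at least `threshold` cycles."""
    seen = Counter()
    for cycle in cycles:
        # Count each observation at most once per cycle.
        for obs in set(cycle.get("observations", [])):
            seen[obs] += 1
    return [
        {
            "title": f"Mutation proposal: {obs}",
            "body": f"Observed in {count} cycles; see cycle artifacts for evidence.",
            "labels": ["darwin", "mutation"],
        }
        for obs, count in seen.items()
        if count >= threshold
    ]

cycles = [
    {"observations": ["EXECUTE re-runs full test suite"]},
    {"observations": ["EXECUTE re-runs full test suite"]},
    {"observations": ["EXECUTE re-runs full test suite", "plan drift"]},
]
print(draft_proposals(cycles)[0]["title"])
```

In this sketch, only the observation that recurred in three cycles produces a proposal; the one-off stays unfiled, mirroring the evidence-threshold behavior the issue #54 example describes.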

What's still missing

Honest accounting. The bootstrap is empirical but incomplete. Three gaps block the paper.

1. Structured per-cycle metrics
The cycle produces artifacts, but there's no machine-readable metrics.yaml capturing time-to-plan, plan completion rate, test pass delta, or reviewer overrides. Roadmap item #72.
2. Thirty clean cycles of data
Current reproducibility score: 2/10. Early cycles ran before the contract stabilized; only post-v0.0.11 cycles qualify as clean. Thirty is a minimum for statistical claims.
3. A test matrix across targets
Single-target MVP today (Copilot). Adapters exist for Claude Code, Codex, Crush, and Gemini per ADR D20, but aren't wired in. The methodology > model thesis demands comparison across targets — ideally including a local 7B — to be testable at all.
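A machine-readable per-cycle metrics file like the one gap 1 calls for might look like the sketch below. The field names come from the gap list itself; the schema and YAML layout are assumptions for illustration, not roadmap item #72's actual design.

```python
# Hypothetical shape for a per-cycle metrics.yaml. Field names are taken
# from the gap list above; everything else is an illustrative assumption.
from dataclasses import dataclass, asdict

@dataclass
class CycleMetrics:
    issue: int
    time_to_plan_minutes: float
    plan_completion_rate: float  # fraction of plan steps executed as written
    test_pass_delta: int         # tests passing after the cycle minus before
    reviewer_overrides: int      # times the human gate rejected or edited

def to_yaml(m: CycleMetrics) -> str:
    """Minimal YAML emission using only the stdlib (flat scalar fields)."""
    return "".join(f"{k}: {v}\n" for k, v in asdict(m).items())

print(to_yaml(CycleMetrics(72, 14.5, 0.9, 6, 1)), end="")
```

With a file like this in each docs/issues/NNN-slug/ directory, the statistical claims the paper needs could be computed directly from git history.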

Read the full plan

The complete research plan — thesis, data sources, metrics to graph, statistical methodology — lives in the repository. It's a working document; the shape is stable, the numbers are still accumulating.

docs/research/ape_builds_ape/bootstrap-validation.md

Go deeper