After two months of using APE to build APE, the lore collapsed, the agents sharpened, and the framework improved itself one cycle at a time. This is what that looks like in practice — and what's still missing before it becomes a paper.
ape_cli evolved from a mental model into a machine-verifiable contract. The versions map to a discrete progression — each stage earned the next by surviving real use.
The author directed a default AI coding agent stage-by-stage, manually enforcing the Analyze → Plan → Execute cycle through conversational discipline. No tooling existed. The methodology lived entirely in a human's mental model.
The mental model was codified into a prompt. ape.agent.md formalized states, transitions, and sub-agent roles. The prompt became the transition function — executable, if imperfectly.
ape.agent.md, commit diffs of prompt evolution.Deploy infrastructure (ape target get) stabilized as a single-target Copilot deployment. The cycle became self-enforcing — the agent refused to skip states, demanded issue numbers, required user gates. The system began constraining its own development.
docs/issues/NNN-slug/ artifacts.Runtime infrastructure arrived: FSM transition contract (YAML), programmatic transitions with precondition validation (ape state transition), declarative effects, and evolution infrastructure (.ape/config.yaml, .ape/mutations.md). The contract says what's legal; tests prove the contract holds.
transition_contract.yaml, 131 passing tests, 12 GitHub releases, 69+ issues and PRs.Numbers after two months. Everything is in git history and the GitHub API — auditable by anyone who wants to check.
docs/issues/NNN-slug/
When evolution.enabled is true, DARWIN reads the cycle's artifacts and files concrete mutation proposals as GitHub issues. The collapse from nine lore apes to four active ones was driven by these proposals — every deprecation is logged, with reasoning.
The mutations aren't theoretical. The four active apes exist as they do because DARWIN proposed absorbing MARCOPOLO into SOCRATES, replacing SUNZI with DESCARTES's method, and deprecating ADA's TDD phase in favor of BASHŌ's techne — and the maintainer accepted those proposals based on cycle evidence.
Honest accounting. The bootstrap is empirical but incomplete. Three gaps block the paper.
metrics.yaml capturing time-to-plan, plan completion rate, test pass delta, or reviewer overrides. Roadmap item #72.
The complete research plan — thesis, data sources, metrics to graph, statistical methodology — lives in the repository. It's a working document; the shape is stable, the numbers are still accumulating.