Methodology over model.

APE treats your AI coding assistant as a finite state machine with a declarative transition contract. The CLI enforces pre- and post-conditions. Intelligence emerges from orchestration and memory — not from any single agent's capability.

The five-state FSM

Six states if you count rest. IDLE is the entry and exit. ANALYZE, PLAN, EXECUTE are the core. END is the PR gate. EVOLUTION is opt-in. The forward flow is linear; backward edges from PLAN and EXECUTE to ANALYZE exist because re-analysis is legal, not a recovery hack.

[Figure: APE finite state machine — IDLE → ANALYZE → PLAN → EXECUTE → END → IDLE, with backward transitions from PLAN and EXECUTE to ANALYZE, and an opt-in EVOLUTION state after END.]
  • IDLE — Rest state. No agent runs; the cycle waits for the next issue to arrive.
  • ANALYZE · SOCRATES — Clarifies the problem through dialog. Produces diagnosis.md.
  • PLAN · DESCARTES — Divide, order, verify, enumerate. Produces plan.md.
  • EXECUTE · BASHŌ — Minimal, beautiful implementation under the tests. Produces code and commits.
  • END · PR gate — Opens and merges the pull request via gh. No agent; the CLI orchestrates.
  • EVOLUTION · DARWIN (opt-in) — Reads the cycle's artifacts and proposes mutations to APE itself. Off by default.
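The state graph above can be sketched as a plain edge set. State names come from this article; the edge literals and helper function are an illustrative sketch, not the CLI's internals:

```python
# Legal edges of the APE state machine, per the prose above.
# Forward flow is linear; backward edges to ANALYZE are deliberate.
LEGAL = {
    ("IDLE", "ANALYZE"),
    ("ANALYZE", "PLAN"),
    ("PLAN", "EXECUTE"),
    ("EXECUTE", "END"),
    ("END", "IDLE"),
    ("END", "EVOLUTION"),      # opt-in
    ("EVOLUTION", "IDLE"),
    ("PLAN", "ANALYZE"),       # re-analysis is legal, not a recovery hack
    ("EXECUTE", "ANALYZE"),
}

def can_transition(src: str, dst: str) -> bool:
    """True only if the edge appears in the declared graph."""
    return (src, dst) in LEGAL

print(can_transition("EXECUTE", "ANALYZE"))  # True: backward edge
print(can_transition("ANALYZE", "EXECUTE"))  # False: can't skip PLAN
```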

The transition contract

Transitions aren't code — they're data. A YAML file declares every legal transition, its trigger event, its prechecks, and its effects. The CLI reads this file at runtime and refuses any transition the contract doesn't permit.

# code/cli/assets/transition_contract.yaml (excerpt)
transitions:
  - from: ANALYZE
    to: PLAN
    event: complete_analysis
    prechecks:
      - diagnosis_exists
      - issue_linked
    effects:
      - set_state: PLAN
      - invoke_agent: DESCARTES

  - from: PLAN
    to: ANALYZE
    event: start_analyze
    # backward transition: re-analysis is legal
    prechecks:
      - human_approval
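A hedged sketch of the enforcement side: the dict below is what yaml.safe_load would return for the excerpt above, and the precheck callables are hypothetical stand-ins for whatever the real CLI checks.

```python
# Sketch: enforcing the declarative contract at runtime.
# CONTRACT mirrors the parsed YAML excerpt; PRECHECKS are hypothetical.
CONTRACT = {
    "transitions": [
        {"from": "ANALYZE", "to": "PLAN", "event": "complete_analysis",
         "prechecks": ["diagnosis_exists", "issue_linked"]},
        {"from": "PLAN", "to": "ANALYZE", "event": "start_analyze",
         "prechecks": ["human_approval"]},
    ]
}

PRECHECKS = {  # stand-in checks; always pass in this demo
    "diagnosis_exists": lambda: True,
    "issue_linked": lambda: True,
    "human_approval": lambda: True,
}

def transition(state: str, event: str) -> str:
    """Return the next state, or refuse if the contract doesn't permit it."""
    for t in CONTRACT["transitions"]:
        if t["from"] == state and t["event"] == event:
            failed = [p for p in t["prechecks"] if not PRECHECKS[p]()]
            if failed:
                raise RuntimeError(f"prechecks failed: {failed}")
            return t["to"]
    raise ValueError(f"no legal transition from {state} on {event!r}")

print(transition("ANALYZE", "complete_analysis"))  # PLAN
```

The point of the pattern: adding a transition means editing data, not code, and an undeclared transition fails loudly.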

This is the core artifact, and it is what makes APE reproducible across models: every run of ape reads the same contract and can only produce the transitions it permits. Even if a future target (Claude, Codex, Gemini) processes a different prompt, the contract constrains the output shape identically.

Memory as code

Project memory lives in two places, both markdown, both in git. No vector DB. No cloud dependency. The repository is the database.

.ape/ — per-repo state

  • state.yaml — current FSM state and last transition
  • config.yaml — feature flags (evolution.enabled, target, etc.)
  • mutations.md — history of DARWIN's accepted mutations
  • diagnosis.md, plan.md — active cycle's artifacts
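For illustration, a state.yaml mid-cycle might look like the fragment below. Every field name here is a guess extrapolated from the list above, not the real schema:

```yaml
# .ape/state.yaml — hypothetical shape, field names illustrative
state: EXECUTE
last_transition:
  from: PLAN
  to: EXECUTE
  event: approve_plan
```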

docs/ — project knowledge

  • ADRs, specs, runbooks — whatever the project needs long-term
  • Indexed by the agents via the memory-read skill
  • index.md per directory acts as a primary index
  • Frontmatter YAML acts as a schema (enforced by a future skill)

Database-inspired indexing without a database. Directories are tables. Files are rows. Frontmatter is the schema. index.md is the primary index. At this scale of knowledge, an agent with grep beats an agent with a vector store. Dual legibility: the file an agent reads is the same file you review on GitHub. No translation layer.
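The "frontmatter is the schema" idea needs nothing beyond the standard library. A minimal sketch, assuming documents open with a `---`-fenced key: value block (the example file and its fields are hypothetical):

```python
# Parse a doc's frontmatter block — the "row metadata" an agent
# can read before grepping the body.
def read_frontmatter(text: str) -> dict:
    """Return the key: value pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter: no schema to read
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

doc = """---
title: ADR-007 adopt the transition contract
status: accepted
tags: fsm, cli
---
Body text an agent can grep directly.
"""
print(read_frontmatter(doc)["status"])  # accepted
```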

The risk matrix

Spec-only today. The idea: ape state transition should gate on human approval only when the engineering risk warrants it. Low-risk mechanical transitions (file scaffolding, formatting, test scaffolding) proceed silently. High-risk ones (schema changes, production deploys, external API mutations) demand explicit approval.

Roadmap item. Until the matrix is implemented, every state transition requires operator confirmation. The default is safe but verbose. See the roadmap under long-term items.
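Once implemented, the gate could be as small as a lookup. The tier names and the action-to-risk mapping below are assumptions sketched from the examples above, not the spec:

```python
# Hypothetical risk matrix: actions mapped to tiers; unknowns
# default to high risk, preserving today's safe-but-verbose behavior.
RISK = {
    "apply_formatting": "low",
    "scaffold_tests": "low",
    "migrate_schema": "high",
    "deploy_production": "high",
}

def needs_approval(action: str) -> bool:
    """Gate on human approval only when the risk tier warrants it."""
    return RISK.get(action, "high") == "high"

print(needs_approval("apply_formatting"))  # False: proceeds silently
print(needs_approval("migrate_schema"))    # True: explicit approval
```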

Why methodology beats model

If the framework carries the weight, the model gets lighter.

Three scenarios, same conclusion. Cloud models get expensive? APE works with local ones. Frontier models plateau? DARWIN becomes the only improvement mechanism left. Models keep improving? APE amplifies the gains without rewriting the runbook. The framework benefits from disorder — antifragile across the AI market.

The thesis APE exists to test: given an identical task, a smaller model running APE outperforms a frontier model freestyling. The evidence lives in a metrics.yaml dataset that doesn't fully exist yet — the reproducibility score sits at 2/10 until 30 clean cycles and a multi-target test matrix land. See APE builds APE →.

Go deeper