Build.
Weeks of construction now automated, verified, and repeatable.
Every commit, test count, verifier finding, and pilot number on this page comes from an actual run of the Aurora L03 Autonomous SDLC automation against the locked L02 charter. Layer 03 executes a nine-phase build loop per commit. An AI builder ships production-grade code against the spec. An independent verifier with fresh context audits every commit. Every external customer action requires a named human confirmation. Every line that violates an AI safety decision is refused at commit time. Waves run in sequence under the same locked spec — the same engine, again, for each scoped workstream group.
Source: Keystone-2026Q2 Wave 1 · 7 commits · verifies upstream L02 charter
sha256:dcb07909…AI builders make confident mistakes.
The methodology assumes this, and structures around it.
Four things go wrong when AI builds software. Aurora's L03 layer makes each of them structurally impossible.
The builder ships its own enthusiasm.
Without a separate verifier, the builder's "all tests pass" is the only signal. Self-review by the same agent that wrote the code is the cognitive equivalent of marking your own homework. Aurora assigns Phase 8 to a fresh-context auditor that never saw the build.
The "AI is safe" promise survives one PR.
Without structural enforcement, "we agreed the AI wouldn't do X" becomes "the AI does X and nobody noticed." Aurora loads every AI safety decision as a commit-time grep pattern. Forbidden code never lands.
External customer actions fire silently.
The system sends an email, books a call, or hits an external API based on AI confidence — with no named human on the receipt. Aurora requires a HumanConfirmation parameter on every external action. The build can't ship without it.
Deferred items get rebranded as complete.
"Integration tested against staging" becomes "integration tested." "Mocked" becomes "tested." Aurora's Phase 9 completion report names every deferred row, every assumption, every shortcut — by design, on the record.
The verifier is the build's structural conscience.
A fresh-context auditor with no memory of the build audits every commit. In the Keystone Wave 1 run, the verifier surfaced three substantive issues the builder's self-review had marked as green. All three were fixed before the completion report sealed.
"All tests passing. Ship-ready."
The build agent confirmed every acceptance test passed. The runtime hash matched the canonical definition. The validation gate rejected payloads as designed. The narrative synthesizer produced outputs that traced to data. Self-review marked the commit ready.
"Three things don't add up."
The auditor, with no knowledge of the build's choices, re-ran independent checks. It surfaced three bugs the builder's self-review missed — each a different class of failure, each invisible from inside the build's own context.
The runtime hash matched. The file on disk didn't.
The canonical definition YAML carried a placeholder hash. The runtime self-match passed because it never loaded the on-disk file. In production this would have shipped silent corruption — the file consumers read would not match the file the system claimed to have written. Fix: regenerate YAML from the canonical builder; add an on-disk-vs-runtime hash test that fails on any future drift.
The auto-reject only worked in tests.
The original validation gate triggered on the literal substring "broken" in a free-text explanation field. The live AI agent never emits that substring. Tests passed only because the test-payload factory had hand-constructed an explanation containing "broken." In production the gate would never have fired. Fix: replaced the substring match with a dependency-injected mapping_health_checker. Four new tests cover the default + injected paths.
An AI decision record had no integration evidence.
The ADR Coverage Report showed ADR-014 (production-handoff posture for the canonical definition) had no citing commit. The methodology requires every in-scope ADR to be cited in either a forbidden-pattern enforcement or an integration-proof row. Fix: added a Phase-7 production-handoff posture row to the canonical-definition commit's INTEGRATION_PROOF.md.
Three findings is normal, not exceptional. The verifier is structural: it earns its keep by surfacing exactly the failures it's designed to surface. A run with zero verifier findings would itself be a finding worth investigating.
Safe AI isn't a value statement
— it's a refusal at commit time.
Below: two representative refusals from the Keystone Wave 1 build. Both came from the build agent's planning surface. Both were caught at Phase 3 before the commit landed. Both forced the agent to retry within the allowed pattern.
In the Keystone pilot window: 41 at-risk accounts identified. 36 playbooks dispatched. 22 captured a named human confirmation. 5 were auto-suppressed by the validation gate. Zero external actions fired without one or the other. The audit trail is complete and queryable.
The success metric was churn reduction.
The pilot moved toward it.
Two-week pilot window. Live canonical definition. Live driver attribution. Live playbook dispatch with human-in-the-loop. Live save events. The numbers are calibrated to the at-risk cohort surfaced by L01 and are within the confidence interval L01 specified.
The June 18 board memo was rendered by the C7 narrative pipeline — every paragraph traces to a numerical source, enforced structurally by the ADR-015 forbidden-pattern check. The CEO has one number. The audit trail says where it came from.
Nine phases per commit. No skips.
Charter
Read the locked L02 charter. Verify its hash matches what L02 sealed. Drift detected = halt before code lands.
Decompose
Break the commit's R-criteria into testable units. Map each one to its evidence shape and its governing AI decision records.
Red Team
Load every governing AI decision's forbidden patterns. Pre-grep the planned implementation surface. Refuse the plan or proceed.
Test First
Author the tests that would pass if the implementation were correct. Confirm they fail against empty src. The traceability matrix is built before code.
Implement
Write the src until tests turn green. No green tests, no commit. No mock laundering, no commit. No commented-out asserts, no commit.
Self-Review
Builder audits its own commit against the R-criteria, AI decisions, and forbidden patterns. Catches what it can. Knows it can't catch everything.
Integration Proof
Every system integration row cites the governing AI decision record. Real-system rows that can't run without live credentials are honestly deferred. No fake green.
Independent Verification
A fresh-context subagent with no memory of the build audits the commit. Surfaces what self-review missed. In Keystone: three findings, all resolved pre-seal.
Honest Report
Completion report names what got built, what got deferred, what surprises emerged, and why. Renewal-ready as written.
L04 measures realized value
against the projection L01 made.
Four-party attestation. Variance classified across seven categories. Aurora Dependency Watch armed for continuous structural surveillance. See how the renewal conversation gets engineered.