Aurora L03 — Build · 90-second cinematic

Build, Verified

An AI builder ships.
A separate verifier checks its work.

The methodology assumes the builder will make mistakes. It assigns a fresh-context verifier whose only job is to find them — before they ship.

The Build Loop

Nine phases. Per commit.

01

Charter

Verify hash. Or halt.

02

Decompose

Acceptance → tests

03

Red Team

Load forbidden patterns

04

Test First

Failing tests authored

05

Implement

Code until green

06

Self-Review

Builder audits itself

07

Integration Proof

Cite governing decisions

08

Verify

Fresh-context auditor

09

Honest Report

What worked. What didn't.
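
The nine phases can be pictured as an ordered gate sequence. The sketch below is only an illustration, assuming each phase exposes a check that returns True on success. `PHASES`, `run_build_loop`, and the pass/fail convention are hypothetical names, not Aurora's actual engine.

```python
from typing import Callable, Dict, List, Optional

# Phase names come from the loop above; everything else here is hypothetical.
PHASES: List[str] = [
    "Charter", "Decompose", "Red Team", "Test First", "Implement",
    "Self-Review", "Integration Proof", "Verify", "Honest Report",
]


def run_build_loop(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the phases in order; return the first failing phase, or None if all pass."""
    for name in PHASES:
        check = checks.get(name)
        # A phase with no registered check counts as a failure, never a silent pass.
        if check is None or not check():
            return name
    return None
```

Returning the name of the halting phase, rather than a bare boolean, keeps the failure point visible for the final report.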

Wave 1 · Commit Sequence

Seven commits. All green.

C1

Model repair
scoring · backfill

→

C2

Canonical definition
hash-signed

→

C3

Deprecate shadows
4 paths sunset

→

C4

Driver attribution
interpretable AI

→

C5

Pipeline integrate
per-account drivers

→

C6

Playbook dispatch
human-in-loop

→

C7

Board memo
narrative · safe AI

73 / 97

Wave-1 tests passing · 0 failing
Of the 97 tests authored from the L02 charter, 73 cover Wave 1's 5 workstreams. The remaining 24 are committed to Waves 2 & 3.

Safe AI · Enforced in Code

The build agent's own first commit would have violated the charter.

import torch

⛔ ADR-008 · BLOCKED

L02's ADR-008 forbids opaque ML on any path that explains a customer-impact decision. The build agent's planning surface produced an import torch line for the driver-attribution model. The Phase-3 grep refused the commit. The agent retried with the allowed library set (rule-based + analytical attribution). Safety isn't a code-review aspiration — it's a structural refusal at commit time.
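
A commit-time refusal of this kind might look like the sketch below. It uses Python's `ast` module in place of a literal grep, so a comment that merely mentions torch is not flagged. The function name and the forbidden set are assumptions: only `torch` is named in the text above, and the real pattern list is not shown.

```python
import ast
from typing import List, Set

# Only `torch` is named in the text; the full forbidden set is an assumption.
FORBIDDEN_MODULES: Set[str] = {"torch"}


def forbidden_imports(source: str, forbidden: Set[str] = FORBIDDEN_MODULES) -> List[str]:
    """Return the forbidden top-level modules imported anywhere in `source`."""
    hits: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute `from x import y` only
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # `torch.nn` counts as `torch`
            if root in forbidden:
                hits.append(root)
    return hits
```

A pre-commit hook could run this over each staged file and refuse the commit whenever the returned list is non-empty.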

Human-in-the-Loop · Enforced

The AI prepares. A named human ratifies.

dispatcher.send_email(playbook="champion-replacement", account_id="ACCT-734")

⛔ ADR-016 · MISSING HumanConfirmation

dispatcher.send_email(playbook=..., account_id=..., human_confirmation=HumanConfirmation(by="devon", at=...))

✓ RATIFIED

In the Keystone pilot: 41 at-risk accounts identified. 22 actions captured a human confirmation before the playbook fired. Five were auto-suppressed by the validation gate before reaching any human. No external customer action fired without the build agent providing — and a person signing — the confirmation record.
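
The refusal shown above can be sketched as a dispatcher that will not send without a confirmation record. This is a minimal illustration: the `Dispatcher` class, the `MissingHumanConfirmation` exception, and the in-memory audit list are hypothetical. Only the `send_email` call shape and the `HumanConfirmation(by=..., at=...)` fields come from the snippets above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HumanConfirmation:
    by: str        # named human who ratified the action
    at: datetime   # when they ratified it


class MissingHumanConfirmation(Exception):
    """Raised when an external action is attempted without a ratification record."""


class Dispatcher:
    def __init__(self) -> None:
        # Audit log of (playbook, account_id, ratified_by) for every send.
        self.sent: List[Tuple[str, str, str]] = []

    def send_email(self, playbook: str, account_id: str,
                   human_confirmation: Optional[HumanConfirmation] = None) -> None:
        # Structural refusal: no confirmation record, no send.
        if human_confirmation is None or not human_confirmation.by.strip():
            raise MissingHumanConfirmation(
                f"ADR-016: {playbook} for {account_id} has no HumanConfirmation")
        self.sent.append((playbook, account_id, human_confirmation.by))
```

Making the confirmation a parameter of the send call, rather than a separate approval step, means the unratified path cannot be reached by accident.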

Phase 8 · Independent Verification

The verifier caught what self-review missed.

Builder · Self-review

"All tests passing. Commit ready."

Phase 6 self-review confirmed the canonical-definition test suite passes. The runtime hash-match test for the customer health definition turned green. Ready to ship.

Verifier · Fresh context

"The on-disk file doesn't match the runtime."

The auditor, with no memory of the build, ran a separate check: load the YAML file from disk, recompute its hash, and compare it to the hash the runtime reports. The hashes didn't match.

⚠ Bug surfaced: YAML carried a placeholder hash. Runtime self-match passed because it never loaded the on-disk file. In production this would have shipped silent corruption.

Fix: regenerate the YAML from the canonical builder; add an on-disk-vs-runtime test that fails if they ever drift again. Three bugs surfaced across the Wave 1 run. All three were fixed before the completion report was sealed.
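
An on-disk-vs-runtime check of that kind could be sketched as follows. The hash algorithm and how the runtime exposes its hash are not stated here, so SHA-256 over the raw file bytes and a `runtime_hash` hex string are assumptions, as are the function names.

```python
import hashlib
from pathlib import Path


def on_disk_hash(path: Path) -> str:
    """SHA-256 of the file exactly as stored on disk (algorithm is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_no_drift(path: Path, runtime_hash: str) -> None:
    """Fail loudly if the on-disk definition and the runtime's reported hash disagree."""
    disk = on_disk_hash(path)
    if disk != runtime_hash:
        raise AssertionError(
            f"definition drift: disk={disk[:12]} runtime={runtime_hash[:12]}")
```

The point of the check is that it reads the file itself. A test that only compares the runtime's hash to the runtime's hash passes even when the file holds a placeholder.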

Pilot Outcome · Week 4

From locked spec to live pilot: four weeks.

41

At-risk identified

Aurora canonical definition

36

Playbooks dispatched

5 auto-suppressed by gate

26

Saves confirmed

63.7% save rate

$2.94M

ARR retained

of $4.62M cohort exposure

Trajectory: on track for the 2-point churn-reduction target by Q4. The board memo for the June 18 meeting was rendered by the C7 narrative pipeline; every paragraph traces to a numerical source, enforced by ADR-015.
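
How ADR-015 is enforced is not shown here. One way such a traceability rule could be checked is sketched below, assuming each paragraph is validated against a dictionary of named source metrics. `untraced_numbers` and the `sources` mapping are hypothetical.

```python
import re
from typing import Dict, List

# Matches plain integers and decimals such as 41 or 2.94 (no thousands separators).
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def untraced_numbers(paragraph: str, sources: Dict[str, float]) -> List[str]:
    """Return numeric tokens in `paragraph` that match no value in `sources`."""
    known = {float(v) for v in sources.values()}
    return [tok for tok in NUMBER.findall(paragraph) if float(tok) not in known]
```

A rendering pipeline could reject any paragraph for which this returns a non-empty list.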

Automated · Verified · Repeatable

Weeks of construction
now automated, verified, and repeatable.

A fresh-context verifier audits every commit. Refused patterns never reach production. Honest deferrals stay deferred. The same engine runs again for Wave 2 and Wave 3 under the same locked spec.

7 / 7

Commits verified

73

Tests passing · 0 failing

3

Bugs caught by verifier

15/15

Wave-1 AI decisions honored

Phase 9 · Honest Report

What got built. What got deferred. Both stated.

21

Acceptance criteria satisfied

with evidence captured

3

Deferred — real systems

live credentials pending

5

Surprises catalogued

verifier findings · fixes

"No success theater detected. No mocks laundered as integration evidence; no tests skipped; no TODO in executable paths." — Completion Report, Wave 1

Handoff

Working system. Plus the receipts.

L03 · Shipped

Working pilot + audit trail

→

L04 · PROVE

Aurora doesn't hand off "it works on our laptops." It hands off a system + a test suite + a verifier-signed audit trail + a forecast that L04 will measure against actual outcomes.