
5 Questions to Ask Before You Sign an AI Implementation Contract

A practical guide for executives evaluating AI consulting and delivery partners.


If you're about to sign a contract with an AI consulting firm, an implementation partner, or a forward-deployed engineering team — congratulations on getting past the strategy phase. Most organizations don't.

The harder part is what comes next. The structure of the engagement you sign determines whether AI delivers durable value to your business or becomes a line item your CFO writes off in the next budget cycle.

We've watched the same five questions separate engagements that work from engagements that produce expensive deliverables and dependent teams. Each maps to a real failure pattern we see across customers, and each is the kind of question a partner who knows what they're doing will welcome — and the kind a partner who doesn't will deflect.

Use these in your evaluation conversations. The right partner will have specific, prepared answers to all five. The wrong partner will have generic ones.


Question 01

"How will you optimize token economics from day one — and what's the cost trajectory as we scale?"

Why it matters: AI inference costs scale with usage. The first prototype your partner ships almost always misses basic cost-engineering practices like prompt caching, batch processing, and right-sized model selection per task. The result is monthly bills that grow nonlinearly with adoption — at exactly the moment you're trying to demonstrate ROI to your CFO.

Bad answers sound like
  • "We'll optimize after launch."
  • "It's hard to predict — depends on usage."
  • "That's an engineering detail, not a strategy concern."
Good answers sound like
  • "We'll structure the workflow with prompt caching from the first prototype, separating stable context from variable inputs. Anthropic's published documentation shows up to 90% cost reduction on cached content. We'll measure your specific savings in week two and adjust."
  • "Model selection is task-by-task, not vendor-by-vendor. Some queries don't need a frontier model. We'll architect for that flexibility from day one."
  • "Here's a sample cost model based on your projected volumes, with three optimization checkpoints in the first 90 days."

The good partner shows up with numbers you can defend to your CFO, not a hand-wave about "optimizing later."
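
For readers who want the mechanics behind that first answer: prompt caching marks the stable part of a request as cacheable, so repeated calls pay full input price only on what changes. Here is a minimal sketch using the Anthropic Python SDK; the model id and the two content variables are illustrative placeholders, not values from any real engagement.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    STABLE_CONTEXT = "...policies, schemas, instructions that rarely change..."

    def ask(user_query: str) -> str:
        response = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder id; choose per task
            max_tokens=1024,
            system=[{
                "type": "text",
                "text": STABLE_CONTEXT,
                # Mark the stable prefix as cacheable. Later calls that reuse
                # this exact prefix read it from cache at a reduced token rate.
                "cache_control": {"type": "ephemeral"},
            }],
            messages=[{"role": "user", "content": user_query}],  # variable part
        )
        return response.content[0].text

The snippet itself is trivial. The point is structural: stable context and variable input are separated from the first prototype, so the week-two measurement has something to measure.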

Question 02

"What engineering discipline will you apply to the AI work — and how will it compare to what we already require of our software teams?"

Why it matters: Most enterprises that ship reliable software have non-negotiable practices: code review, automated testing, architectural decision records, traceability. Then they hire an AI consultancy that skips all of it because "it's just prompts." Six months in, your team has three workflows nobody trusts, four prompts that have drifted from the original behavior, and zero ability to reconstruct what changed.

Bad answers sound like
  • "Our process is more agile than that — we iterate quickly."
  • "AI development is different from software development."
  • "We'll document things as we go."
Good answers sound like
  • "Every prompt is versioned. Every architectural choice is documented as an ADR. Every output that drives a business decision passes through a cross-review protocol before it's trusted. The audit trail looks identical to the audit trail you require of your software teams — because the discipline that ships reliable software also ships reliable AI."
  • "We use Documentation-as-Code. If your auditor walks in tomorrow and asks how a specific decision got made, we can show them the prompt version, the model used, the verification step, and the human approval — for every output."

The good partner treats AI work as engineering work, not creative work.
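
To make that second answer concrete, here is one illustrative shape for the per-output audit record it describes, sketched in Python. The field names are assumptions, not a standard; what matters is that every trusted output carries its full provenance.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass(frozen=True)
    class AuditRecord:
        """One record per AI output that drives a business decision."""
        output_id: str
        prompt_version: str      # git tag or content hash of the prompt file
        model_id: str            # the exact model that produced the output
        verification_step: str   # the automated check or cross-review it passed
        approved_by: str         # the human who signed off
        approved_at: datetime

When the auditor walks in, answering "how did this decision get made?" becomes a lookup, not an archaeology project.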

Question 03

"How will you handle model selection — and how easily can we switch models if a better option emerges?"

Why it matters: A team that picks one model in week one and never revisits the choice ends up paying premium pricing for tasks a smaller model would handle, with zero architecture for switching when the landscape changes. AI capability moves fast. Your contract should not lock you into a single vendor's product roadmap.

Bad answers sound like
  • "We're an Anthropic shop" (or OpenAI, or any single-vendor identity).
  • "Once you're on a model, switching is too expensive to justify."
  • "The differences between models aren't significant enough to architect for."
Good answers sound like
  • "Model selection is an engineering decision per task, not a vendor relationship per organization. We use frontier models for enterprise-grade reasoning where it's warranted; smaller models from any provider where they're sufficient. The architecture is built so the choice is reversible. Cost structure benefits accordingly."
  • "Here's how the abstraction layer works. If a better model emerges next quarter, swapping it in is hours of work, not weeks."

The good partner gives you optionality, not lock-in.
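
What "the architecture is built so the choice is reversible" can look like, in its smallest form: application code depends on one interface, and a routing table maps tasks to models. The class and task names below are hypothetical; the shape is the point.

    from typing import Protocol

    class ChatModel(Protocol):
        """The only model surface the rest of the codebase may touch."""
        def complete(self, system: str, user: str) -> str: ...

    class FrontierModel:
        """Wraps a frontier-model SDK call (body elided in this sketch)."""
        def complete(self, system: str, user: str) -> str:
            raise NotImplementedError("wire in your provider's SDK here")

    class SmallModel:
        """Wraps a cheaper model for routine tasks (body elided in this sketch)."""
        def complete(self, system: str, user: str) -> str:
            raise NotImplementedError("wire in a smaller or local model here")

    # Task-by-task routing. Swapping a model is a one-line change here,
    # not a rewrite of every call site.
    ROUTES: dict[str, ChatModel] = {
        "contract_analysis": FrontierModel(),  # reasoning-heavy task
        "ticket_triage": SmallModel(),         # routine task
    }

    def complete(task: str, system: str, user: str) -> str:
        return ROUTES[task].complete(system, user)

When a better model ships next quarter, the change is one new wrapper class and one edited line in the routing table. That is what "hours, not weeks" means in practice.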

Question 04

"How will user-built AI workflows be governed — and what's the path from a user prototype to a production-grade system?"

Why it matters: This is the failure pattern most executives don't recognize until it's everywhere. Business users get tool access — Copilot, ChatGPT Enterprise, internal RAG bots, agent platforms — and start building workflows. The org celebrates "democratizing AI." What's actually happening: dozens of half-built workflows with no governance, no shared prompt library, no observability, no audit trail. The IT team becomes a complaint resolution function. The CFO has no portfolio view. Compliance has no idea what's running.

The org thought it was democratizing AI. It actually decentralized failure.

Bad answers sound like
  • "Self-service is the future. We don't want to slow users down with governance."
  • "That's a separate workstream — we focus on the strategic deployments."
  • "Each business unit can govern their own usage."
Good answers sound like
  • "Self-service inside a wired environment is different from self-service in chaos. We'll build a shared prompt library, governed agent patterns, central observability, and a clear path from a user-built prototype to a production-grade system, with engineering discipline applied at the moment of promotion."
  • "Same democratization. Without the chaos."

The good partner builds the wiring that makes self-service safe, not just the tools that make it possible.
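
One way to picture "engineering discipline applied at the moment of promotion" is a gate every user-built workflow must clear before it leaves the sandbox. The requirements below are illustrative placeholders; yours would come from your own compliance and engineering standards.

    # A hypothetical promotion gate for user-built AI workflows.
    PROMOTION_GATE = (
        "prompt_in_shared_library",  # versioned, reviewed prompt
        "observability_wired",       # traces and cost metrics flow centrally
        "audit_trail_enabled",       # every output carries its provenance
        "owner_assigned",            # a named human accountable for the workflow
        "success_metric_defined",    # what "working" means, written down
    )

    def ready_for_production(workflow: dict[str, bool]) -> bool:
        """Promote only when every requirement is met."""
        return all(workflow.get(requirement, False) for requirement in PROMOTION_GATE)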

Question 05

"How will use cases be prioritized — and what does measurable success look like for each?"

Why it matters: Pilots get chosen because someone got excited at a conference. Or because a board member asked about it. Or because a vendor demoed something compelling. The math behind whether the use case will actually return value gets done after the pilot has already started, if it gets done at all. The result is predictable: a portfolio of "innovative" projects, none of which ships a measurable business outcome. The CFO loses patience. The AI program loses funding.

Bad answers sound like
  • "We'll start with a quick win to build momentum."
  • "AI value compounds — let the pilots run and we'll measure later."
  • "Innovation is hard to measure in traditional ROI terms."
Good answers sound like
  • "Use-case prioritization is an engineering input, not a marketing exercise. Each candidate scoped against measurable business outcomes before a single token gets spent. Each pilot's success criteria written down before the work begins, in language your CFO can audit."
  • "Here's the scoring framework we use to rank candidate use cases. Here's a sample of how we'd apply it to your top five."

The good partner makes ROI grounding the entry condition, not the post-mortem.
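
And here is the smallest honest version of a scoring framework like the one in that second answer: value, feasibility, and run cost estimated before any work starts, so the ranking is written down and auditable. The numbers and the formula are placeholders; the discipline is that they exist up front.

    from dataclasses import dataclass

    @dataclass
    class UseCase:
        name: str
        annual_value_usd: float     # measurable business outcome, estimated up front
        feasibility: float          # 0.0 to 1.0, an engineering judgment
        annual_run_cost_usd: float  # projected inference plus maintenance cost

    def score(uc: UseCase) -> float:
        # Risk-adjusted net value: one illustrative ranking, not the only one.
        return uc.feasibility * (uc.annual_value_usd - uc.annual_run_cost_usd)

    candidates = [
        UseCase("invoice triage", 400_000, 0.9, 60_000),
        UseCase("contract summarization", 250_000, 0.7, 30_000),
    ]
    for uc in sorted(candidates, key=score, reverse=True):
        print(f"{uc.name}: risk-adjusted value ${score(uc):,.0f}")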


What these five questions have in common

Each one is a discipline question. None of them are about AI capability. None of them are about which model is best. They are about whether the partner you're hiring has the engineering rigor, governance discipline, and business alignment to translate AI capability into operational outcomes that your team can sustain after the engagement ends.

That's the difference between a partner who delivers a deliverable and a partner who transfers a capability.

It is also the difference between an AI program that survives its first budget cycle and one that does not.

Bill Tennant is Founder & Principal of Northbeam Solutions. He holds three U.S. provisional patents and has published on AI-driven algorithm discovery, orchestrated cognition architectures, and enterprise AI trust frameworks.