AI Strategy Was the Easy Part.
What the labs spent $5.5B on this week. And what's still missing from the picture.
Sources cited (linked inline below): Anthropic + Blackstone + Goldman + H&F venture announcement · OpenAI's Development Company / TPG / Bain / Brookfield / Advent · Project Glasswing · Claude Mythos Preview · Anthropic prompt caching docs
$5.5 billion committed in 96 hours toward a single, very specific thesis: the bottleneck in enterprise AI is not the technology. It is everything that has to happen between the model and the business.
Marc Nachmann, Goldman's Global Head of Asset and Wealth Management, said something this week that most enterprise AI buyers already knew, but few had heard articulated this plainly:
"There's a big shortage of people who know how to apply these tools into businesses and then transform them."
He said it at the announcement of Anthropic's new enterprise AI services venture — Goldman, Blackstone, and Hellman & Friedman backing Anthropic with $1.5 billion to build a firm whose entire job is to put AI engineers and specialists inside enterprises and translate AI capability into operational outcomes.
Three days earlier, OpenAI announced its own enterprise services play — $4 billion raised at a $10 billion valuation, partnered with TPG, Bain Capital, Brookfield, and Advent. Same model. Same diagnosis.
This is the clearest market signal we have yet received about where AI value actually unlocks. And it is also the clearest admission that strategy decks were never going to be enough.
What these investments really tell us
The headlines on those announcements were, predictably, about competition. "The labs are taking on the consultants." Fine. That is the easy story.
The harder story is what those venture structures actually admit.
When the labs themselves — the people closest to the technology — choose to put billions of dollars behind operationalizing AI, they are saying something quietly devastating: the model will not get us across the goal line. It still takes sustained, disciplined work to make the model function reliably inside an organization that was not built for it.
If the model alone were enough, those ventures would not exist.
Goldman's investment thesis is not "AI tools need to be sold harder." It is "AI tools, even when correctly procured, do not deploy themselves into operating value." The capital is going toward closing the gap between capability potential and capability realization. That gap has a name in our work — we call it capability transfer — and it has been the founding focus of Northbeam Solutions since I started the firm earlier this year.
The reason strategy got easier is that the models themselves now do most of it. Any executive with a frontier model can produce a credible AI strategy memo in an afternoon. Five years ago that same work took a quarter and cost six figures. The same intelligence that needs disciplined deployment to deliver value has already collapsed the cost and time of strategic analysis itself.
What the models cannot currently collapse is the work after the memo.
Wiring the engineering rigor. Aligning the business processes. Bringing the operators, analysts, and executives inside the organization along — the people who have to live with the workflows, verify the outputs, and defend the governance.
That is the work AI made harder, not easier. The gap between what the models can do and what the organization can absorb just widened.
So this week was, for us, less an inflection point than an accelerant.
Why I started Northbeam
I spent the past 15 years in enterprise software and services — deep in AI, data, and analytics workloads — watching a pattern repeat itself across hundreds of customers in every vertical I served.
It always looked the same. A strategy lands. A pilot runs. The pilot looks impressive in the boardroom. And then, six to twelve months later, the team that was supposed to operate the AI is no closer to running it than the day before the consultants showed up.
The deliverable was real. The capability was not.
That gap — between the deliverable and the capability that lasts — is what I built Northbeam to close.
We do not sell products. We do not sell pilots. We embed alongside business and technical teams, build production-grade work with their people, and leave behind the engineering rigor, use-case discipline, ROI grounding, and governance that separates AI operational efficiency from AI experimentation.
That work has a name. It is the operating system we have been building since the start of the year. And it is what went public this week.
What goes wrong on the first pass
If the lab consulting plays are right that the bottleneck sits in delivery, the next question is the obvious one: delivery of what, exactly? What do most enterprises actually get wrong on their first attempt to put AI into operation?
The answer is consistent enough across customers that it has become a checklist. Five failure patterns. None of them are exotic. Most C-level readers will recognize at least three of them inside their own organization once they are named.
Failure 1. Token economics get ignored on the first build.
You will know this one is happening on your team if AI projects come back in monthly review with the comment "the inference costs are higher than we modeled." That is rarely a model problem. It is almost always a wiring problem.
The most common version: the first prototype ships without basic best practices, such as prompt caching — the technique of storing the unchanging part of an AI request (system instructions, reference materials, repeated context) so the system does not pay to re-process the same content on every call. Anthropic's published documentation shows up to a 90% cost reduction on cached content and similar latency improvements.
On a recent internal Northbeam prototype where we ingest large volumes of regulatory documentation across many calls, our first pass missed prompt caching entirely. Once we implemented it correctly — choosing the cache breakpoints, separating stable context from variable inputs — measured savings on aggregate input cost landed in the 50–80% range, with latency improvements of similar magnitude.
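For readers who want the shape of the fix, here is a minimal sketch of the pattern using Anthropic's Messages API as documented. The model name, file path, and corpus are placeholders, not our production code, and note that the docs require a minimum prompt length before a block becomes cacheable.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Stable context: large, unchanging reference material. Illustrative path.
STABLE_REFERENCE = open("regulatory_corpus.txt").read()

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; choose per task
        max_tokens=1024,
        system=[
            {"type": "text",
             "text": "Answer questions strictly from the attached corpus."},
            {"type": "text",
             "text": STABLE_REFERENCE,
             # Cache breakpoint: everything up to and including this block is
             # written to cache on the first call and read from cache, at a
             # fraction of the input price, on subsequent calls.
             "cache_control": {"type": "ephemeral"}},
        ],
        # Variable input lives below the breakpoint, so it never invalidates
        # the cached prefix.
        messages=[{"role": "user", "content": question}],
    )
    # response.usage.cache_read_input_tokens shows how much was served from
    # cache, which is how you verify the savings actually landed.
    return response.content[0].text
```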
In plain numbers: if your AI workload is running $40,000 a month at full input cost, the same workload after correct caching lands somewhere between $8,000 and $20,000. That is not a tuning detail. That is the difference between a project the CFO approves to scale and a project the CFO kills at the next budget review.
Failure 2. The engineering discipline that ships software stops at the AI boundary.
Most enterprises that ship reliable software do so with code review, automated testing, architectural decision records, and traceability built in. Then a different team — or sometimes the same team in a different mode — picks up an AI workflow, and all of it stops.
Prompts get edited live in production with no review. Models hallucinate, and no one notices because there is no test suite. Decisions land on AI outputs that no second pair of eyes ever verified. Six months in, the team has three workflows nobody trusts and four prompts that have drifted so far from the original behavior that nobody can reconstruct what changed.
Documentation-as-Code and subagent-based SDLC with cross-review protocols — every prompt versioned, every architectural choice documented, every output verified — are not exotic engineering practices. They are the same disciplines you already require of your software teams, applied to the AI portfolio. The work that gets done on the AI side should look, in process and in audit trail, indistinguishable from the work that gets done on the software side.
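What a test suite for prompts can look like in practice: a minimal sketch, assuming prompts live as versioned files in the repository. `load_prompt`, `run_model`, the prompt names, and the assertions are hypothetical stand-ins for your own store and client, not a standard.

```python
import json

def load_prompt(name: str, version: str) -> str:
    """Prompts are files in version control, reviewed like any other code,
    never edited live in a production text box."""
    with open(f"prompts/{name}/{version}.txt") as f:
        return f.read()

def run_model(prompt: str, fixture: str) -> str:
    """Hypothetical stand-in: call your model client with pinned settings
    (model version, temperature) against a checked-in input fixture."""
    raise NotImplementedError  # wire to your client of choice

def test_extraction_prompt_returns_auditable_json():
    prompt = load_prompt("invoice_extraction", "v3")
    output = run_model(prompt, fixture="tests/fixtures/invoice_001.txt")
    parsed = json.loads(output)  # drift from JSON to prose fails loudly here
    assert set(parsed) >= {"vendor", "amount", "due_date"}
    assert parsed["amount"] > 0  # sanity bound a reviewer signed off on
```

Run under pytest on every prompt change, this turns "someone edited the prompt and nobody noticed" into a failed build.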
Failure 3. Model selection becomes model and vendor lock-in by default.
A team picks one model in week one. They never revisit the choice. Six months later they are paying premium pricing for tasks a smaller model would handle perfectly, and they have zero architecture for switching when a better option emerges.
Model selection is an engineering decision per task, not a vendor relationship per organization. We use Anthropic for enterprise-grade reasoning where the work warrants it; we use smaller and faster models — including from other providers — where they are sufficient. The architecture is built so the choice is reversible. The cost structure benefits accordingly. So does the negotiating posture.
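One way the reversibility can look in code, as a sketch: model choice is a per-task configuration entry behind a single interface, not a vendor call hard-coded at every call site. The route table, model names, and prices below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Route:
    provider: str
    model: str
    usd_per_1k_input: float  # budget guardrail, visible in code review

# Illustrative route table: names and prices are assumptions, not quotes.
ROUTES: dict[str, Route] = {
    "contract_analysis": Route("anthropic", "claude-opus-4", 0.015),
    "ticket_triage":     Route("openai",    "gpt-4o-mini",   0.00015),
    "log_summarization": Route("local",     "llama-3.1-8b",  0.0),
}

def complete(task: str, prompt: str,
             clients: dict[str, Callable[[str, str], str]]) -> str:
    route = ROUTES[task]  # switching vendors for a task is a one-line diff
    return clients[route.provider](route.model, prompt)
```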
Failure 4. Self-service AI without the wiring.
This is the failure pattern most C-level readers do not recognize as a failure pattern. It looks, on the surface, like progress.
Business users get tool access — Copilot, ChatGPT Enterprise, an internal RAG bot, an agent platform. They start building workflows. The org celebrates the democratization of AI.
What is actually happening underneath: dozens of half-built workflows with no governance, no shared skill or prompt library, no observability, no audit trail. Use cases pile up. Nobody can tell which ones are returning value and which ones are creating risk. The IT team becomes a complaint resolution function. The CFO has no portfolio view. Security and compliance have no idea what is running.
The org thought it was democratizing AI. It actually decentralized failure.
Self-service inside a wired environment is different. Shared skills, assets, and prompt library. Governed agent patterns. Central observability. A clear path from a user-built prototype to a production-grade system, with the engineering discipline applied at the moment of promotion. Same democratization. Without the chaos.
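One concrete piece of that wiring, sketched under stated assumptions: a thin wrapper that gives every self-service call an audit record and a cost attribution. `audited_call`, `raw_call`, the log schema, and the price table are illustrative, not a standard.

```python
import json, time, uuid
from typing import Callable

# Illustrative price table; real attribution would pull provider pricing.
USD_PER_1K_INPUT = {"claude-sonnet-4": 0.003}

def audited_call(user: str, use_case: str, model: str, prompt: str,
                 raw_call: Callable[[str, str], tuple[str, int]]) -> str:
    # raw_call is a hypothetical stand-in for your model client; it returns
    # the output text and the input token count from the provider's usage data.
    output, input_tokens = raw_call(model, prompt)
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "user": user,
        "use_case": use_case,  # ties every call back to a portfolio entry
        "model": model,
        "input_tokens": input_tokens,
        "est_cost_usd": input_tokens / 1000 * USD_PER_1K_INPUT.get(model, 0.0),
    }
    with open("audit.log", "a") as f:  # in production: a real log pipeline
        f.write(json.dumps(record) + "\n")
    return output
```

Nothing about this is sophisticated. That is the point: the wiring is ordinary engineering, applied before the workflows multiply rather than after.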
Failure 5. Use cases get picked by enthusiasm, not ROI.
Pilots get chosen because someone got excited at a conference. Or because a board member asked about it. Or because a vendor demoed something compelling. The math behind whether the use case will actually return value gets done after the pilot has already started — if it gets done at all.
The result is predictable. A portfolio of "innovative" projects, none of which ships measurable business outcome. The CFO loses patience. The AI program loses funding. The team that lost six months building things nobody asked for loses morale. The organization loses the option to try again with a better selection process, because the appetite is gone.
Use-case prioritization is an engineering input, not a marketing exercise. Each candidate scoped against measurable business outcomes before a single token gets spent. Each pilot's success criteria written down before the work begins, in language the CFO can audit. The discipline is unglamorous. It is also the only thing that makes the AI program survive its first budget cycle intact.
AI itself makes this ROI alignment easier when the right rigor is applied.
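To make "success criteria written down before the work begins" concrete, here is a minimal sketch: each candidate use case becomes a record the CFO can audit, and nothing without a measurable metric and a named owner gets tokens. The fields and the net-value ranking are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    annual_value_usd: float   # estimated before the pilot, not after
    annual_cost_usd: float    # inference + engineering + operations
    success_metric: str       # e.g. "tier-1 handle time down 20% by Q3"
    owner: str                # the person who lives with the workflow

def fund_list(candidates: list[UseCase]) -> list[UseCase]:
    # Nothing without a written metric and a named owner gets funded.
    eligible = [c for c in candidates if c.success_metric and c.owner]
    return sorted(eligible,
                  key=lambda c: c.annual_value_usd - c.annual_cost_usd,
                  reverse=True)
```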
The Mythos decision
To make the discipline question concrete, look at what Anthropic itself just did.
In April, Anthropic announced Project Glasswing — a cybersecurity initiative built around its most powerful model to date, Claude Mythos Preview. Mythos is, by Anthropic's own description, a frontier model whose coding and agentic capabilities at finding and exploiting vulnerabilities exceed those of all but the most skilled human security researchers. In its evaluation period, Mythos identified thousands of zero-day vulnerabilities across major operating systems and browsers — including a 17-year-old remote code execution flaw in FreeBSD (CVE-2026-4747) discovered fully autonomously.
Anthropic chose not to release Mythos publicly. They built Project Glasswing instead — a controlled program giving access only to a curated set of cybersecurity defenders and infrastructure operators, under appropriate guardrails.
They built the most capable model they have ever produced, and they decided the responsible action was to gate it.
Why does this matter to you, even if you are not in cybersecurity? Because the decision Anthropic made — the discipline of choosing what not to deploy — is the same discipline enterprise AI requires. Not at the model release level. At the deployment level. Inside your business. Every day. On every prompt, every workflow, every agent your team puts into production.
If Anthropic, with all of its resources and visibility, recognized that capability without discipline is a vector for harm, the lesson for every enterprise running AI tools is straightforward: capability without engineering discipline is the same vector at smaller scale, distributed across more surfaces, harder to see, and harder to clean up after.
What the lab plays get right — and what is still structurally limited
To be clear about the new lab consulting ventures: what they are doing is real, valuable, and well-capitalized. The model — forward-deployed engineers embedded inside enterprises — is the right model. They will deliver real value to many customers, and the broader market will be better for it.
The structural question is what those ventures will optimize for over time.
A consulting venture jointly funded by a frontier model lab and a private equity firm has, eventually, two economic obligations: maximize the deployment of the lab's model, and maximize the operating leverage of the consulting business. Both of those obligations create pressure — over many years, not all at once — toward outcomes that may not always be best for an individual customer's specific deployment.
A vendor-neutral, capability-transfer-focused firm answers a different set of questions.
- Which model fits this task best, regardless of who makes it?
- What does the customer's team need to be able to do without us twelve months from now?
- What documentation, governance, and discipline travels with the work, so that capability stays inside the organization when the engagement ends?
That is the position Northbeam holds. The two models are complementary, not competitive. Some customers will be best served by the new lab ventures. Others will be best served by what we do. Many will need both. The market needs both, and the diagnosis both models are answering is the same.
The next 12–18 months
A few things become predictable from where we sit now.
The capacity to do this work — internally or externally — with the right engineering and business alignment will be the binding constraint on enterprise AI through the next budget cycle. The labs putting $5.5 billion against this gap is the market saying it loudly. Other capital will follow. The supply of people who can actually do this work will not catch up to demand any time soon.
Audit and traceability — the documentation-as-code, the architectural decision records, the prompt versioning, the reusable skills, the cost-per-decision attribution — will move from optional engineering hygiene to procurement requirements, especially in regulated industries. Boards are starting to ask their auditors about AI governance. Auditors are not yet ready to answer. That gap closes, in our view, within the next budget cycle or the one after.
And capability transfer will become a procurement criterion in its own right. The question buyers ask will not stop at "can you deliver this?" It will move to "can you deliver this in a way that shows quantifiable value and leaves my team able to run it?"
That is the question Northbeam was built to answer.
Closing
The bottleneck is not AI. The bottleneck is what happens between the model and the business.
The labs just bet $5.5 billion on that diagnosis.
If your team is past the AI strategy phase and needs support in the gap between strategy and outcome, we should talk.