Tag: Strategy

  • MIT Called It a Disenchanted Intern. METR Says Check the Growth Rate.

    Something happened this week that I keep turning over.

    MIT published findings this month showing that when 41 AI models were tested across more than 11,000 real workplace tasks, the result was, in their words, like a “disenchanted intern” — hitting minimum benchmarks about 65% of the time, but never exceeding 50% success on tasks requiring genuinely superior-quality output. If you work in software, marketing, legal services, or knowledge work of any kind, that’s the snapshot.

    METR, a nonprofit focused on measuring AI capabilities, published a different kind of snapshot. Their metric is the “time horizon”: the longest task a frontier AI can reliably complete autonomously. In 2019, the best AI could handle roughly a two-minute task without human intervention. By the end of 2025, that had grown to roughly an hour. The doubling time across that whole period: around seven months.

    METR’s January 2026 update tightened that number further. Post-2023, the best estimate for the doubling period is now 130 days — closer to four months.

    My read on this:

    The MIT study and the METR data aren’t in conflict. They’re measuring different things at different timescales. MIT is taking a photograph. METR is measuring the shutter speed. And the shutter speed is getting faster.

    I don’t think the “disenchanted intern” framing is wrong — it describes today accurately. What I’m less sure about is the assumption, implicit in most of the coverage I’ve read this week, that “today” is a stable state. An intern who gets twice as capable every four months is not the same resource at the end of the year as they are today.
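
    To make the compounding concrete, here is a minimal sketch of the arithmetic, assuming the doubling period stays constant over the year (an assumption, not something METR claims). The function name and the 30.4-days-per-month conversion are illustrative choices. Under these assumptions the seven-month trend works out to roughly a threefold gain over twelve months, the 130-day estimate to roughly sevenfold, and a rounded four-month doubling to roughly eightfold.

```python
def growth_factor(days_elapsed: float, doubling_days: float) -> float:
    """Multiplicative growth after days_elapsed, given a constant doubling period."""
    return 2.0 ** (days_elapsed / doubling_days)


# Doubling periods quoted in the text. Months are converted at 30.4 days each,
# which is an approximation chosen for this sketch.
SCENARIOS = {
    "long-run trend (about 7 months)": 7 * 30.4,
    "post-2023 estimate (130 days)": 130.0,
    "rounded 'every four months'": 4 * 30.4,
}

if __name__ == "__main__":
    for label, doubling_days in SCENARIOS.items():
        factor = growth_factor(365.0, doubling_days)
        print(f"{label}: x{factor:.1f} over 12 months")
```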

    What I keep returning to is the gap between the current snapshot and the trajectory, and the opportunity that opens up in that gap. Anyone building workflows, designing teams, or structuring how they work around today’s AI capability is working from a reference point that will be measurably out of date within a single planning cycle. That’s an opportunity signal at a scale and pace most planning assumptions don’t account for.

    Three things I’m watching:

    1. Where the doubling curve hits friction. Every exponential eventually meets a wall — physical limits, data constraints, regulatory friction. METR’s time-horizon metric is useful precisely because it measures real-world task completion, not synthetic benchmark scores. When the doubling cadence breaks, that will be the signal that the curve has met something real. I expect that to happen. I just don’t know when.

    2. Whether “minimally sufficient” matters or not. MIT’s 65% minimally sufficient rate sounds modest. But most enterprise workflows run on people who are minimally sufficient most of the time. The threshold isn’t excellence — it’s “acceptable at scale, around the clock, at near-zero marginal cost.” That bar is lower than it sounds, and closer than the headline number implies.

    3. The infrastructure spend as an access unlock. Alphabet, Meta, Microsoft, and Amazon are projected to spend nearly $700 billion combined on AI infrastructure in 2026 — roughly double what they spent last year. That capital isn’t just building capacity for the current snapshot. It’s funding the cost compression that makes the next several capability doublings broadly accessible. When the infrastructure matures, the cost floor drops — and the surface area for building on top of it expands with it.

    The disenchanted intern framing is apt today. My expectation is that it’s a better description of 2025 than it is of 2027.

  • Inference cost has collapsed. Enterprise AI business cases haven’t caught up.

    GPT-4 class inference cost $20 per million tokens at launch in early 2023. In April 2026, equivalent performance runs $0.40. Most enterprise AI business cases were built somewhere in the middle — and haven’t been updated since.

    That gap is not a technology story. It is an arithmetic problem wearing a strategy hat.

    What moved

    Inference costs have declined faster than the bandwidth price collapse of the early internet era, faster than PC compute, and considerably faster than any enterprise finance model anticipated. Artificial Analysis tracks it live: the cheapest capable models today run under $0.50 per million tokens. A flagship model that cost $10 per million tokens eighteen months ago now costs $2–3. The price range between the cheapest and most expensive capable options has widened past a thousand-to-one.

    The driver is compounding. Better training efficiency produced more capable models at lower operating cost. Competition between providers accelerated the pass-through. Specialised chips entered the stack. The result: a cost curve that looks less like traditional software pricing and more like solar panel economics — each year’s curve is below where last year’s curve said it would be.

    What did not move

    Enterprise AI business cases.

    S&P Global found that 42% of companies abandoned most of their AI projects in 2025. Cost and unclear value were the top reasons cited. IBM put the share of AI initiatives delivering expected ROI at 25%. MIT found that 95% of AI pilots delivered zero measurable P&L impact (MIT NANDA, State of AI in Business, 2025).

    These numbers are real. But the interpretation of why projects fail is often imprecise.

    Projects approved in 2023 and 2024 were scoped against the pricing environment of 2023 and 2024. The cost models that informed the go/no-go decisions used token prices that no longer exist. The ROI denominators were anchored to infrastructure assumptions from a period when GPT-4 access cost $10–20 per million tokens. The business cases that were rejected on cost grounds — the ones that landed below the internal ROI hurdle by a thin margin — were rejected against a cost basis that is now a fraction of what it was.

    That is not a technology failure. It is a modeling lag.

    Andreas’s view

    My read on this: there are two different things getting conflated in the ROI conversation. One is genuinely poor outcomes — wrong use case, shallow integration, insufficient change management. That is real and deserves scrutiny. The other is a systematic understatement of AI’s economic potential because the cost assumptions in the business case never got refreshed. Those two phenomena look identical in the data.

    I don’t think the 42% abandonment rate or the 25% ROI hit rate tells us much about what AI can do at today’s prices. It tells us how enterprises perform against business cases built on 2023 assumptions. The projects that got killed for cost reasons in Q4 2024 would look different rerun against Q2 2026 pricing.

    My expectation is that the organisations getting ahead of this are running a specific exercise that most are not: taking the cost assumptions out of every AI initiative that was rejected or stalled in 2023–2025, replacing them with current market rates, and seeing which cases cross the ROI threshold now. Not all of them will. But some will — and the decision to revisit them is a spreadsheet exercise, not a technology project.
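
    The exercise described above could be sketched roughly as follows. This is a simplified illustration, not a finance model: the one-year ROI formula, the 25% hurdle, and all three business cases are hypothetical. Only the two token prices ($20 and $0.40 per million) come from the figures quoted earlier.

```python
from dataclasses import dataclass


@dataclass
class BusinessCase:
    name: str
    annual_benefit: float          # expected yearly benefit, USD (hypothetical)
    annual_tokens_millions: float  # projected yearly volume, millions of tokens
    other_annual_cost: float       # integration, people, licences, USD


def roi(case: BusinessCase, price_per_million: float) -> float:
    """Simple one-year ROI: (benefit - total cost) / total cost."""
    inference_cost = case.annual_tokens_millions * price_per_million
    total_cost = case.other_annual_cost + inference_cost
    return (case.annual_benefit - total_cost) / total_cost


def newly_viable(cases, old_price: float, new_price: float, hurdle: float):
    """Names of cases that missed the hurdle at old_price but clear it at new_price."""
    return [
        c.name
        for c in cases
        if roi(c, old_price) < hurdle <= roi(c, new_price)
    ]


# Hypothetical portfolio of previously reviewed initiatives.
CASES = [
    BusinessCase("claims triage", 1_200_000, 40_000, 300_000),
    BusinessCase("contract review", 500_000, 1_000, 600_000),
    BusinessCase("support summaries", 900_000, 5_000, 200_000),
]

if __name__ == "__main__":
    print(newly_viable(CASES, old_price=20.0, new_price=0.40, hurdle=0.25))
```

    In this made-up portfolio only the inference-heavy case flips. The case dominated by non-inference cost stays below the hurdle at either price, which matches the caveat that not every stalled project was a pricing problem.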

    Three things I’m watching:

    • Whether finance teams are treating inference cost as a stable input or a variable. Most enterprise budget models treat infrastructure cost as a constant. Inference cost is not a constant — it has been declining faster than almost any other enterprise input cost in the last three years.
    • The spread between unit cost and total spend. Per-token costs have collapsed, but total enterprise AI spend is forecast to jump 65% in 2026, from an average of roughly $7M to over $11M (IDC). Volume is expanding faster than unit costs are falling. The budget impact of AI is still growing, even as the underlying unit economics are dramatically more favourable than they were.
    • How capital allocation committees handle the remodel request. The institutional question: if a CFO approved a 2023 AI business case that underperformed, how does the organisation handle finance coming back and saying “the cost structure changed — the case should have worked, we just used the wrong numbers”? That conversation is coming.
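
    The unit-cost versus total-spend point can be checked with one line of arithmetic. The sketch below assumes, as a simplification, that spend is just unit price times volume, which ignores non-inference costs. The 65% spend increase is the IDC figure quoted above. The halving of unit price is a hypothetical input chosen for illustration, not a figure from the source.

```python
def implied_volume_multiple(spend_multiple: float, price_multiple: float) -> float:
    """Volume growth implied by spend = price * volume.

    spend_multiple: next-year spend divided by this-year spend (1.65 for +65%).
    price_multiple: next-year unit price divided by this-year unit price.
    """
    if price_multiple <= 0:
        raise ValueError("price_multiple must be positive")
    return spend_multiple / price_multiple


if __name__ == "__main__":
    # Hypothetical: unit price halves while total spend rises 65%.
    multiple = implied_volume_multiple(spend_multiple=1.65, price_multiple=0.5)
    print(f"Implied usage growth: x{multiple:.1f}")
```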

    What this reveals

    The collapse in inference cost is well-understood in developer circles. Engineers who run inference workloads reset their unit economics continuously — it is operational reality. The delay is in the enterprise business case layer, where cost assumptions travel up through approval chains, get embedded in multi-year plans, and calcify.

    The cost curve does not care about the approval cycle. It moved while the slide decks were in review.

    This is not an argument that all AI investments look better at current pricing — some of those failed pilots would have failed regardless, and the organisational conditions for AI success (clear scope, embedded workflows, meaningful accountability) have not gotten easier. But a non-trivial fraction of the projects that stalled on cost now live in territory where the math is different. Identifying them is a shorter path to AI ROI than starting new initiatives from scratch.

  • Model deprecation is the new continuity risk

    [Figure A: model lifecycle row. Four rectangles in a row, the leftmost ghosted, connected by simple arrows.]

    OpenAI announced the discontinuation of the Sora web and app experiences on April 26, with the Sora API following on September 24. The first deprecation triggers in two weeks. Enterprises that built workflows on Sora since launch are not facing a model upgrade — they are facing a workflow rebuild on a four-month timeline. This is the first prominent enterprise-facing AI deprecation event of the cycle, and the precedent it sets matters more than the specific product involved.

    Model deprecation is no longer a developer-tier concern. It is an enterprise governance question that deserves a place on the risk committee agenda. The real shift is happening here: AI dependency without continuity is becoming a board-level risk in 2026.

    The shift: dependency without continuity guarantees

    The pattern of the past two years has been to build agent workflows on whichever foundation model was demonstrably best at the time, with little contractual commitment from the model provider about how long that model would remain available. Provider terms have improved (Azure OpenAI’s twelve-plus-six-month commitment for generally available models is the strongest standard in the market), but most enterprises have not negotiated equivalent terms with their chosen providers. They built on capability, not on continuity.

    When the provider sunsets the model, the enterprise’s options are bad. Migrate to a successor model that may behave differently in subtle ways — requiring re-validation of every governed use case. Renegotiate at the eleventh hour for extended access at unfavorable terms. Or absorb the operational disruption of the workflow simply not working until rebuilt.

    The Sora event is small in dollar terms but large in precedent. The next deprecation will involve a more enterprise-critical model, and the enterprises that did not see this one coming are not going to see that one coming either.

    [Figure: a single thread connecting a workflow box to a model box, visibly fraying near the model, with a clock above. Caption: Built on capability. Not on continuity.]

    The role change is the addition of an AI continuity discipline

    Inside enterprises that take this seriously, a discipline is emerging that did not exist in 2024: AI continuity management. The work overlaps with vendor management, with disaster recovery, with model risk management, and with regulatory compliance, but it is structurally distinct from all of them. The discipline involves maintaining an inventory of model dependencies by workflow, negotiating continuity commitments at procurement, running successor-model regression tests on a regular cadence, and keeping each workflow’s documentation complete enough that it could be rebuilt on a different model.

    Most enterprises have not staffed this discipline. The accountabilities are scattered across teams that do not coordinate. The procurement team negotiated the model contract a year ago without a continuity clause. The deployment team is building production dependencies on the model without thinking about migration cost. The risk team has not flagged model deprecation as a category. When the deprecation announcement lands, the company finds out it has no plan.

    The fix is straightforward in concept and slow in practice. Add continuity commitments to the procurement template. Build a model-dependency inventory. Designate an owner for AI continuity at the executive level. Run quarterly successor-model tests. None of this is hard. It is just unglamorous work that does not get done unless someone owns it.
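
    A quarterly successor-model test can start very small. The sketch below is one possible shape for it, under stated assumptions: a model is treated as a plain function from prompt to text, each golden case carries its own acceptance check, and the two stub models stand in for real provider calls. None of these names come from any vendor’s API.

```python
from typing import Callable, NamedTuple

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


class GoldenCase(NamedTuple):
    prompt: str
    accept: Callable[[str], bool]  # returns True if the output is acceptable


def find_regressions(current: Model, successor: Model, cases: list) -> list:
    """Prompts that pass on the current model but fail on the successor."""
    regressions = []
    for case in cases:
        passes_now = case.accept(current(case.prompt))
        passes_next = case.accept(successor(case.prompt))
        if passes_now and not passes_next:
            regressions.append(case.prompt)
    return regressions


# Stub models for illustration only; real use would wrap provider clients.
def current_stub(prompt: str) -> str:
    return "APPROVED: invoice matches PO" if "invoice" in prompt else "UNKNOWN"


def successor_stub(prompt: str) -> str:
    return "The invoice looks fine to me." if "invoice" in prompt else "UNKNOWN"


GOLDEN_CASES = [
    GoldenCase("Classify this invoice", lambda out: out.startswith("APPROVED")),
    GoldenCase("Classify this memo", lambda out: out == "UNKNOWN"),
]

if __name__ == "__main__":
    print(find_regressions(current_stub, successor_stub, GOLDEN_CASES))
```

    The successor stub gives a reasonable answer in a different format, which is the kind of subtle behavioural change that forces re-validation of a governed use case.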

    The strategic consequence is renewed buy-versus-build math

    Continuity risk changes the calculus of where to deploy AI capability. Where the cost of unplanned migration is high (regulated workflows, mission-critical operations, customer-facing experiences with high switching costs), three options look stronger than they did in 2024: fine-tuning a frontier model into a controlled deployment, partnering with a vendor that offers enterprise-grade continuity commitments, or building on open-weight models the enterprise can host indefinitely. The case for relying on whichever model is best on a benchmark this quarter is weaker.

    The math is not simple. Open-weight models lag the frontier, sometimes meaningfully. Self-hosting carries operational cost that the proprietary providers absorb. The vendor lock-in to a single proprietary provider, even with the best continuity terms, is a different kind of risk than open-weight self-hosting carries. Each enterprise has to make this trade-off based on the workflow’s tolerance for capability lag versus its tolerance for continuity disruption.

    What is no longer defensible in 2026 is treating model continuity as someone else’s problem. The Sora sunset is small. The next one will not be.

    So what boards should do this quarter

    Add model deprecation to the risk committee agenda. The first deprecation event lands in two weeks. The board should at minimum understand which workflows are exposed and what the migration plans are.

    Demand a model-dependency inventory. Which workflows depend on which models from which providers, with which contractual continuity commitments. If this inventory does not exist, building it is the priority.
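
    As a starting point, the inventory can be a flat list of records with two queries on top. The sketch below is illustrative only: the field names, the 180-day horizon, and every workflow, provider, model, and date in the example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ModelDependency:
    workflow: str
    provider: str
    model: str
    retirement_date: Optional[date]   # announced sunset, None if none announced
    continuity_months: Optional[int]  # contractual commitment, None if absent
    migration_owner: Optional[str]    # accountable team, None if unassigned


def exposed(inventory, today: date, horizon_days: int = 180):
    """Dependencies whose model retires within the horizon, soonest first."""
    hits = [
        d for d in inventory
        if d.retirement_date is not None
        and 0 <= (d.retirement_date - today).days <= horizon_days
    ]
    return sorted(hits, key=lambda d: d.retirement_date)


def uncovered(inventory):
    """Dependencies with no contractual continuity commitment."""
    return [d for d in inventory if d.continuity_months is None]


# Hypothetical inventory for illustration.
INVENTORY = [
    ModelDependency("video storyboard drafts", "Provider A", "video-model-v1",
                    date(2026, 9, 30), None, None),
    ModelDependency("contract summarisation", "Provider B", "text-model-large",
                    None, 12, "legal-ops"),
    ModelDependency("support reply drafts", "Provider A", "text-model-small",
                    date(2026, 6, 1), 6, "cx-platform"),
]

if __name__ == "__main__":
    today = date(2026, 4, 12)
    for dep in exposed(INVENTORY, today):
        days_left = (dep.retirement_date - today).days
        print(f"{dep.workflow}: {dep.model} retires in {days_left} days")
    print("No continuity clause:", [d.workflow for d in uncovered(INVENTORY)])
```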

    Reconsider the buy-versus-build posture for mission-critical AI workflows. The 2024 default — use whichever proprietary model is best — was rational at the time. In 2026, with the deprecation precedent now visible, that default deserves an explicit reconsideration. Continuity is becoming a form of resilience. The boards that price it in this quarter will not be the ones rebuilding workflows under deadline.

  • When 88% of organizations have adopted AI, adoption stops being the question

    What was announced

    The Stanford HAI 2026 AI Index landed in mid-January with a set of numbers that close out a debate. Organizational AI adoption reached 88% globally. Global corporate AI investment more than doubled in 2025 to $581.7 billion. Generative AI hit 53% population adoption within three years — faster than the personal computer or the internet. Four out of five university students now use generative AI as part of their coursework.

    What it means

    When adoption crosses the 80% line, the question of “should we adopt” becomes structurally uninteresting. Every relevant comparison group has already answered it. What remains is differentiation — and differentiation in a world of universal access is harder, not easier, than in a world of selective access. The strategic margin moves from access to integration depth, from licenses to workflow penetration, and from procurement decisions to operating-model decisions.

    The investment number is the more telling signal. $581.7 billion of corporate AI investment in a single year is a capital allocation that prices in a specific belief: that AI capability will compound at a rate that makes today’s spending the cheap option in retrospect. That belief either turns out to be correct, in which case the laggards face a permanent gap, or it overshoots, in which case the survivors of the correction still own infrastructure and skills the laggards do not.

    Andreas’s view

    My read on this: the AI Index numbers are not a celebration of momentum, they are a notice of obsolescence. Adoption was the entry-level metric — the one that let companies say “we are doing AI” without committing to anything that mattered. With 88% adoption, that metric is exhausted. The companies that conflate “we have AI deployed” with “we have an AI strategy” will be the ones surprised in 18 months when peers with the same headline adoption rate are operating at a fundamentally different unit-economics base.

    I don’t think the next two years will be about adopting more. They will be about routing work differently — deciding which functions become AI-native, which roles get redesigned, which middle-management layers compress, and which workflows get rebuilt from the ground up rather than augmented. The companies treating this as a tooling question will keep the org chart they had in 2024 and bolt assistants onto it. The companies treating it as a structural question will redesign for AI-native operations and harvest a different cost base.

    My expectation is that boards still reporting on adoption rates are measuring the wrong thing entirely. The number that matters is the percentage of work routed through AI-native processes versus AI-augmented legacy processes. Those are two different cost structures and two different competitive positions. The first is a step change. The second is a feature.
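
    One way to compute that number is sketched below, assuming each workflow has been tagged with a mode and given a weight such as hours of work or revenue handled. The tagging scheme, mode labels, and example figures are all hypothetical. Deciding what counts as AI-native is the hard part, and the code does not answer it.

```python
from collections import defaultdict


def share_by_mode(workflows):
    """Share of total weight per mode.

    workflows: iterable of (name, mode, weight) tuples, where weight is any
    consistent measure of work, for example hours or revenue handled.
    """
    totals = defaultdict(float)
    for _name, mode, weight in workflows:
        totals[mode] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total weight must be positive")
    return {mode: value / grand_total for mode, value in totals.items()}


# Hypothetical workflow register, weighted by monthly hours.
WORKFLOWS = [
    ("invoice matching", "ai_native", 200.0),
    ("proposal drafting", "ai_augmented", 350.0),
    ("quarterly reporting", "ai_augmented", 150.0),
    ("supplier negotiation", "manual", 300.0),
]

if __name__ == "__main__":
    for mode, share in sorted(share_by_mode(WORKFLOWS).items()):
        print(f"{mode}: {share:.0%}")
```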

    Three things I’m watching

    1. I’ll be watching whether companies move away from adoption KPIs toward integration-depth KPIs — specifically, the percentage of revenue-generating workflows that are AI-native, not just AI-touched.
    2. The companies that stand out to me will be the ones that build the comparison the AI Index doesn’t make for them: how their spend per FTE on AI infrastructure and tooling stacks up against the 90th-percentile peer in their sector. If that number isn’t visible to leadership, it isn’t informing strategy.
    3. I’ll be watching whether organizations use the next 12 months as a workflow-redesign window rather than a tooling-procurement window. The structural opportunity narrows the moment competitors finish their redesign.

  • The agentic year begins underprepared

    The year opens with a measurable gap. McKinsey’s 2026 trust maturity survey, fielded in December and January, puts twenty-three percent of organizations into the scaling phase for agentic systems and thirty-nine percent into experimentation. Nearly two thirds have not yet begun scaling AI across the enterprise. The capability frontier moved twelve to eighteen months faster than the operating models around it. That gap is no longer an experimentation question. It is the year’s defining strategic risk.

    The boards that close this gap first will not be using better models than their competitors. They will be running organizations that can metabolize what the models already do. The constraint is no longer technology. It is adoption — and adoption is a leadership problem.

    The shift is structural, not cyclical

    Agentic systems are not a new feature inside a familiar product. They are a new class of worker. They take a goal, decompose it into steps, hold state across those steps, call other tools, recover from errors, and return a completed unit of work. That changes what a job is, not how a job is done.

    The 2025 narrative — copilots, productivity boosts, ten percent uplift — is over. The 2026 question is harder. What units of work no longer require a human originator? What units of work now require a human reviewer instead of a human executor? Which decisions can be delegated to a system that explains its reasoning? The companies asking these questions on a Monday morning are reorganizing. The companies still benchmarking model accuracy are stalling.

    The shift is one-way. No board will vote in 2027 to remove agentic systems from a workflow they reduced from forty hours to four. The architectural choices made this year will compound.

    [Figure: one human silhouette passing a goal to a central node that branches into multiple task arrows. Caption: Goal in, decomposition out, no human in the loop between.]

    The role change has already happened on the ground

    Inside organizations that have actually shipped agentic systems, the role redefinition is happening informally, by individual contributors, ahead of any HR process. A senior analyst who used to write three reports a week now reviews twelve agent-drafted reports a week and signs off on the analysis. A staff engineer who used to write three pull requests a day now reviews fifteen agent-generated pull requests a day. An account manager who used to draft proposals now edits proposals the agent has built from CRM context.

    The work that survives is judgment, taste, accountability, and relationship. The work that does not survive is execution under specification. Job titles still describe the second category. Job content has already shifted to the first.

    First-line managers feel this most acutely. They were trained to manage humans doing execution work. They are now managing humans doing review work, who in turn are managing systems doing execution work. That is a different management discipline — closer to portfolio management of automated processes than to people management of execution teams.

    [Figure: a figure at a desk with twelve document icons floating above, marking one of them. Caption: Three reports a week became twelve reviews a week.]

    The organizational consequence is delayering

    Span of control widens when the work below each manager becomes more automated and more reviewable. McKinsey’s parallel work on the state of organizations points in the same direction: companies that scale agentic systems also flatten by removing one to two layers of middle management. The economic logic is direct. Middle layers existed to translate strategy into execution and to coordinate the humans doing that execution. When the execution is increasingly handled by systems and the translation is increasingly handled by models, the layer is doing less.

    This is not the 2024 layoff cycle that hit individual contributors. This is a 2026 reorganization that compresses the manager-of-managers layer. It is structurally different and politically harder. The people most threatened by it are the people running the budget meetings about it.

    Organizations that resist the delayering will have a temporary cost advantage and a permanent decision-velocity disadvantage. Decision cycles compress when fewer humans need to be in the loop. The competitor who removed two layers will commit to a market move three weeks faster. Over a year, that compounds into a different market position.

    [Figure: two org-chart pyramids side by side, the right one flatter, with an arrow indicating compression. Caption: The middle layer compresses, span of control widens.]

    So what boards should do this quarter

    Two actions belong on the Q1 agenda. First, demand a workforce plan that names the units of work moving from human execution to human review, with a twelve-month horizon. Vague AI strategies are no longer acceptable as deliverables; the question is which jobs, which tasks, which review cadences, which accountability lines.

    Second, name an executive owner for the operating-model redesign — not for AI strategy as a separate track, but for the way the company will be organized around the systems it has already deployed. The CHRO and the COO are the natural owners. The CTO is not. The technology decision is downstream of the operating-model decision, and treating it as upstream is how organizations end up with sophisticated tools and a 2023 org chart.

    The year that just started will be measured by the gap between capability and operating model. The companies that close it first set the pace for the rest of the decade. The risk is not moving too fast. The risk is moving too late. Execution speed will separate leaders from followers.