Blog

  • AI’s next bottleneck may not be intelligence. It may be Earth.

    For the last two years, the AI debate has been mostly about intelligence.

    Which model is ahead? How fast are capabilities improving? Will agents replace tasks, jobs, or whole workflows? Can Europe regulate the technology fast enough?

    All valid questions.

    But the next constraint may be less abstract. It may be physical.

    Power. Grid capacity. Land. Cooling. Permits. Transmission lines. Water. Construction time. Capital allocation.

    The AI race is turning into a gigawatt race. And if the space-data-center discussion is any signal, the next frontier may not just be cloud regions. It may be orbit.

    My read: the executive conversation has to move from "Which AI model should we use?" to "What physical infrastructure does our AI strategy depend on?"

    The scale shift

    Chart showing typical data center power use from 5-10 MW to 100 MW and 1 GW
    The scale jump matters: 10 MW is a facility, 100 MW is industrial infrastructure, and 1 GW becomes a regional energy strategy.

    A modern hyperscale data center is not a large office building with servers. It is an industrial energy asset.

    The International Energy Agency says average data centers draw around 5-10 megawatts. Large hyperscale facilities increasingly require 100 megawatts or more. That number sounds technical, so translate it.

    One megawatt running continuously for a year equals 8.76 gigawatt-hours. A 100 MW data center therefore consumes 876 GWh per year, or 0.876 TWh. At 90% utilization, still roughly 0.8 TWh per year. The IEA compares this to the annual electricity demand of about 350,000 to 400,000 electric cars.

    A 1 GW AI campus is ten 100 MW hyperscale data centers. Running continuously, it consumes 8.76 TWh per year.
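    The conversion above is easy to check in a few lines of Python. This is a minimal sketch: the function name `annual_energy_twh` and the `utilization` parameter are illustrative choices, and the only fixed input is the 8,760 hours in a non-leap year.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours (non-leap year)


def annual_energy_twh(power_mw: float, utilization: float = 1.0) -> float:
    """Annual energy in TWh for a load of power_mw megawatts."""
    mwh = power_mw * HOURS_PER_YEAR * utilization
    return mwh / 1_000_000  # 1 TWh = 1,000,000 MWh


# The cases walked through in the text.
for label, mw, util in [
    ("1 MW, continuous", 1, 1.0),
    ("100 MW, continuous", 100, 1.0),
    ("100 MW, 90% utilization", 100, 0.9),
    ("1 GW, continuous", 1000, 1.0),
]:
    print(f"{label}: {annual_energy_twh(mw, util):.5f} TWh/year")
```

    Changing `utilization` is the quickest way to see how sensitive the annual figure is to how hard a facility is actually run.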

    For comparison, Germany's annual electricity consumption is roughly 500 TWh. The EU is around 2,700 TWh. The US is around 4,000 TWh. So one 1 GW AI campus would be small at continental scale – about 0.3% of EU electricity consumption or 0.2% of US consumption – but huge at local grid scale.

    That local point matters.

    Put a 1 GW load in the wrong county, with weak transmission and slow permitting, and it is not "0.2% of America." It is a grid emergency, a political fight, and a capital allocation problem.

    Now consider the language around terawatts. Elon Musk's recent "Terafab" discussion was about chip manufacturing, not a conventional data center, but the vocabulary matters. AI infrastructure ambition is moving from mega to giga to tera. A theoretical 1 TW compute or manufacturing footprint running continuously would consume 8,760 TWh per year. That is more electricity than the US and EU combined.
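    To make the continental comparison concrete, here is a small Python sketch using the round consumption figures quoted above (roughly 500, 2,700 and 4,000 TWh). The dictionary and function names are illustrative, and the regional totals are the text's approximations rather than precise statistics.

```python
HOURS_PER_YEAR = 8760

# Approximate annual electricity consumption in TWh, as quoted in the text.
REGION_TWH = {"Germany": 500, "EU": 2700, "US": 4000}


def continuous_twh(power_gw: float) -> float:
    """TWh per year for a load of power_gw gigawatts running continuously."""
    return power_gw * HOURS_PER_YEAR / 1000  # GWh -> TWh


def share_of_region(power_gw: float, region: str) -> float:
    """The load's annual energy as a fraction of a region's consumption."""
    return continuous_twh(power_gw) / REGION_TWH[region]


for region in REGION_TWH:
    print(f"1 GW campus vs {region}: {share_of_region(1, region):.2%}")

one_tw = continuous_twh(1000)  # 1 TW = 1,000 GW
us_plus_eu = REGION_TWH["US"] + REGION_TWH["EU"]
print(f"1 TW footprint: {one_tw:,.0f} TWh/year vs US + EU: {us_plus_eu:,} TWh/year")
```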

    That does not mean a 1 TW data center is around the corner. It means the ambition curve is now colliding with the energy system.

    The current footprint

    The IEA estimates global data center electricity consumption at 240-340 TWh in 2022, excluding crypto mining. That was around 1-1.3% of global final electricity demand.

    In large economies such as the United States, China and the European Union, data centers already account for around 2-4% of total electricity consumption. That is the average.

    The local reality is more extreme.

    The IEA notes that data centers have already surpassed 10% of electricity consumption in at least five US states. In Ireland, data centers account for more than 20% of electricity consumption. Denmark projects data center electricity use could rise sixfold by 2030 and approach 15% of national electricity consumption.

    This is the important distinction: globally, data centers are still a manageable share of electricity. Locally, they can become one of the dominant loads on the system.

    Goldman Sachs Research estimates data center power demand could grow 160% by 2030, with global data centers rising from roughly 1-2% of power consumption today to 3-4% by the end of the decade. It also estimates AI could add around 200 TWh per year of data center power demand between 2023 and 2030.

    Two hundred TWh is not abstract. It is close to the annual electricity consumption of a mid-sized industrial country. And it is only the AI-related increment in one forecast.

    The backlash is already here

    Chart comparing global data center electricity share with US, EU, Ireland and local grid impacts
    Global averages hide local pressure: data centers can reach double-digit shares of electricity demand in specific regions.

    This is no longer theoretical.

    In May, several local flashpoints showed the political side of the bottleneck. Seattle was weighing a pause on large data centers. Durham, North Carolina passed a 60-day moratorium on data-center development. A Texas county paused data-center construction in rural areas for a year. Utah approved a data-center project described as twice the size of Manhattan, triggering backlash. Tennessee was considering legislation that would let data centers self-power with limited regulation.

    Different places, same pattern.

    AI infrastructure is colliding with local politics. Communities are asking who gets the jobs, who pays for grid upgrades, who carries water risk, who absorbs noise and land-use impact, and who benefits from the compute.

    This is the part of the AI story many executives still underestimate. It is not enough to have GPUs. You need permission. You need interconnection. You need credible energy sourcing. You need community acceptance.

    The future of AI may be decided as much in planning boards and utility queues as in model labs.

    Why energy is now part of AI leadership

    Executive checklist for AI energy strategy and infrastructure planning
    AI energy strategy is now an executive checklist: economics, thresholds, model allocation, partnerships, and efficiency.

    For a long time, digital leaders could assume infrastructure would scale behind the scenes. Cloud abstracted away servers. SaaS abstracted away operations. Developers increasingly acted as if compute was infinite, elastic, and mostly someone else's problem.

    AI breaks that illusion.

    Training frontier models is energy-intensive. Inference at scale may matter even more because successful AI products are used continuously. Agents add another multiplier: they do not just answer one prompt. They plan, call tools, retry, search, generate, check, and act. A single user request can become dozens or hundreds of model calls behind the scenes.

    That makes energy not just an engineering issue but a leadership issue.

    If AI becomes a core production layer, power becomes part of product economics. Latency becomes part of geography. Energy procurement becomes part of risk management. Infrastructure partnerships become part of market entry. Sustainability claims become harder to defend if absolute consumption rises faster than efficiency improves.

    The better question is not whether AI uses "too much" energy.

    The better question is: are we using scarce energy for high-value intelligence, or are we wasting it on low-value automation theatre?

    The opportunity

    The upside is enormous.

    AI can help design better grids, forecast demand, optimize industrial processes, improve cooling, accelerate materials science, reduce waste, and make energy systems more flexible. The same technology that increases electricity demand can also improve how electricity is produced, routed, stored, and consumed.

    There is also a market opportunity.

    Companies that solve the infrastructure layer will not just be suppliers to AI. They will become strategic gatekeepers. Power developers, grid operators, data-center builders, cooling specialists, chip designers, construction firms, nuclear developers, storage providers, and energy software companies are moving closer to the center of the AI economy.

    This is especially relevant for Europe.

    Europe often frames AI competitiveness around regulation, foundation models, sovereignty, and talent. All matter. But infrastructure sovereignty may become just as important. If compute depends on power availability, grid speed, and data-center capacity, then AI sovereignty is partly electricity sovereignty.

    A European AI strategy without an energy strategy is incomplete.

    The space question

    Conceptual space-based AI data center with solar arrays orbiting above Earth
    Space-based data centers are not a near-term replacement for terrestrial infrastructure. They are a signal that the AI compute curve is pushing beyond the grid.

    The more provocative version of this debate is space.

    A few years ago, data centers in orbit sounded like science fiction. Now Bloomberg is writing about how to build them. McKinsey has made the case for space-based data centers. University researchers are exploring the idea because AI energy demand is rising. Google and SpaceX have been linked in recent coverage to the broader possibility of AI data centers in space.

    The attraction is obvious: continuous solar power, less terrestrial land pressure, potentially easier cooling through radiative systems, and the strategic appeal of moving part of the compute layer off Earth.

    The problems are just as obvious: launch cost, maintenance, radiation, latency, orbital debris, security, regulation, and basic economics.

    But the fact that serious people are asking the question matters. Space data centers are not a near-term replacement for terrestrial infrastructure. They are a signal. The AI compute curve is steep enough that people are looking beyond the grid.

    When a technology forces executives to ask whether the data center belongs in orbit, something fundamental has changed.

    What leaders should do now

    The call to action is practical.

    First: put energy into the AI business case. Every serious AI initiative should have a compute and energy view, not just a model and vendor view. If the project scales 10x or 100x, what happens to cost, latency, emissions, and capacity?

    Second: use real thresholds. A 10 MW workload is a large facility. A 100 MW workload is industrial infrastructure. A 1 GW workload is a regional energy strategy. Treat them differently.

    Third: separate high-value intelligence from low-value automation. Not every workflow deserves heavy AI. Use frontier models where judgment, ambiguity, and leverage justify the cost. Use smaller models, retrieval, caching, rules, and process redesign where they are enough.

    Fourth: make infrastructure a board-level topic. If AI is strategic, then power supply, data-center capacity, cloud concentration, and sustainability are strategic. CIOs, CTOs, CFOs, COOs, and sustainability leaders need one shared view.

    Fifth: build partnerships beyond software. The AI stack now reaches into energy markets, utilities, real estate, cooling, semiconductors, construction, public policy, and eventually maybe space.
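    The first two points can be turned into a rough planning check. The Python sketch below is illustrative only: the function names are hypothetical, and treating 10 MW, 100 MW and 1 GW as lower bounds of each tier is an assumption layered on top of the thresholds named above.

```python
def load_tier(power_mw: float) -> str:
    """Map a power draw in MW to the rough tiers named in the text.

    Assumption: each threshold is treated as the lower bound of its tier.
    """
    if power_mw >= 1000:
        return "regional energy strategy"
    if power_mw >= 100:
        return "industrial infrastructure"
    if power_mw >= 10:
        return "large facility"
    return "below facility scale"


def scaled_tiers(base_mw: float, factors=(1, 10, 100)) -> dict:
    """Tier of the same workload after scaling by each factor."""
    return {factor: load_tier(base_mw * factor) for factor in factors}


# A hypothetical 10 MW workload, then the same workload at 10x and 100x.
for factor, tier in scaled_tiers(10).items():
    print(f"{factor}x -> {tier}")
```

    The point of the exercise is the jump between tiers: a project that is a facility question today can become an infrastructure question after one order of magnitude of growth.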

    The leadership shift

    The first AI leadership question was: "What can this technology do?"

    The second was: "How does it change work?"

    The third is now emerging: "What does it require from the physical world?"

    This is where the debate becomes more serious.

    AI is not just a software wave. It is a capital investment wave, an energy demand wave, and an infrastructure coordination problem. The limiting factor may not be imagination. It may be megawatts.

    Executives should not panic about that. But they should stop treating it as somebody else's problem.

    Models matter.

    But electricity decides where the models can run. And if the curve continues, the strategic question may become even stranger:

    How much intelligence can Earth afford to host?

  • EU AI Act delay: 24 months for Brussels, 64× for AI

    For the EU, it’s 24 months. For AI, it’s 64×.

    Last Wednesday the EU pushed the AI Act’s hardest deadlines back. Sixteen months for one piece. Twenty-four months for another. Read in regulatory time, that’s a reasonable phased rollout. Read against AI’s own pace of change, it’s something different.

    Exponential curve labeled 1× at Aug 2026 rising to 64× at Aug 2028, headline reads 'When the rules apply, AI is 64× more capable', subtitle 'EU AI Act high-risk deadline vs the AI doubling curve'.
    When the EU’s heaviest AI rules finally apply in 2028, the systems being regulated could be 64× more capable than the ones the rulebook was written for.

    What the EU just decided

    The AI Act is the world’s most demanding rulebook for artificial intelligence. It applies to any company that sells AI to European users — based in Europe or not. It was passed in 2024. Most of it was supposed to start applying in August 2026.

    Last Wednesday, the Council and Parliament agreed to push two of the heaviest pieces back.

    The “high-risk” category is the part most companies care about. It covers biometrics, hiring software, medical AI, AI in critical infrastructure — anything where a bad model output can hurt someone. Under the old timeline, these systems had to be fully compliant by August 2026. Under the new timeline, that becomes December 2027 (sixteen months later) for standalone systems, or August 2028 (twenty-four months later) for AI built into machinery, medical devices, and connected cars.

    Two-column comparison: BEFORE shows a single Aug 2026 deadline bar in grey, AFTER shows two new bars Dec 2027 plus 16 months and Aug 2028 plus 24 months in navy.
    The May 7, 2026 simplification agreement: one August 2026 deadline becomes two later deadlines, sixteen and twenty-four months out.

    What didn’t change matters too. The outright bans (social scoring, manipulative AI, untargeted face scraping) have been live since February 2025. The rules for big AI models — what most people call “frontier AI” — have been live since August 2025. The transparency obligations actually got tighter: providers of generative AI now have three months instead of six to ship watermarking. And a new ban on non-consensual sexual deepfakes lands hard on 2 December 2026.

    So the substance is intact. The triage is on the timeline.


    What METR actually measures

    METR is a research group that measures one specific thing about AI systems: how long they can keep working on a task before the workflow falls apart. Not how smart they are. Not how creative. How long they can stay on track without a human stepping in.

    The way they test it is straightforward. Give a model a real-world task — write a piece of code, run an analysis, debug a system — and measure the time-equivalent of work it can complete on its own. GPT-2 could chain together a few seconds of useful work. Claude 3 Opus held a few minutes. The frontier 2026 generation pushes past an hour.

    Plotted against time, that line is a clean exponential. From 2024 through early 2026, the time-horizon roughly doubled every four months.

    Exponential curve with three points: GPT-2 seconds at lower left, Claude 3 Opus minutes in middle, Frontier 2026 over an hour at upper right, headline 'Doubles every ~4 months', source METR.
    METR’s measurement of how long AI systems can work autonomously. The horizon roughly doubled every four months from 2024 through early 2026.

    Other measures point the same way. Reasoning depth, tool use, multi-step planning, software-engineering benchmarks — every adjacent curve has bent the same way over the same window. METR’s number is the cleanest single proxy I’ve seen, but it’s not an outlier.


    What 64× actually means

    If the doubling holds, the math on the EU’s new deadlines is uncomfortable:

    1. 16 months — four doublings — 16× more capable systems by the December 2027 deadline
    2. 24 months — six doublings — 64× more capable systems by the August 2028 deadline

    64× is not a metaphor. It is an order-of-magnitude estimate of how much longer a task AI could sustain autonomously by the time the EU’s heaviest rules apply, if the doubling trend holds.

    To put that in plain terms: if a 2026 model can do a one-hour task on its own, a 2028 model on the same trend can do a 64-hour task. A system that holds a workflow together for 64 hours is a different kind of object than the one the AI Act was drafted to regulate.
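    The arithmetic behind those multiples is a one-line exponential. A minimal Python sketch, assuming the four-month doubling holds for the whole period (which is the open question); the function name `capability_multiple` is illustrative:

```python
def capability_multiple(months: float, doubling_months: float = 4.0) -> float:
    """Growth multiple after `months` if the horizon doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


# The two delays in the new timeline, applied to a one-hour task horizon.
for months in (16, 24):
    multiple = capability_multiple(months)
    print(f"{months} months: {multiple:.0f}x, so a 1-hour task becomes ~{multiple:.0f} hours")
```

    The result is very sensitive to `doubling_months`. A slower doubling period shrinks the multiple sharply, which is why the 64× figure should be read as a trend extrapolation rather than a forecast.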

    That’s not an argument that the rules are wrong. It’s an argument that the gap between what the rulebook describes and what the systems can actually do widens fast, faster than any 2-3 year drafting cycle can keep up with.


    My read

    My read on this: the headlines called May 7 a Brussels cave to industry pressure. I don’t think that’s the right frame. The substance of the Act is intact — the Commission could have used the simplification to weaken the high-risk classification or gut the impact-assessment requirement. They didn’t. They tightened transparency and added a new prohibition. The triage is on the timeline, not the rules.

    By 2028, the AI Act could be regulating systems 64× more capable than what existed when its rules were written.

    My expectation is that the August 2026 cliff was always going to slip. What’s more interesting is what the slip exposes: regulators and AI now run on incompatible clocks, and there’s no obvious mechanism to reconcile them. The Act assumed a 2-3 year drafting cycle would land on systems recognisably similar to the ones it described. That assumption broke somewhere between GPT-4 and the agentic generation that followed.


    Three things I’m watching

    • The 2 August 2026 deadline for national authorities. That date didn’t move. If most countries still don’t have working AI authorities by August, December 2027 becomes the next deadline at risk.
    • The European technical standards. Without finalised standards from the standards bodies, “high-risk” is a definition without a benchmark. Whether the Commission publishes them before the new deadline is the gating item.
    • The EU-US-UK divergence. The same week the EU softened its timeline, the US signed pre-launch testing agreements with the five frontier labs through CAISI. These two regulatory paths now point in different directions, and that gap is where the next year of this story plays out.

    One last thought

    Sixteen months. Twenty-four months. In any other regulatory context, those numbers feel reasonable. In AI they feel like an era. That’s not a problem the Commission can solve in a single omnibus.

    To be clear: I am not asking for more regulation. I am asking for more decision speed.

  • AI: creating or destroying jobs?

    The AI-jobs argument has split into two camps that aren’t actually arguing about the same thing.

    Jensen Huang told CEOs at GTC that firing people for AI shows “no imagination” — radiologists, he points out, are more numerous now than before AI entered radiology. Marc Andreessen calls the displacement narrative “completely fabricated” and points to Jevons Paradox: cheaper labor produces more demand, not less. The WEF Future of Jobs Report still projects net +78 million jobs globally by 2030. Challenger’s Hiring Plans index was up 157% year-over-year in March.

    A week later, Block laid off 40% of its workforce. Jack Dorsey said engineering work that needed weeks now happens in a fraction of the time. Block is still hiring AI engineers.

    So which is it?

    My read: both sides are right. They’re answering different questions about different decades. Most of the public argument is two conversations pretending to be one.

    The optimist case

    Three pieces hold it up.

    The historical record is strong. Keynes wrote in 1930 that his grandchildren would work fifteen-hour weeks. The reality in 2025: the OECD average is thirty-seven hours a week, and Americans clock 1,976 hours a year. Mechanization, electrification, the computer, the internet: every general-purpose technology was forecast to end work, and every one produced more jobs than it eliminated. In 1900, 41% of Americans worked in agriculture; today it’s 2%. The jobs went somewhere.

    Jevons Paradox is real. When something useful gets cheaper, demand rises. If AI makes cognitive work twenty times cheaper, you don’t end up with one-twentieth the cognitive work. You end up with twenty times the cognitive work, deployed against far more problems. Andreessen’s “Super-PhD in every field” captures it.

    A big chunk of the labor market is hard to displace. Licensed jobs (medicine, law, accounting), unionized jobs (skilled trades, transit, public safety), and public-sector roles add up to a large fraction of US employment. Not protected because they’re irreplaceable in some technical sense — protected by institutions that move slowly.

    Each piece is correct. The question is whether they’re enough.

    Where the optimist case breaks

    Radar chart of AI capability versus observed usage across eight occupations from the Anthropic Economic Index, showing the deployment gap.
    The deployment gap: theoretical AI capability dwarfs observed usage by occupation. Source: Anthropic Economic Index.

    The Anthropic Economic Index plots theoretical AI capability against observed AI usage by occupation. The two lines look almost nothing alike — capability is broad and high; usage is narrow and concentrated. There’s a gap between what AI can do and what it’s actually doing.

    Read that gap two ways. The optimist reading: deployment is slow, friction is real, the labor market reabsorbs shocks like it always has. The harder reading: the gap is the queue — it’s where displacement comes from over the next five to ten years, not from new capability but from deployment catching up to capability that already exists.

    94% of cognitive job tasks are theoretically automatable today; only 33% are actually automated. The space between is the transition zone. It’s not science fiction. It’s not contested. Most of it will close. Block’s layoffs sit on the second reading.

    The historical-record argument also has a footnote that doesn’t get enough weight. AI is the first general-purpose technology to automate cognitive labor at scale. Every prior wave automated muscle, then narrow categories of cognitive work, but never the universal category of “thinking and writing and analyzing and deciding.” The tractor displaced farm hands; they moved into office work. The PC displaced typists and clerks; they moved into knowledge work. AI doesn’t have an obvious “moved into” destination, because the destination of every prior wave is the category AI now automates.

    The TIME / Contextual AI benchmark chart makes the universality vivid. AI surpassed human-level performance on handwriting recognition around 2015, then speech, then images, reading, language, common sense, math, code generation. The rate at which new tasks fall is increasing.

    The trades-and-physical-work counterargument is weaker than it looks. Yes, 57% of jobs depend on physical presence or craft work AI can’t currently replicate. But 70% of positions inside blue-collar companies — the dispatcher, the accountant, the customer-service rep — are white-collar-adjacent and fully exposed. And if displaced knowledge workers all migrate into trades, wages collapse from saturation. Bank of America projects billions of humanoid robots by mid-century with hardware costs falling from $35,000 to under $15,000; one analyst projects robot-hours at four to six euros. Even physical work has an expiration date.

    So the optimist case is strong for a long-run answer. It’s much weaker for the next ten years.

    The displacement case

    Not “AI replaces all jobs.” That’s the optimists’ caricature, and once you reach for it the displacement case looks weak. The serious version is more specific.

    Three vertical bars on dark navy: high-skill rising, middle-skill shrinking with downward arrow, low-skill stable — the AI barbell economy.
    The barbell economy: high-skill productivity rises, low-skill stable, the middle hollows out.

    It’s structural: the middle is being squeezed. The labor market is shifting from a K-shape into a barbell. High-skill technical roles are more productive — the same Anthropic data shows code, analysis, and research at the top of the productivity-gain distribution, with usage approaching 60% of theoretical capacity. Low-skill physical roles in care, hospitality, manual handling, and trades are stable for now. The middle is shrinking: bookkeeping and paralegal work, content writing and copywriting, junior finance and analyst roles, customer service, entry-level coding, marketing copy, translation, project coordination, junior tax preparation.

    Germany has already seen roughly 90,000 AI-related job losses in the first months of 2026. The risk is not mass unemployment in aggregate. Aggregate unemployment can stay low for years while the middle hollows out. The risk is a split labor market — and a split society — in which the people who staffed the middle no longer have a clear path up or sideways.

    The Anthropic Economic Index BLS panel makes this concrete: hiring of younger workers in AI-exposed occupations has slowed, even as overall employment numbers haven’t moved much. That’s what early-stage hollowing looks like — the entry-level rung disappears first, before the established middle does.

    Five years of that compounds into something the historical record didn’t have to absorb.

    My read — the three-phase shape

    Horizontal timeline 2025 to 2040+ split into three colored zones: red displacement, amber strain, cyan abundance — AI jobs transition phases.
    Three phases of the AI jobs transition: displacement (2025-2030), strain (2030-2035), abundance (2035+).

    The clearest three-phase framing is German — chronological, not parallel.

    Phase one — displacement (~2025-2030). AI displaces knowledge work faster than the labor market rebuilds. The middle hollows. Aggregate unemployment may not move much; entry-level paths in white-collar roles narrow sharply. The optimists are right that the technology eventually creates new categories. They’re wrong about the timing.

    Phase two — strain (~2030-2035). Strain shows up in places that aren’t unemployment: tax-base erosion, weakened consumer demand, capital returns rising while labor’s share of national income falls to historic lows. Public-sector and licensed-job cushions hold initially but come under fiscal pressure. The political consequences sharpen.

    Phase three — abundance (after ~2035). The deflation the optimists describe arrives. Costs collapse across categories. What costs $100 today costs a few cents. The median 2040 lifestyle, on a flow-of-services basis, looks something like today’s high-net-worth lifestyle on every dimension except positional goods. Both Andreessen and Huang are right about the destination.

    That’s the timeframe trap. Both sides are correct on their respective horizons. The honest version of the optimist case includes the transition pain. The honest version of the displacement case includes the recovery.

    What this means for how leaders think about the next ten years: the question isn’t “do we believe in AI displacement, yes or no.” That question is roughly answered. The task is to assume real displacement in the middle, plan for it, and carry the organization through to the recovery in a way that keeps the institution and its people whole.

    Three things I’m watching

    1. Whether the entry-level signal becomes a leading indicator. The slowing of hiring for younger workers in AI-exposed occupations is, in my read, the most important early signal. Aggregate employment numbers lag; entry-level absorption leads. If the slowdown becomes a structural break, phase one stops being a forecast and becomes a measurement.
    2. Whether the licensed and public-sector cushion holds when fiscal space tightens. The structural-protection argument is strong only as long as the institutions that protect those jobs don’t themselves come under fiscal pressure. Phase two erodes the tax base. The question is whether legislatures and regulators are protecting genuinely essential public-sector employment or subsidizing, after the fact, the share of the workforce the private sector can no longer place.
    3. Whether the recovery looks like restored employment or restored income. Phase three is consistent with both. Jobs come back in new categories — the historical track record. Or they don’t come back at scale and the recovery is income-shaped: UBI-like distribution of the deflation surplus rather than wage-based participation. These look very different politically. The shape of phase two is what determines which one we get.

    No one can answer these three questions with confidence yet. I’m watching them because the answers will tell us, in roughly the next five years, what the transition phase actually costs.

    The destination is not in serious doubt. The road is.

  • MIT Called It a Disenchanted Intern. METR Says Check the Growth Rate.

    Something happened this week that I keep turning over.

    MIT published findings this month showing that when 41 AI models were tested across more than 11,000 real workplace tasks, the result was, in their words, like a “disenchanted intern” — hitting minimum benchmarks about 65% of the time, but never exceeding 50% success on tasks requiring genuinely superior-quality output. If you work in software, marketing, legal services, or knowledge work of any kind, that’s the snapshot.

    METR — a nonprofit focused on measuring AI capabilities — published a different kind of snapshot. Their metric is the “time horizon”: the maximum length of autonomous task a frontier AI can reliably complete. In 2019, the best AI could handle roughly a two-minute task without human intervention. By the end of 2025, that had grown to roughly an hour. The doubling time across that whole period: around seven months.

    METR’s January 2026 update tightened that number further. Post-2023, the best estimate for the doubling period is now 130 days — closer to four months.
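    The two doubling periods quoted above translate into quite different yearly growth rates. A minimal Python sketch, assuming clean exponential growth (a simplification) and using a hypothetical helper named `annual_growth_factor`:

```python
def annual_growth_factor(doubling_days: float) -> float:
    """Growth multiple over 365 days for a given doubling period in days."""
    return 2 ** (365 / doubling_days)


seven_months_in_days = 7 * 365 / 12  # seven months expressed in days

print(f"7-month doubling:  {annual_growth_factor(seven_months_in_days):.1f}x per year")
print(f"130-day doubling:  {annual_growth_factor(130):.1f}x per year")
```

    On these assumptions, a 130-day doubling works out to roughly a 7x gain per year, against a little over 3x for a seven-month doubling.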

    My read on this:

    The MIT study and the METR data aren’t in conflict. They’re measuring different things at different timescales. MIT is taking a photograph. METR is measuring the shutter speed. And the shutter speed is getting faster.

    I don’t think the “disenchanted intern” framing is wrong — it describes today accurately. What I’m less sure about is the assumption, implicit in most of the coverage I’ve read this week, that “today” is a stable state. An intern who gets twice as capable every four months is not the same resource at the end of the year as they are today.

    What I keep returning to is the gap between the current snapshot and the trajectory, and the opportunity that opens up in that gap. Anyone building workflows, designing teams, or structuring how they work around AI capability today is working from a reference point that will be measurably out of date within a single planning cycle. That’s an opportunity signal at a scale and pace most planning assumptions don’t account for.

    Three things I’m watching:

    1. Where the doubling curve hits friction. Every exponential eventually meets a wall — physical limits, data constraints, regulatory friction. METR’s time-horizon metric is useful precisely because it measures real-world task completion, not synthetic benchmark scores. When the doubling cadence breaks, that will be the signal that the curve has met something real. I expect that to happen. I just don’t know when.

    2. Whether “minimally sufficient” matters or not. MIT’s 65% minimally sufficient rate sounds modest. But most enterprise workflows run on people who are minimally sufficient most of the time. The threshold isn’t excellence — it’s “acceptable at scale, around the clock, at near-zero marginal cost.” That bar is lower than it sounds, and closer than the headline number implies.

    3. The infrastructure spend as an access unlock. Alphabet, Meta, Microsoft, and Amazon are projected to spend nearly $700 billion combined on AI infrastructure in 2026 — roughly double what they spent last year. That capital isn’t just building capacity for the current snapshot. It’s funding the cost compression that makes the next several capability doublings broadly accessible. When the infrastructure matures, the cost floor drops — and the surface area for building on top of it expands with it.

    The disenchanted intern framing is apt today. My expectation is that it’s a better description of 2025 than it is of 2027.

  • Every knowledge worker is a manager now

    Every knowledge worker is a manager now

    Every knowledge worker is a manager now. Agentic AI has turned individual contributors into managers of AI agents, and first-line managers into leaders of managers of agents. The job descriptions have not caught up yet. The operating models have not caught up yet. The reskilling plans have not caught up yet. All of that is lagging the capability frontier by twelve to eighteen months — and the organizations that close that gap first will operate at a structurally different throughput than the ones still writing job descriptions for the jobs that existed in 2023.

    The shift: agentic AI crosses the line from tool to colleague

    For the first year and a half after ChatGPT, the thing called “AI” in most organizations was a better search box. A more patient editor. A faster rough-draft generator. Useful, but still a single-interaction tool. You asked, it answered, you moved on. The job of the knowledge worker did not fundamentally change — they just had a slightly sharper pencil.

    What changed in the eighteen months leading into 2026 is the arrival of agentic models. The word “agent” in that context is not marketing. An agent is a system that can do a sequence of things, hold state across those steps, make decisions about what to do next, use tools, and come back with a completed multi-step task. That is a categorically different interaction than “ask question, get answer.” It is closer to “give a junior colleague an outcome to produce and trust them to produce it.” The commercial consequence of that shift is the subject of this post.

    Pipeline diagram: input, agent, output, judge, ship, with a human at the judge step
    Input → agent → output → judge → ship. The human stays at the judgment node.

    The role change: ICs become managers of agents

    The individual contributor job has silently changed. Writing short summaries of long content — once a junior-to-mid task — is now an agent task. The human role is to specify the outcome, check the output, and decide what to do with it. Meeting preparation — the pre-meeting brief of background, context, attendees, prior touchpoints — is now an agent task. The human role is to feed the context, review the brief, and adjust the framing. Drafting a first pass of almost any structured document — a proposal, a plan, an analysis — is now an agent task. The human role is the editor, not the author of the first draft.

    The common thread is that the IC’s job has shifted from doing to specifying outcomes and judging output. Those are management skills. Not in the metaphorical sense — in the literal sense. Framing a task clearly enough that someone (or something) else can execute it. Evaluating whether the execution meets the specification. Deciding when to iterate and when to ship. These are exactly the skills that used to distinguish a first-line manager from a senior IC, and they have become baseline requirements for an IC working with agents.

    Colleagues editing agent outputs, with overlay text
    The new role for the IC: editor of agent output.

    The org change: first-line managers become leaders of managers of agents

    If every IC is now a manager of agents, then every first-line manager is now a leader of managers of agents. Their job is no longer to supervise execution — the agent is doing the execution. Their job is to coach the humans on their team in how to specify outcomes, how to judge output, how to know when an agent is producing garbage, and how to scale their orchestration over time. That is a completely different job than the first-line management job of three years ago, and it requires a different skill set.

    Two structural consequences follow. First, the middle management layer compresses, because a first-line manager leading managers of agents can reach further than one managing direct executors: the coordination overhead per report drops when the reports are themselves operating on a multiplier. Second, the definition of “span of control” stretches, but not infinitely. The Dunbar layers still govern the number of humans a manager can hold relationships with, even if each of those humans is now operating agents underneath them. The org chart can get flatter. It cannot grow without bound.

    A human conductor directing an orchestra of AI agents
    One human, many agents — the conductor metaphor for first-line management at scale.

    The strategic consequence: orchestration is now a baseline skill, not an advanced one

    The skill that used to distinguish senior managers from junior ones — the ability to frame work so someone else can execute it and judge whether their execution is good — is now a baseline IC capability. Orchestration is the new baseline. Writing is the new baseline. Judgment about output quality is the new baseline. The organizations that will operate at structurally higher throughput over the next five years are the ones that reskill their IC population around these baseline orchestration skills, rather than hiring more specialists who each do one thing well.

    Talent leverage, not headcount, becomes the scoreboard. A commercial organization that operates at 300 humans with strong orchestration capability can outproduce a commercial organization that operates at 600 humans with legacy IC job descriptions. The difference is not about working harder. It is about operating model. The 300-human organization has fewer Dunbar breakpoints, shorter decision loops, less cross-functional friction, and a higher per-seat agent-multiplier. All of that is the consequence of a single structural decision made at the job-description layer.

    So what boards should do

    Three actions sit on the CEO agenda over the next two quarters. First, rewrite the IC job descriptions for every knowledge-worker role in the organization so that orchestration and output judgment are explicit baseline capabilities, not bonus ones. Second, rewrite the first-line management job description so that coaching for orchestration is the core of the role, not supervision of execution. Third, audit the reskilling plan against the assumption that every knowledge worker in the organization is now a manager and needs to be trained as one — because the capability frontier has already shipped and the only question is whether the organization catches up in quarters or in years.

    Boards that do not require a reskilling plan at this scope are budgeting against an operating model that does not exist anymore. The plan does not need to be perfect. It needs to exist. The gap between organizations that have this plan and organizations that do not is the structural competitive advantage of the next five years, and it is already being measured — in throughput, in decision velocity, in the quiet retention of the top performers who can see the gap coming.

  • Inference cost has collapsed. Enterprise AI business cases haven’t caught up.

    Inference cost has collapsed. Enterprise AI business cases haven’t caught up.

    GPT-4 class inference cost $20 per million tokens at launch in early 2023. In April 2026, equivalent performance runs $0.40. Most enterprise AI business cases were built somewhere in the middle — and haven’t been updated since.

    That gap is not a technology story. It is an arithmetic problem wearing a strategy hat.

    What moved

    Inference costs have declined faster than the bandwidth price collapse of the early internet era, faster than PC compute, and considerably faster than any enterprise finance model anticipated. Artificial Analysis tracks it live: the cheapest capable models today run under $0.50 per million tokens. A flagship model that cost $10 per million tokens eighteen months ago now costs $2–3. The price range between the cheapest and most expensive capable options has widened past a thousand-to-one.

    The driver is compounding. Better training efficiency produced more capable models at lower operating cost. Competition between providers accelerated the pass-through. Specialised chips entered the stack. The result: a cost curve that looks less like traditional software pricing and more like solar panel economics — each year’s curve is below where last year’s curve said it would be.

    What did not move

    Enterprise AI business cases.

    S&P Global found that 42% of companies abandoned most of their AI projects in 2025. Cost and unclear value were the top reasons cited. IBM put the share of AI initiatives delivering expected ROI at 25%. MIT found that 95% of AI pilots delivered zero measurable P&L impact (MIT NANDA, State of AI in Business, 2025).

    These numbers are real. But the interpretation of why projects fail is often imprecise.

    Projects approved in 2023 and 2024 were scoped against the pricing environment of 2023 and 2024. The cost models that informed the go/no-go decisions used token prices that no longer exist. The ROI denominators were anchored to infrastructure assumptions from a period when GPT-4 access cost $10–20 per million tokens. The business cases that were rejected on cost grounds — the ones that landed below the internal ROI hurdle by a thin margin — were rejected against a cost basis that is now a fraction of what it was.

    That is not a technology failure. It is a modeling lag.

    Andreas’s view

    My read on this: there are two different things getting conflated in the ROI conversation. One is genuinely poor outcomes — wrong use case, shallow integration, insufficient change management. That is real and deserves scrutiny. The other is a systematic understatement of AI’s economic potential because the cost assumptions in the business case never got refreshed. Those two phenomena look identical in the data.

    I don’t think the 42% abandonment rate or the 25% ROI hit rate tells us much about what AI can do at today’s prices. It tells us how enterprises perform against business cases built on 2023 assumptions. The projects that got killed for cost reasons in Q4 2024 would look different rerun against Q2 2026 pricing.

    My expectation is that the organisations getting ahead of this are running a specific exercise that most are not: taking the cost assumptions out of every AI initiative that was rejected or stalled in 2023–2025, replacing them with current market rates, and seeing which cases cross the ROI threshold now. Not all of them will. But some will — and the decision to revisit them is a spreadsheet exercise, not a technology project.
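
    The spreadsheet exercise can be sketched in a few lines. This is a minimal illustration, not a finance model: the three business cases, their figures, and the 25% ROI hurdle are all hypothetical, and only the two token prices ($20 and $0.40 per million) come from the numbers quoted earlier in this post.

```python
from dataclasses import dataclass


@dataclass
class BusinessCase:
    name: str
    annual_benefit: float          # expected yearly benefit, in dollars
    annual_tokens_millions: float  # yearly inference volume, in millions of tokens
    other_annual_cost: float       # integration, people, change management


def roi(case: BusinessCase, price_per_million: float) -> float:
    """Simple one-year ROI: (benefit - cost) / cost."""
    cost = case.annual_tokens_millions * price_per_million + case.other_annual_cost
    return (case.annual_benefit - cost) / cost


def newly_viable(cases, old_price, new_price, hurdle):
    """Cases that missed the hurdle at the old price but clear it at the new one."""
    return [
        c.name
        for c in cases
        if roi(c, old_price) < hurdle <= roi(c, new_price)
    ]


# Hypothetical portfolio of previously reviewed initiatives.
CASES = [
    BusinessCase("claims-triage", 1_500_000, 40_000, 600_000),
    BusinessCase("contract-search", 300_000, 500, 400_000),
    BusinessCase("support-summaries", 2_000_000, 10_000, 500_000),
]

print(newly_viable(CASES, old_price=20.0, new_price=0.40, hurdle=0.25))
```

    In this made-up portfolio only the inference-heavy case flips. The case dominated by non-inference cost stays below the hurdle at any token price, which is the distinction between a modeling lag and a genuinely poor use case.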

    Three things I’m watching:

    • Whether finance teams are treating inference cost as a stable input or a variable. Most enterprise budget models treat infrastructure cost as a constant. Inference cost is not a constant — it has been declining faster than almost any other enterprise input cost in the last three years.
    • The spread between unit cost and total spend. Per-token costs have collapsed, but total enterprise AI spend is forecast to jump 65% in 2026 — from roughly $7M average to over $11M (IDC). Volume is expanding faster than unit costs are falling. The budget impact of AI is still growing, even as the underlying unit economics are dramatically more favourable than they were.
    • How capital allocation committees handle the remodel request. The institutional question: if a CFO approved a 2023 AI business case that underperformed, how does the organisation handle finance coming back and saying “the cost structure changed — the case should have worked, we just used the wrong numbers”? That conversation is coming.

    What this reveals

    The collapse in inference cost is well-understood in developer circles. Engineers who run inference workloads reset their unit economics continuously — it is operational reality. The delay is in the enterprise business case layer, where cost assumptions travel up through approval chains, get embedded in multi-year plans, and calcify.

    The cost curve does not care about the approval cycle. It moved while the slide decks were in review.

    This is not an argument that all AI investments look better at current pricing — some of those failed pilots would have failed regardless, and the organisational conditions for AI success (clear scope, embedded workflows, meaningful accountability) have not gotten easier. But a non-trivial fraction of the projects that stalled on cost now live in territory where the math is different. Identifying them is a shorter path to AI ROI than starting new initiatives from scratch.

  • Model deprecation is the new continuity risk

    Model deprecation is the new continuity risk

    Four rectangles in a row with the leftmost ghosted, simple connecting arrows
    Model lifecycle row.

    OpenAI has announced that the Sora web and app experiences will be discontinued on April 26, with the Sora API following on September 24. The first deprecation takes effect in two weeks. Enterprises that built workflows on Sora since launch are not facing a model upgrade. They are facing a workflow rebuild on a four-month timeline. This is the first prominent enterprise-facing AI deprecation event of the cycle, and the precedent it sets matters more than the specific product involved.

    Model deprecation is no longer a developer-tier concern. It is an enterprise governance question that deserves a place on the risk committee agenda. The real shift is happening here: AI dependency without continuity is becoming a board-level risk in 2026.

    The shift: dependency without continuity guarantees

    The pattern of the past two years has been to build agent workflows on whichever foundation model was demonstrably best at the time, with little contractual commitment from the model provider about how long that model would remain available. Provider terms have improved — Azure OpenAI’s twelve-plus-six-month commitment for generally available models is the strongest standard in market — but most enterprises have not negotiated equivalent terms with their chosen providers. They built on capability, not on continuity.

    When the provider sunsets the model, the enterprise’s options are bad. Migrate to a successor model that may behave differently in subtle ways — requiring re-validation of every governed use case. Renegotiate at the eleventh hour for extended access at unfavorable terms. Or absorb the operational disruption of the workflow simply not working until rebuilt.

    The Sora event is small in dollar terms but large in precedent. The next deprecation will involve a more enterprise-critical model, and the enterprises that did not see this one coming are not going to see that one coming either.

    A single thread connecting a workflow box to a model box, the thread visibly fraying near the model with a clock above
    Built on capability. Not on continuity.

    The role change: the addition of an AI continuity discipline

    Inside enterprises that take this seriously, a discipline is emerging that did not exist in 2024: AI continuity management. The work overlaps with vendor management, with disaster recovery, with model risk management, and with regulatory compliance, but it is structurally distinct from all of them. The discipline involves maintaining an inventory of model dependencies by workflow, negotiating continuity commitments at procurement, running successor-model regression tests on a regular cadence, and keeping documentation complete enough that a workflow can be rebuilt on a successor model.

    Most enterprises have not staffed this discipline. The accountabilities are scattered across teams that do not coordinate. The procurement team negotiated the model contract a year ago without a continuity clause. The deployment team is building production dependencies on the model without thinking about migration cost. The risk team has not flagged model deprecation as a category. When the deprecation announcement lands, the company finds out it has no plan.

    The fix is straightforward in concept and slow in practice. Add continuity commitments to the procurement template. Build a model-dependency inventory. Designate an owner for AI continuity at the executive level. Run quarterly successor-model tests. None of this is hard. It is just unglamorous work that does not get done unless someone owns it.
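
    As a starting point, a model-dependency inventory can be as simple as one record per workflow plus a query for what is exposed. The sketch below is an assumption about what such an inventory might hold. The field names, workflows, providers, and dates are hypothetical placeholders, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ModelDependency:
    workflow: str
    provider: str
    model: str
    retirement_date: Optional[date]  # announced retirement date, None if none announced
    continuity_clause: bool          # contract includes a continuity commitment
    owner: Optional[str]             # accountable person or team, None if unassigned


def exposed(inventory, today: date, horizon_days: int):
    """Dependencies whose announced retirement falls within the horizon, soonest first."""
    flagged = [
        d for d in inventory
        if d.retirement_date is not None
        and 0 <= (d.retirement_date - today).days <= horizon_days
    ]
    return sorted(flagged, key=lambda d: d.retirement_date)


def unprotected(inventory):
    """Dependencies with no contractual continuity commitment."""
    return [d for d in inventory if not d.continuity_clause]


# Hypothetical inventory entries for illustration only.
INVENTORY = [
    ModelDependency("video-ads", "provider-a", "video-gen-1", date(2026, 9, 24), False, None),
    ModelDependency("claims-coding", "provider-b", "coder-x", None, True, "risk-ops"),
    ModelDependency("support-chat", "provider-a", "chat-3", date(2027, 6, 30), False, "cx-team"),
]

for dep in exposed(INVENTORY, today=date(2026, 4, 12), horizon_days=180):
    print(dep.workflow, dep.retirement_date, "owner:", dep.owner)
```

    Even a list this small surfaces the two questions that matter: which workflows have a retirement date inside the planning horizon, and which of them have neither a continuity clause nor an owner.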

    The strategic consequence: renewed buy-versus-build math

    Continuity risk changes the calculus of where to deploy AI capability. Some workflows carry a high cost of unplanned migration: regulated workflows, mission-critical operations, customer-facing experiences with high switching costs. For those, three options look stronger than they did in 2024: fine-tuning a frontier model into a controlled deployment, partnering with a vendor offering enterprise-grade continuity commitments, or building on open-weight models the enterprise can host indefinitely. The case for relying on whichever model is best on a benchmark this quarter is weaker.

    The math is not simple. Open-weight models lag the frontier, sometimes meaningfully. Self-hosting carries operational cost that the proprietary providers absorb. The vendor lock-in to a single proprietary provider, even with the best continuity terms, is a different kind of risk than open-weight self-hosting carries. Each enterprise has to make this trade-off based on the workflow’s tolerance for capability lag versus its tolerance for continuity disruption.

    What is no longer defensible in 2026 is treating model continuity as someone else’s problem. The Sora sunset is small. The next one will not be.

    So what boards should do this quarter

    Add model deprecation to the risk committee agenda. The first deprecation event lands in two weeks. The board should at minimum understand which workflows are exposed and what the migration plans are.

    Demand a model-dependency inventory. Which workflows depend on which models from which providers, with which contractual continuity commitments. If this inventory does not exist, building it is the priority.

    Reconsider the buy-versus-build posture for mission-critical AI workflows. The 2024 default — use whichever proprietary model is best — was rational at the time. In 2026, with the deprecation precedent now visible, that default deserves an explicit reconsideration. Continuity is becoming a form of resilience. The boards that price it in this quarter will not be the ones rebuilding workflows under deadline.

  • Team sizes are not design choices. They’re cognitive limits.

    Team sizes are not design choices. They’re cognitive limits.

    Team sizes are not design choices. They are cognitive limits. The recurring numbers that show up in military units, religious communities, hunter-gatherer bands, and commercial organizations are not management philosophy. They are a property of the animal doing the work, and any organizational structure that pretends otherwise pays a measurable tax in friction, communication overhead, quiet attrition, and decisions that arrive three weeks late.

    Two. Four to six. Eight to twelve. Twenty to twenty-five. Fifty. One hundred and fifty. The specific numbers recur across centuries and industries. In the Roman legion and the US Marines. In religious communities and hunter-gatherer bands. In tech companies, sales organizations, and the advice experienced managers give each other about when to split a growing team. It is not a coincidence. It is cognitive architecture. The constraint is not technology. It has always been the brain doing the coordinating.

    Dunbar’s layers

    The research most commercial leaders eventually bump into is Robin Dunbar’s. Dunbar is a British anthropologist who, in the early 1990s, proposed that the size of a primate’s social group is constrained by the size of its neocortex. Extrapolating from primate data, he estimated the human number at around 150 — the number of people with whom any one of us can maintain a stable, recognisable, mutually-active relationship. He published it in the Journal of Human Evolution in 1992, and the number has been running through management literature ever since.

    The part that gets talked about less, but matters more, is that Dunbar’s 150 is not a single flat layer. It is the outer ring of a nested set, each layer roughly three times larger than the one inside it:

    • ~5 — your closest support group. The people you would call in a real emergency.
    • ~15 — your sympathy group. People whose loss would significantly affect you.
    • ~50 — your band or clan. People you know well enough to share deep context with.
    • ~150 — your active community. Stable, recognisable, mutually reciprocal relationships.
    • ~500 — acquaintances.
    • ~1500 — faces you can still recognise.
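
    The "roughly three times" scaling is easy to check against the layer sizes listed above. A minimal sketch, using only those six rounded numbers; the names are hypothetical.

```python
# Layer sizes as listed above (rounded estimates, not exact counts).
DUNBAR_LAYERS = [5, 15, 50, 150, 500, 1500]


def layer_ratios(layers):
    """Ratio of each layer to the one inside it."""
    return [outer / inner for inner, outer in zip(layers, layers[1:])]


for (inner, outer), ratio in zip(zip(DUNBAR_LAYERS, DUNBAR_LAYERS[1:]),
                                 layer_ratios(DUNBAR_LAYERS)):
    print(f"{inner:>4} -> {outer:>4}: x{ratio:.2f}")
```

    Every step lands between 3.0 and about 3.3, which is what "roughly three times" means here.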

    These layers show up in the research almost regardless of whether the subject is a tribal society, an office workforce, or a social-network friend graph. And they map astonishingly well onto the team sizes that commercial organizations stumble toward by trial and error — not because anyone read Dunbar, but because the alternatives don’t work.

    Central figure surrounded by concentric tiers of silhouettes
    A central figure surrounded by expanding tiers — 5, 15, 50, 150.

    The military got there first

    Armies have been experimenting with how to organize humans under extreme stress for two thousand years, and they arrived at exactly these numbers through pure selection pressure. Smaller was too fragile. Larger fell apart under fire. The numbers that survived are the numbers that work.

    A Roman legion’s smallest unit was the contubernium — eight soldiers who shared a tent, a mule, a mess, and most of their waking life. Eight. Right at the boundary between the 5-person inner layer and the 15-person sympathy group. The Romans knew nothing about neocortex ratios. They noticed that a group of eight held together in a way that a group of four or a group of sixteen did not.

    The modern US Marine Corps fireteam is four. The squad is roughly 13. The platoon is 30 to 40. The company is 100 to 150. The same ratios, twenty-one centuries later. The cognitive limits haven’t moved, because the brain they are about hasn’t.

    The tech industry rediscovered the same numbers

    The technology industry discovered the same structure and gave it different names.

    Jeff Bezos’s two-pizza rule — a team should be small enough to be fed by two pizzas — is a practical restatement of the 5-to-8 cognitive sub-layer. Amazon did not get there via anthropology. They got there by watching their own product teams stall every time they grew past the point where the whole group could fit around one table.

    Scrum teams are conventionally sized at 7 ± 2. Earlier editions of the Scrum Guide recommended 3 to 9 developers, and the 2020 edition says a team is typically 10 or fewer people. That range echoes George Miller’s 1956 paper on the working-memory limit of around seven chunks. Miller was not writing about teams. The cognitive limit he found on how many things we can juggle at once maps cleanly onto how many people we can coordinate without losing track of where everyone is.

    Fred Brooks, in his 1975 book The Mythical Man-Month, observed that adding people to a late software project makes it later, because a team of n people has n(n–1)/2 pairwise communication channels, and each new person adds one channel to every person already on the team. Seven people means 21 channels. Ten means 45. Fifteen means 105. The coordination tax is quadratic, and it surfaces as “mysterious” slowdowns at exactly the team sizes where the math stops being manageable.
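
    The channel arithmetic is worth running once. A minimal sketch: a team of n people has n(n–1)/2 pairwise channels in total, and the next hire adds one channel to each existing member. The function names are hypothetical.

```python
def channels(n: int) -> int:
    """Total pairwise communication channels in a team of n people."""
    return n * (n - 1) // 2


def added_by_next_hire(n: int) -> int:
    """Channels added when a team of n grows to n + 1."""
    return channels(n + 1) - channels(n)


for size in (5, 7, 10, 15):
    print(f"{size:>2} people: {channels(size):>3} channels, "
          f"next hire adds {added_by_next_hire(size)}")
```

    Doubling a team from 7 to 15 roughly quintuples the channel count, from 21 to 105, which is why the slowdown feels out of proportion to the headcount added.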

    W. L. Gore & Associates, the Gore-Tex company, built Dunbar’s number directly into its real-estate strategy. Founder Bill Gore had a rule: every time a building exceeded 150 employees, they built another building. He was running Dunbar’s ceiling inside his facility planning decades before Dunbar had published the paper.

    The Ringelmann effect, documented in 1913 and one of the oldest findings in social psychology, is the same story in a different register: as group size grows, the effort each individual contributes goes down. People pull harder on a rope when there are fewer of them holding it. Max Ringelmann measured it with actual rope-pulling experiments, and the finding has been replicated many times since in workplace and sports settings.

    Two-pizza team image with overlay text
    The two-pizza team — Bezos’s practical statement of the cognitive sub-layer.

    The role change: the first-line manager span is a cognitive limit, not a cost line

    A first-line manager’s direct-report span is not a matter of preference for most cognitive work. It sits around 5 to 7. Push it to 10 and managers stop coaching and start triaging. Push it to 15 and the role has reverted to being an individual contributor with a different title. Organizations that scale cleanly keep that first layer tight even when the spreadsheet says it is expensive — because the spreadsheet is not pricing the coordination tax that a wider span produces downstream.

    Minimalist line graph showing communication-channel count rising quadratically as team size grows from 2 to 15
    Coordination overhead grows quadratically with team size.

    The org change: 50 and 150 are hard boundaries

    The sub-team that actually owns a piece of work should be closer to 5 than to 10. Not because small teams are faster in principle, but because the communication-overhead curve gets steep fast after 7. Bezos was right about this, and almost every high-performing team of any reasonable size runs its real work through an informal group of four or five — regardless of what the reporting structure says on the org chart.

    When a function crosses 50 people, it needs an operational substructure. Tribes, chapters, pods, whatever the label — or the Dunbar sympathy layer breaks. When the people in a team stop knowing each other well enough that a death in someone’s family would visibly register with everyone, culture starts dying quietly. By the time anyone notices, six months have usually been lost.

    When an organization crosses 150, it runs two cultures whether the leadership admits it or not. The question is only whether the split is designed deliberately or happens by default. Organizations that handle the ceiling well accept it and build deliberate boundaries. Organizations that handle it poorly spend years pretending 400 people are “all one team.”

    Minimalist org-chart diagram with a horizontal dashed line labeled 150 separating a large unified structure above from subdivided smaller groups below
    Cross 150 and you either build deliberate substructure or get default fragmentation.

    The strategic consequence: org design is surrender, not construction

    Good organizational design is mostly a process of surrender. The cognitive architecture of the humans running the teams picks team sizes for you, and the only real choice is whether to build the org chart around what actually works or to fight it and pay the tax. Every commercial organization that has tried to force a bigger number — a 12-person manager span, a 30-person “small team,” a 300-person “family culture” — has either quietly subdivided itself into groups that look suspiciously like the Dunbar numbers, or lost the thing that made it work.

    AI augmentation does not move the cognitive ceiling. It moves the throughput below the ceiling. An IC managing four AI agents is still operating inside a span of four. A manager coordinating seven sub-teams of augmented ICs is still operating inside a Dunbar-5 layer. The numbers that governed organizational design before agents are the numbers that will govern it after.

    Fireteam of four around a laptop, with overlay text
    Small intimate teams stay where the work actually gets done.

    So what boards should do

    Boards should design operating models around the Dunbar layers and treat AI-augmented throughput as a multiplier on what each cognitive unit can do — not as a license to stretch the unit past its ceiling. The specific actions sit at four layers: first-line spans at 5 to 7 even under headcount pressure; sub-team ownership at 5; operational substructure at 50; deliberate cultural boundaries at 150. These are not target numbers. They are discovered numbers. Every other structure is an argument with biology, and biology does not negotiate.

    The Roman legions did not know about neocortex ratios. The Marines do not design their fireteams around anthropology papers. Jeff Bezos did not cite Dunbar when he ordered the pizzas. All three converged on the same numbers because the numbers are a property of the animal doing the work, not the work itself. The job of an organizational designer is to notice this — and then get out of the way.

    References

    • Dunbar, R. I. M. (1992). “Neocortex size as a constraint on group size in primates.” Journal of Human Evolution.
    • Dunbar, R. I. M. (2010). How Many Friends Does One Person Need? Harvard University Press.
    • Miller, G. A. (1956). “The Magical Number Seven, Plus or Minus Two.” Psychological Review.
    • Brooks, F. P. (1975). The Mythical Man-Month: Essays on Software Engineering.
    • Hackman, J. R. (2002). Leading Teams: Setting the Stage for Great Performances. Harvard Business School Press.
    • Ringelmann, M. (1913). Early social-loafing experiments, Annales de l’Institut National Agronomique.
    • Gladwell, M. (2000). The Tipping Point. Popularised Gore’s rule of 150 for the management audience.
    • The Scrum Guide. Editions through 2017 recommended 3 to 9 development team members; the 2020 edition says typically 10 or fewer people.

  • Vertical AI is winning the deployment race

    Vertical AI is winning the deployment race

    Horizontal AI slab at the bottom with three taller vertical columns rising from it labeled by domain
    Horizontal is the substrate. Vertical is the value layer.

    Gartner’s April read says eighty percent of enterprises will have adopted at least one vertical AI agent by year-end, and thirty percent of all enterprise AI deployments will be vertical-specific. Bessemer’s vertical AI report from this month is even more direct: vertical AI companies founded after 2019 are reaching eighty percent of traditional SaaS contract values while growing four hundred percent year-over-year. This is not a minor adjustment to the deployment landscape. It is a structural redirection of where the value of agentic AI accrues.

    For boards in 2026, the implication is that the right framework for thinking about AI vendor strategy is no longer horizontal-versus-vertical. It is which verticals you bet on, and how early. Deployment speed defines advantage in this cycle, and the deployment race is now a vertical-by-vertical race.

    The shift: vertical specialization beats horizontal generality at the workflow layer

    Horizontal AI tools — the chat assistants, the general-purpose copilots, the broad productivity overlays — are still the largest category by usage. They are not the largest category by enterprise value. The reason is structural. A horizontal copilot is good at fifty things. A vertical agent is excellent at five things that are deeply embedded in a specific workflow.

    When the enterprise needs to extract value, depth wins over breadth. Abridge in clinical documentation. Harvey and EvenUp in legal. Hebbia in financial research. Specialized clinical-coding agents at major payers. The vertical players ship integrations into existing systems, understand the regulatory and accuracy constraints of the domain, and deliver outcomes that horizontal tools cannot match without significant configuration effort that customers refuse to undertake.

    The defensibility of vertical players is also higher than the market priced in 2024. The data flywheel inside a regulated vertical is genuinely hard to replicate. The customer relationships are stickier because switching costs include re-credentialing within the regulator’s expectations, not just re-implementing software.

    Two rectangle shapes side by side, one wide and shallow, the other narrow and deep
    Wide-shallow loses to narrow-deep at the workflow level.

    The role change: the chief AI buyer becomes a portfolio manager

    Inside enterprises, the executive responsible for AI vendor strategy is increasingly running a portfolio of vertical specialists alongside the foundation-model contracts. The horizontal tools form a substrate. The vertical agents form the high-value layer. The portfolio manager has to balance ROI realization against integration overhead, and has to decide which verticals to deepen versus which to defer.

    The skill set for this role is closer to portfolio investment management than to traditional procurement or IT leadership. The portfolio manager has to read product roadmaps, anticipate vendor consolidation, manage concentration risk, and time entry into emerging verticals where category leaders have not yet emerged. None of this is in the standard procurement or CIO playbook.

    Most large enterprises have not formally structured this role yet. The work is happening inside the CIO function or inside individual line-of-business AI initiatives, with no portfolio-level coordination. The result is double-procurement of overlapping vertical capability and missed early-mover advantage in verticals where the category leader will not stay reasonably priced for long.

    The strategic consequence: vertical AI reshapes acquisition strategy

    For enterprises in regulated industries — banks, insurers, hospital systems, large law firms, accounting firms — the vertical-AI thesis has a direct M&A implication. The category leaders in each vertical are trading at premium multiples now and will trade at higher multiples by 2027 once their data flywheels and customer concentrations are visible in audited financials. The window for acquisition at reasonable multiples is open in 2026 for most verticals. It will close.

    For incumbents who do not acquire, the implication is partnership at scale. The vertical specialists need distribution that incumbents already have. The incumbents need capability that the specialists already have. Deal terms will tilt further toward the specialists for as long as their growth rates hold up. Incumbents that delay partnership decisions to 2027 will pay more for less favorable terms.

    For boards governing AI strategy, the directive question is whether the company is buying, building, or partnering for vertical AI capability, and whether that decision is being made deliberately for each vertical or by default, through the absence of a decision. Default-by-absence is the mode most large enterprises are operating in. It is the most expensive mode.

    Four labeled doors in a corporate hallway with one chosen and three closed
    Per vertical: buy, partner, build, or wait — pick deliberately.

    So what: what boards should do this quarter

    Map the AI vendor portfolio with horizontal versus vertical breakdown. If the breakdown is more than two-thirds horizontal, the company is missing the value-creating layer. If it is unmapped, that is a more urgent finding.

    Designate an executive owner for vertical AI portfolio strategy with explicit authority across line-of-business silos. The decisions are too consequential to be made silo by silo. The horizontal-tool decisions can stay with the CIO. The vertical-agent decisions need a portfolio view.

    For each major vertical relevant to the business, assign a clear posture: acquire, partner, build, or wait. Defaulting to wait by not deciding is the same as deciding to wait — and in most verticals it is the wrong decision in 2026. Execution speed will separate leaders from followers in this cycle.
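    The first step above, mapping the portfolio and checking it against the two-thirds threshold, can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the `Vendor` record, the vendor names, and the choice to weight the breakdown by annual spend are all assumptions (the text does not say whether the split should be measured by spend, contract count, or usage).

```python
from dataclasses import dataclass

# Threshold taken from the text: "more than two-thirds horizontal".
HORIZONTAL_THRESHOLD = 2 / 3


@dataclass
class Vendor:
    name: str            # hypothetical vendor label
    annual_spend: float  # assumed weighting basis; the text does not specify one
    kind: str            # "horizontal" or "vertical"


def horizontal_share(portfolio: list[Vendor]) -> float:
    """Return the fraction of total spend that goes to horizontal tools."""
    total = sum(v.annual_spend for v in portfolio)
    if total <= 0:
        raise ValueError("portfolio has no recorded spend")
    horizontal = sum(v.annual_spend for v in portfolio if v.kind == "horizontal")
    return horizontal / total


def portfolio_finding(portfolio: list[Vendor]) -> str:
    """Classify the portfolio using the rule of thumb from the text."""
    if not portfolio:
        return "unmapped: the more urgent finding"
    if horizontal_share(portfolio) > HORIZONTAL_THRESHOLD:
        return "missing the value-creating layer"
    return "within the two-thirds threshold"


# Illustrative figures only.
example = [
    Vendor("general copilot", 700_000.0, "horizontal"),
    Vendor("clinical documentation agent", 300_000.0, "vertical"),
]
print(portfolio_finding(example))
```

    Whatever the weighting basis, the point of the exercise is that the breakdown exists at all and is reviewed at portfolio level rather than silo by silo.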

  • Q1 layoffs hit a four-year low. Tech’s cuts went up 40%.

    Q1 layoffs hit a four-year low. Tech’s cuts went up 40%.

    What was announced

    Challenger, Gray & Christmas reported in late March 2026 that U.S. employers announced 217,362 job cuts in Q1, the lowest Q1 total since 2022. Within that aggregate, technology-sector cuts ran at 52,050, up 40% versus Q1 2025. In March specifically, AI was cited as the rationale for 15,341 cuts, or 25% of the month’s total, making it the leading single reason for U.S. layoffs for the first time in Challenger’s records. Major contributors to the technology figure were the restructuring Dell disclosed in its annual filing, Oracle’s March layoffs, and Meta’s Reality Labs reduction.
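    A quick back-of-the-envelope check puts the quoted figures in proportion. The inputs below are the numbers cited above; the derived values (technology's share of the Q1 total, the implied Q1 2025 technology figure, and the implied March total) are my own arithmetic, not figures from the Challenger report, and they are only approximate because the 40% and 25% inputs are rounded.

```python
# Figures quoted in the paragraph above (Challenger, Gray & Christmas, Q1 2026).
Q1_TOTAL_CUTS = 217_362
Q1_TECH_CUTS = 52_050
TECH_GROWTH_YOY = 0.40   # technology cuts up 40% versus Q1 2025 (rounded)
MARCH_AI_CUTS = 15_341
MARCH_AI_SHARE = 0.25    # AI-cited cuts as a share of March's total (rounded)


def tech_share_of_total() -> float:
    """Technology-sector cuts as a fraction of all Q1 announced cuts."""
    return Q1_TECH_CUTS / Q1_TOTAL_CUTS


def implied_prior_year_tech_cuts() -> float:
    """Q1 2025 technology cuts implied by the quoted 40% increase."""
    return Q1_TECH_CUTS / (1 + TECH_GROWTH_YOY)


def implied_march_total() -> float:
    """March total implied by AI-cited cuts being 25% of it."""
    return MARCH_AI_CUTS / MARCH_AI_SHARE


print(f"Tech share of Q1 cuts: {tech_share_of_total():.1%}")
print(f"Implied Q1 2025 tech cuts: {implied_prior_year_tech_cuts():,.0f}")
print(f"Implied March total: {implied_march_total():,.0f}")
```

    On these inputs, technology accounts for a little under a quarter of all announced Q1 cuts, and the 40% figure describes growth in technology-sector cuts year over year rather than growth in that share.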

    What it means

    The aggregate-down, tech-up, AI-leading combination is not three separate stories. It is one story told from three angles. The aggregate number is down because the broad U.S. economy is operating with reasonable employment; sector-by-sector cuts in legacy industries are running below historical norms. The technology number is up because the sector is going through a structural reallocation — capital is shifting from headcount-led growth to compute-led growth, and the cost base of large software companies is being explicitly redesigned around that shift. AI is the leading cited reason because it is the strategic narrative that justifies the redesign to investors, customers, and remaining employees.

    The implication for the rest of 2026: technology-sector hiring patterns will continue to diverge from the broader economy. Companies will hire aggressively for ML, infrastructure, agent operations, and applied research while shrinking headcount in functions that AI is augmenting or displacing. Net headcount may decline, but the per-employee compute and capability budget rises sharply. That changes what “growth” looks like in the financial reporting of the sector.

    Andreas’s view

    My read on this: the Q1 numbers are not a downturn signal — they are a transformation signal masquerading as cost discipline. Tech companies are not in distress. They are restructuring around the assumption that a smaller, AI-augmented workforce produces equal or greater output at a different cost basis. Some of those bets will be right; some will be the Block experience at smaller scale, where the rehire follows the cut by six to twelve weeks. The Q2 and Q3 numbers will tell us how clean the underlying productivity gain actually is.

    I don’t think the AI-as-cited-reason metric stabilizes here. It rises through 2026. Once the framing carries an investor-relations multiple — which Block demonstrated — the disclosure pattern shifts in its direction across the sector. By year-end, AI-cited cuts will likely cross 30% of monthly U.S. totals, and that will look more like a permanent baseline than a peak.

    The way I see it: the Challenger headlines document neither a labor crisis nor a productivity victory. They are capturing a sector-wide capital reallocation with a coherent strategic logic and uneven execution quality. The more interesting question to me is which side of that reallocation any given business is on — and whether its cost base reflects the structure it has today or the structure it intends to have in 18 months.

    Three things I’m watching

    Three things I’m watching as this plays out:

    1. I’ll be watching whether companies are tracking the technology-sector comparison for their own organization: revenue, headcount, and per-employee compute spend versus the closest five public-market peers. That gap is where structural exposure shows up first.
    2. I’ll be watching whether organizations hold a meaningful distinction in their communications between AI-driven productivity reductions (workflow-modeled, with measurable output) and broader restructuring justified by other factors. The market may not differentiate, but organizations with rigorous operations will.
    3. I’ll be watching Q3 unit economics against any Q1 workforce action. The reduction is on the books in Q1; whether the underlying productivity thesis holds shows up in Q3 output measures, not headcount.
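    The peer comparison in the first item can be sketched as follows. This is an illustration under assumptions: the `Company` record, the use of the peer median as the benchmark, and every figure in the example are hypothetical, and the text itself only names the three inputs (revenue, headcount, per-employee compute spend) and the five-peer comparison set.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Company:
    name: str             # hypothetical label
    revenue: float        # annual revenue
    headcount: int        # employees
    compute_spend: float  # annual compute spend

    @property
    def revenue_per_employee(self) -> float:
        return self.revenue / self.headcount

    @property
    def compute_per_employee(self) -> float:
        return self.compute_spend / self.headcount


def gap_vs_peers(company: Company, peers: list[Company]) -> dict[str, float]:
    """Relative gap to the peer median; negative means below the peers."""
    peer_revenue = median(p.revenue_per_employee for p in peers)
    peer_compute = median(p.compute_per_employee for p in peers)
    return {
        "revenue_per_employee_gap": company.revenue_per_employee / peer_revenue - 1,
        "compute_per_employee_gap": company.compute_per_employee / peer_compute - 1,
    }


# Illustrative numbers only; none of these are real companies or figures.
us = Company("us", 100e6, 1_000, 2e6)
peers = [
    Company("peer 1", 90e6, 900, 1.8e6),
    Company("peer 2", 150e6, 1_200, 4.8e6),
    Company("peer 3", 200e6, 1_000, 5e6),
    Company("peer 4", 125e6, 1_000, 4e6),
    Company("peer 5", 60e6, 600, 1.2e6),
]
for metric, gap in gap_vs_peers(us, peers).items():
    print(f"{metric}: {gap:+.0%}")
```

    The median is used here so that one outlier peer does not dominate the benchmark; a mean or a top-quartile figure would be an equally defensible choice.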

    References and related signals