Technology and Innovation insights

ROI Per Token Has a Country

How the discipline of ROI-per-token is quietly handing the world's high-volume AI to Chinese models

Jun 21, 2026

Estimated reading time: 9 minutes

Uber budgeted twelve months of AI. It got four. The company had handed coding tools to its engineers the previous fall, and usage detonated: most engineers were reaching for the agents daily, and a rising fraction of the code shipping to production was coming out of them. By April the annual budget was gone. So Uber imposed a cap of $1,500 per engineer per month per agentic coding tool, Business Insider reported. By late May, Amazon had pulled the plug on an internal leaderboard called KiroRank, which ranked employees by token consumption, after discovering it was nudging people to burn tokens on useless tasks just to climb the rankings; Meta killed an equivalent scoreboard nicknamed “Claudeonomics,” as Mint reported. The inference bill arrived, and what it exposed was not a pricing problem. It was a yardstick problem.

For two years, companies measured AI adoption by the metric easiest to collect and easiest to inflate: the volume of tokens consumed. More tokens, more “adopted.” The name that culture earned, tokenmaxxing, describes the incentive with precision: maximize consumption as proof of modernity. The correction now underway in 2026 swaps that yardstick for another, and that swap, not geopolitics and not model quality, is what drives the central argument of this essay. The economics of the token are migrating from volume to ROI-per-token, meaning how much of every dollar spent on inference becomes delivered product, and this new discipline of corporate spending is the vector transferring global share to China’s low-cost models in the high-volume workloads. The thesis holds as long as the ROI yardstick remains the dominant purchasing criterion and the Chinese cost advantage stays structural; it weakens if the American labs close the inference-cost gap. That tokenmaxxing is dead is already this week’s consensus; what almost no one is naming is where the volume freed by its death is going. The answer is an inference stack that isn’t American.

The number anchoring the turn is uncomfortable. A survey by Entelligence of 2,444 companies, released in May 2026, found that only 18% of spending on coding tokens converts into product that actually reaches production; the other 82% is absorbed by bug fixing, rewriting AI-generated code, and review delays. Read in isolation, it is a datapoint about operational efficiency. Read as a market signal, it is the destruction of a metric. If 82 cents of every dollar vanish before becoming product, then the number of tokens consumed was never an indicator of value; it was an indicator of activity. And a yardstick that measures activity, in a world where activity is cheap to generate and expensive to clean up, measures the wrong thing.
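The arithmetic behind that claim can be made explicit. The Python sketch below takes the survey's 18% conversion figure as given and computes how many dollars of token spend stand behind each dollar that reaches production. The function names and the $1,000 example bill are hypothetical, chosen only for illustration; they do not come from the survey.

```python
def spend_per_shipped_dollar(conversion_rate: float) -> float:
    """Dollars of token spend needed for one dollar to reach production."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be in (0, 1]")
    return 1.0 / conversion_rate


def split_spend(total_spend: float, conversion_rate: float) -> tuple[float, float]:
    """Split a token bill into (shipped, absorbed) portions."""
    shipped = total_spend * conversion_rate
    return shipped, total_spend - shipped


# 18% is the conversion figure cited in the text; the $1,000 bill is hypothetical.
CONVERSION = 0.18
shipped, absorbed = split_spend(1_000.0, CONVERSION)
multiplier = spend_per_shipped_dollar(CONVERSION)
print(f"shipped=${shipped:.2f} absorbed=${absorbed:.2f} multiplier={multiplier:.2f}x")
```

At an 18% conversion rate, each dollar of shipped product carries roughly $5.56 of token spend behind it, which is why a volume count says little about value.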

The death of tokenmaxxing has a recognizable anatomy, and it is Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. The internal leaderboards, KiroRank at Amazon and Claudeonomics at Meta, turned consumption into a competition, and the competition produced exactly what you ask of it: more consumption, not more product. The loop closes when the bill reaches the CFO. From there, the token stops being a trophy of adoption and becomes what it always was financially: a line of operating expense, with an owner, a ceiling, and now a question attached. The question is no longer “how much did we use,” it is “how much of that became product.” Amazon, in killing KiroRank, began measuring what it called “normalized deployments,” meaning AI-assisted code that is actually integrated, Mint reported. The yardstick moved from input to output. And no metric that becomes a target is immune to Goodhart, including the new one: if “18% conversion” becomes the next scoreboard to be maximized, it will be gamed like the token consumption it replaced.

The second-order effect of this yardstick change is what matters for the market, and it is where most analyses stop. Once the corporate buyer starts evaluating cost per unit of product delivered rather than cost per token consumed, the decision of which model to use stops being a single, all-or-nothing choice. The most expensive frontier model only justifies itself on the tasks where frontier capability is genuinely required: project planning, architecture, complex debugging. The bulk of the volume (repetitive code generation, document processing, standardized support) migrates to the cheapest model that solves the task. The ROI yardstick does not favor the best model; it favors the right model for each tier of task. And it is precisely in the high-volume, low-frontier-requirement tier that inference economics decides the purchase.
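The purchasing rule described above, "the cheapest model that solves the task," can be sketched as a small routing function. This is an illustrative Python sketch only: the model labels, the prices, and the `frontier_capable` flag are hypothetical assumptions, not real products or quotes, and the task categories are the ones named in the paragraph.

```python
from dataclasses import dataclass

# Task categories named in the text, split into the two tiers it describes.
FRONTIER_TASKS = {"project planning", "architecture", "complex debugging"}
BULK_TASKS = {"repetitive code generation", "document processing", "standardized support"}


@dataclass(frozen=True)
class Model:
    name: str                      # hypothetical label, not a real product
    usd_per_million_tokens: float  # hypothetical price
    frontier_capable: bool         # assumed capability flag


def pick_model(task: str, catalog: list[Model]) -> Model:
    """Return the cheapest model in the catalog able to handle the task."""
    if task in FRONTIER_TASKS:
        candidates = [m for m in catalog if m.frontier_capable]
    elif task in BULK_TASKS:
        candidates = list(catalog)
    else:
        raise ValueError(f"unknown task tier for: {task!r}")
    if not candidates:
        raise ValueError("no model in the catalog can handle this task")
    return min(candidates, key=lambda m: m.usd_per_million_tokens)


# Hypothetical two-model catalog, for illustration only.
CATALOG = [
    Model("frontier-model", 15.0, True),
    Model("low-cost-model", 1.0, False),
]

for task in ("architecture", "document processing"):
    print(task, "->", pick_model(task, CATALOG).name)
```

Under these assumed prices, the frontier model wins only the tasks that require it, and everything in the bulk tier falls to the cheaper option. That is the mechanism by which a per-task cost rule shifts volume.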

© 2026 Massimo