This was certainly the case with Claude Opus 4.5, the latest version of Anthropic’s most powerful model, released at the end of November. In December, METR reported that Opus 4.5 appeared to be able to autonomously complete a task that would have taken a human about five hours – a vast improvement over what even the exponential trend would have predicted. An Anthropic security researcher tweeted that he would change the direction of his research in light of these findings; another employee of the company simply wrote: “Mom, come get me, I’m scared.”
But the truth is more complex than these dramatic responses suggest. For one thing, METR's estimates of specific models' capabilities come with substantial error bars. As METR explicitly stated, the uncertainties inherent in the method made it impossible to be sure.
“There are many reasons why people overinterpret the chart,” says Sydney Von Arx, a member of the METR technical team.
More fundamentally, the METR chart does not measure AI capabilities as a whole, nor does it claim to do so. To build the chart, METR tests the models primarily on coding tasks, rating the difficulty of each by measuring or estimating how long it takes humans to complete it, a metric that not everyone accepts. Claude Opus 4.5 might be able to complete some tasks that take humans five hours, but that doesn’t mean it’s close to replacing a human worker.
METR was founded to assess the risks posed by frontier AI systems. Although it is best known for its exponential trend plot, it has also worked with AI companies to evaluate their systems in more detail and has published several other independent research projects, including a widely covered July 2025 study suggesting that AI coding assistants might actually slow down software engineers.
But the exponential plot made METR’s reputation, and the organization seems to have a complicated relationship with the graph’s often breathless reception. In January, Thomas Kwa, one of the main authors of the paper that introduced it, wrote a blog post responding to certain criticisms and explaining its limitations, and METR is currently working on a more complete FAQ document. But Kwa is not optimistic that these efforts will significantly change the narrative. “I think whatever we do, the hype machine will just eliminate all the caveats,” he says.
Nonetheless, the METR team believes the plot has something significant to say about the trajectory of AI progress. “You absolutely should not tie your life to this chart,” says Von Arx. “But also,” she adds, “I bet this trend will continue.”
