AI continues to expand its capabilities, posting higher scores on benchmarks of all types. But the gains are not evenly distributed. Frontier models now meet or exceed human performance on doctoral-level science questions, multimodal reasoning, and competitive mathematics. Other areas that had been underperforming saw tremendous growth. For example, according to Terminal-Bench, the success rate of agents handling real-world tasks increased from 20% in 2025 to 77.3% today, while AI agents working on cybersecurity problems resolved them 93% of the time, up from 15% in 2024.
AI still lags on other tasks, including learning from video, generating consistent and realistic videos, telling time, handling multi-step scheduling, performing financial analyses, and answering questions on some expert-level academic exams. Robots still have a long way to go when it comes to household chores: They manage only 12% of actual household tasks, such as folding clothes or washing dishes.
