Tencent Youtu Lab researchers announced a breakthrough on October 9, 2025 that could make training AI models significantly cheaper. The technique, called Training-Free GRPO, improves the performance of AI agents without the expensive computation required by traditional methods.
Most AI companies spend thousands of dollars updating their models to perform better on specific tasks. This process, called fine-tuning, modifies the model's internal parameters using expensive computing power. The new approach skips these updates entirely.
Instead of modifying the AI model itself, researchers teach it by giving it written instructions based on experience. Think of it like giving someone a cheat sheet instead of sending them back to school.
The technique works by running the AI on problems multiple times and then comparing which approaches worked best. The AI learns from these comparisons and creates a set of guidelines for handling similar problems in the future.
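The loop described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the function names (`query_llm`, `score`, `update_experience_library`) and the stubbed model calls are all invented for this sketch; a real version would call an actual LLM API and use a real reward check.

```python
import random

def query_llm(prompt, experiences):
    # Stub: a real implementation would send the prompt to an LLM API
    # with the current experience library prepended as guidelines.
    return f"attempt({random.random():.3f})"

def score(attempt):
    # Stub: a real reward would check correctness or tool-call efficiency.
    return float(attempt.split("(")[1].rstrip(")"))

def update_experience_library(problems, experiences, group_size=5):
    """Sample a group of attempts per problem, compare them, and
    distill the contrast into a written guideline (no gradients)."""
    for problem in problems:
        attempts = [query_llm(problem, experiences) for _ in range(group_size)]
        rewards = [score(a) for a in attempts]
        best = attempts[rewards.index(max(rewards))]
        worst = attempts[rewards.index(min(rewards))]
        # The "update" is a new written rule, not a parameter change.
        experiences.append(f"On '{problem}': prefer {best} over {worst}")
    return experiences

library = update_experience_library(["problem-1", "problem-2"], [])
print(len(library))  # 2
```

The key design point is step three: instead of backpropagating a gradient, the contrast between the best and worst attempts is turned into text that gets prepended to future prompts.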
Tests showed impressive results. On the difficult AIME math benchmarks, the technique achieved 82.7% accuracy on AIME 2024 and 73.3% on AIME 2025 when combined with a code interpreter, exceeding the baseline by 2.7 and 5.4 percentage points respectively.
The cost difference is striking. The researchers spent about $18 and used just 100 training examples. Traditional methods like ReTool require thousands of examples and cost over $10,000 to train smaller AI models.
Web search tasks showed similar improvements. On the WebWalkerQA benchmark, the method achieved 67.8% accuracy, 4.6 percentage points better than standard approaches. The training consumed 38 million input tokens and 6.6 million output tokens over three training stages completed in six hours.
A key advantage emerges when switching between task types. Traditionally fine-tuned models perform poorly when moved to a different domain: a math-trained model drops from 67% accuracy on math problems to just 18% on web searches, and a model optimized for web search similarly struggles with math.
The new method maintains strong performance in both areas by simply swapping the instruction sets. This flexibility is important because businesses typically need AI that can handle multiple tasks rather than separate specialized models for each task.
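Because the model itself never changes, switching domains amounts to swapping which guidelines get prepended to the prompt. A minimal illustration, with invented task names and experience texts (the "official sources" and bounds-checking rules echo behaviors mentioned later in this article):

```python
# Hypothetical experience sets; the texts are invented for this sketch.
EXPERIENCE_SETS = {
    "math": [
        "Validate that solutions fall within stated bounds.",
        "Recheck arithmetic with the code interpreter.",
    ],
    "web_search": [
        "Prefer official sources over third-party summaries.",
        "Refine queries using formal titles.",
    ],
}

def build_prompt(task, question):
    # The same base model serves both domains; only the prepended
    # guidelines change, so no retraining is needed to switch tasks.
    guidelines = "\n".join(EXPERIENCE_SETS[task])
    return f"Guidelines:\n{guidelines}\n\nQuestion: {question}"

print(build_prompt("math", "What is 2+2?").startswith("Guidelines:"))  # True
```

One general-purpose model plus several small instruction sets replaces several fine-tuned specialist models.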
The researchers tested their approach on DeepSeek-V3.1-Terminus, a large AI model with 671 billion parameters. They also tried smaller models like the Qwen3-32B and Qwen2.5-72B-Instruct, seeing consistent improvements across different sizes.
During testing, the AI learned to use the tools more efficiently. The average number of tool calls decreased both during training and when solving new problems. This suggests that the method teaches AI not only what to do, but also how to work smarter.
The technique works even without knowing the correct answers during training. Tests without ground truth still reached 80.7% on AIME 2024 and 68.9% on AIME 2025. The AI relies on comparing multiple attempts and identifying which patterns lead to better approaches.
Ablation tests confirmed that comparing multiple attempts matters. When the researchers reduced the comparison group from five attempts to one, performance dropped significantly, confirming that the AI benefits from contrasting different approaches to the same problem.
Real-world deployment costs favor this approach for many businesses. Traditionally trained models require dedicated servers running continuously. A typical configuration costs $0.005 per query but assumes constant server utilization, making it inefficient for sporadic use.
The new method works on a pay-as-you-go basis via API calls. Each query costs around $0.02 but requires no dedicated infrastructure. For businesses with irregular or unpredictable usage patterns, this is more cost-effective than keeping servers idle between requests.
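The trade-off comes down to simple break-even arithmetic. The sketch below uses the article's $0.02-per-query API figure; the hourly server rate is an assumption for illustration only and does not come from the article.

```python
SERVER_COST_PER_HOUR = 2.0  # assumed dedicated-server rate (not from the article)
API_COST_PER_QUERY = 0.02   # per-query API price cited in the article

def cheaper_option(queries_per_hour):
    # A dedicated server bills for idle time; the API bills only per use.
    dedicated = SERVER_COST_PER_HOUR
    api = queries_per_hour * API_COST_PER_QUERY
    return "dedicated" if dedicated < api else "api"

print(cheaper_option(10))   # api       (10 * $0.02 = $0.20 < $2.00/hour)
print(cheaper_option(500))  # dedicated (500 * $0.02 = $10.00 > $2.00/hour)
```

Under these assumptions the dedicated server only wins above roughly 100 queries per hour, which matches the article's point about sporadic workloads favoring the API.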
The research addresses a persistent problem in AI development. Most companies can only afford to fine-tune smaller models because of the computational costs. These smaller models perform worse than simply using larger general models via API calls.
This creates a tricky situation in which the affordable option performs poorly in specialized tasks, while the better performing option lacks specialized knowledge. The new method solves this dilemma by improving large general-purpose models with inexpensive instruction sets.
The technique learned 48 distinct experiences during mathematics training. These ranged from validating that solutions fall within appropriate bounds to recognizing when compass directions like "southwest" and "southeast" form right angles.
For web search, the AI learned to prioritize official sources over third-party summaries, refine search terms based on formal titles, and verify numerical claims against authoritative sources. These learned behaviors directly address common failures of naive approaches.
The marketing community should pay attention to this, as this technology makes advanced AI capabilities accessible to small budgets. Previously, only large companies could afford specialized training on AI models. Now businesses can achieve similar results for less than the cost of a dinner for two.
This approach also solves the problem of needing multiple specialized models. A business may need AI for customer service, content creation, data analysis, and ad optimization. Training separate models for each task multiplies cost and complexity. Using a flexible model with different instruction sets is simpler and cheaper.
In the future, this technology could democratize AI capabilities throughout the marketing industry. Small agencies and in-house teams gain access to sophisticated AI tools without a corporate budget. The barrier to entry drops from tens of thousands of dollars to hundreds.
Summary
Who: Researchers from Tencent Youtu Lab, including Yuzheng Cai, Siqi Cai, Yuchen Shi, and Zihan Xu, developed the technique.
What: Training-Free GRPO teaches AI models through written, experience-based instructions instead of costly fine-tuning, enabling better performance at significantly lower cost.
When: Published on October 9, 2025, with experiments using 100 training samples completed in six hours at a cost of around $18, compared to traditional methods costing over $10,000.
Where: Tested on the AIME mathematical reasoning and WebWalkerQA web search benchmarks using DeepSeek-V3.1-Terminus and other large language models.
Why: Makes advanced AI capabilities accessible to small budgets by eliminating costly compute requirements while maintaining performance across different task types without the need for multiple specialized models.