Google released Thursday a “reinvented” version of its research agent Gemini Deep Research based on its highly publicized ultra-modern foundation modelGemini 3 Pro.
This new agent isn’t just designed to produce search reports, although it can still do that. It now allows developers to integrate the search capabilities of Google’s SATA model into their own applications. This capability is made possible thanks to the new Interactions APIwhich is designed to give developers more control in the next era of agentic AI.
The new Gemini Deep Research tool is an agent equipped to synthesize mountains of information and handle a large context dump in the prompt. Google says it is used by customers for tasks ranging from due diligence to drug safety research.
Google also says it will soon integrate this new deep search agent into services including Google Search, Google Finance, its Gemini app, and its popular NotebookLM. This is another step toward preparing for a world in which humans no longer search for things on Google, but their AI agents do.
The tech giant says Deep Research benefits from Gemini 3 Pro’s status as the “most evidence-based” model, trained to minimize hallucinations during complex tasks.
AI hallucinations – where the LLM just makes things up – are a particularly crucial problem for long-running, deep-reasoning agent tasks, in which many autonomous decisions are made in minutes, hours, or more. The more choices an LLM must make, the more likely it is that a single crazy choice will invalidate the entire outcome.
To prove its progress, Google also created another benchmark (as if the AI world needed another one). The new benchmark is unimaginatively called DeepSearchQA and is intended to test agents on complex, multi-step information search tasks. Google has opened this benchmark.
Techcrunch event
San Francisco
|
October 13-15, 2026
He also tested Deep Research on Humanity’s Last Exam, an independent general knowledge repository with a much more interesting name and full of incredibly specialized tasks; and BrowserComp, a benchmark for browser-based agentic tasks.
As expected, Google’s new agent beat the competition on its own benchmark and that of Humanity. However, OpenAI’s ChatGPT 5 Pro was surprisingly second and slightly ahead of Google on BrowserComp.
But these benchmark comparisons were obsolete almost by the time Google published them. Because on the same day, OpenAI launched its highly anticipated GPT 5.2 – codenamed Garlic. OpenAI claims its new model outperforms competitors, particularly Google, on a series of typical benchmarks, including one developed in-house by OpenAI.
Perhaps one of the most interesting aspects of this announcement was the timing. Knowing that the world was waiting for Garlic’s release, Google released its own AI news.
