Two of the world’s biggest AI companies, Google and OpenAI, both warned this week that competitors, including China’s DeepSeek, are testing their models to steal the underlying reasoning and then copying those capabilities into their own AI systems.
“This is coming from threat actors all over the world,” John Hultquist, chief analyst at the Google Threat Intelligence Group, told The Register, adding that the perpetrators are “private sector companies.” He declined to name specific companies or countries involved in this type of intellectual property theft.
“Your model is really valuable intellectual property, and if you can distill the logic of it, there is very real potential to replicate this technology — which is not cheap,” Hultquist said. “This is a very important technology and the list of parties interested in replicating it is endless.”
Google calls this process of using prompts to clone its models “distillation attacks,” and in a Thursday report said one campaign used more than 100,000 prompts to “attempt to replicate Gemini’s reasoning ability in target languages other than English in a wide variety of tasks.”
American tech giants have spent billions of dollars training and developing their own LLMs. By abusing legitimate access to mature models like Gemini and using the harvested outputs to train newer models, competitors can build their own chatbots and AI systems far more cheaply and easily.
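In practical terms, a distillation pipeline is little more than a loop that harvests a teacher model's answers and saves them as training pairs for a cheaper student model. The sketch below uses the OpenAI Python client purely for illustration; the prompt list, teacher model name, and output file are placeholders, not details from either company's report.

```python
# A minimal sketch of API-based distillation; names here are illustrative,
# not details from Google's or OpenAI's disclosures.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt set; the campaign Google described reportedly used
# more than 100,000 prompts across many languages and tasks.
prompts = [
    "Explain step by step why the sky appears blue.",
    "Translate 'good morning' into Japanese and explain the grammar.",
]

with open("distilled_pairs.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # the "teacher" model being queried
            messages=[{"role": "user", "content": prompt}],
        )
        # Each (prompt, response) pair becomes a training example for a
        # cheaper "student" model fine-tuned elsewhere.
        f.write(json.dumps({
            "prompt": prompt,
            "completion": resp.choices[0].message.content,
        }) + "\n")
```

Run at sufficient scale, the resulting file becomes a fine-tuning corpus, which is why providers treat this kind of bulk harvesting as IP theft rather than ordinary usage.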
Google claims to have detected this probe in real time and to have protected its internal reasoning traces. However, distillation appears to be yet another AI risk that is extremely difficult, if not impossible, to eliminate.
Distilling from Gemini models without permission violates Google’s terms of service, and Google can block accounts that do this, or even sue users. Although the company says it continues to develop better ways to detect and stop these attempts, the very nature of LLMs makes them vulnerable.
Public-facing AI models are widely available, and cracking down on abusive accounts can turn into a game of whack-a-mole.
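Neither company has disclosed how its detection actually works, but the shape of the problem is easy to illustrate. The toy heuristic below flags accounts that send huge volumes of near-templated prompts; the thresholds and the templating signal are invented for illustration and bear no relation to either company's real defenses.

```python
# A toy heuristic for flagging distillation-like usage, assuming a provider
# keeps per-account prompt logs; thresholds and signals are invented.
from collections import Counter


def looks_like_distillation(prompts: list[str],
                            volume_threshold: int = 10_000,
                            repetition_threshold: float = 0.5) -> bool:
    """Flag accounts sending huge volumes of near-templated prompts."""
    if len(prompts) < volume_threshold:
        return False
    # Crude templating signal: count how many prompts share the same
    # five-word opening, a pattern typical of scripted harvesting.
    openings = Counter(" ".join(p.split()[:5]) for p in prompts)
    top_share = openings.most_common(1)[0][1] / len(prompts)
    return top_share >= repetition_threshold
```

The obvious countermove, and what makes this whack-a-mole, is for the harvester to vary its prompts, split traffic across many accounts, or route requests through intermediaries.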
Additionally, Hultquist warned that as other companies develop their own models and train them on sensitive internal data, the risk of distillation attacks will spread.
“We’re at the frontier on this, but as more and more organizations have models that they provide access to, it’s inevitable,” he said. “As this technology is adopted and developed by companies like financial institutions, their intellectual property could also be targeted in this way.”
Meanwhile, OpenAI, in a Thursday note (PDF) to the House Select Committee on China, accused DeepSeek and other Chinese LLM providers and universities of copying ChatGPT and other U.S. companies' frontier models. The company also noted occasional activity from Russia and warned that illicit model distillation posed a risk to “US-led democratic AI.”
Chinese distillation methods have become more sophisticated over the past year, OpenAI wrote, moving beyond simple chain-of-thought (CoT) extraction to multi-step operations that include synthetic data generation, large-scale data cleaning, and other stealth methods.
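OpenAI did not spell out what those multi-step operations look like in practice, but the data-cleaning stage can be illustrated in a few lines: before harvested answers are used to train a student model, duplicates and useless refusals get filtered out. The sketch below is an assumption-laden toy, with invented file names and filter rules, not anything OpenAI described in detail.

```python
# A hedged sketch of the "large-scale data cleaning" step, applied to
# harvested (prompt, completion) pairs; file names and filter rules are
# assumptions for illustration only.
import json

REFUSAL_OPENINGS = ("i can't", "i cannot", "i'm sorry")

seen = set()
kept = []
with open("distilled_pairs.jsonl") as f:
    for line in f:
        pair = json.loads(line)
        completion = pair["completion"].strip()
        # Drop exact duplicates and refusal-style answers, which would
        # teach a student model nothing useful.
        if completion in seen or completion.lower().startswith(REFUSAL_OPENINGS):
            continue
        seen.add(completion)
        kept.append(pair)

with open("cleaned_pairs.jsonl", "w") as f:
    for pair in kept:
        f.write(json.dumps(pair) + "\n")
```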
OpenAI also notes that it has invested in stronger detection measures to prevent unauthorized distillation. It bans accounts that violate its terms of service and proactively removes users who appear to be trying to distill its models. Yet the company admits that it cannot solve the model distillation problem alone.
It will take an “ecosystem security” approach to protect against distillation, and that will require help from the U.S. government, OpenAI says. “It is not enough for one lab to increase its protections, as adversaries will simply default to the least protected vendor,” according to the memo.
The AI company also suggests that U.S. government policy “could be helpful” when it comes to sharing information and intelligence, and working with industry to develop best practices for distillation defense. OpenAI also called on Congress to close API router loopholes that allow DeepSeek and other competitors to access U.S. models, and to restrict adversaries' access to U.S. computing and cloud infrastructure. ®
