ZeroSearch: Alibaba's revolution for training AI efficiently and autonomously

Last update: May 12th 2025
  • ZeroSearch dramatically reduces the cost of training AI models through simulated searches, eliminating the reliance on external search engines.
  • It combines lightweight supervised fine-tuning with reinforcement learning to improve the retrieval and reasoning capabilities of LLMs.
  • It allows companies and developers to train advanced models at low cost, gaining autonomy and control over the process.

Innovation in the field of artificial intelligence has exploded in recent years, especially in relation to large language models (LLMs). One of the most significant breakthroughs of 2025 has been ZeroSearch, a technology developed by Alibaba that is shaking the foundations of how these models are trained. What exactly is ZeroSearch about, and why is it generating so much buzz in the industry? In this article, we take a detailed look at this new methodology, including how it works, what advantages it offers over traditional methods, and how it can change the development of AI at all levels.

In tech circles, the talk is all about it: ZeroSearch promises to reduce the training costs of artificial intelligence models by no less than 88%. This leap in efficiency, far from being a mere marketing gimmick, has profound implications for businesses large and small, for developers, and, of course, for the advancement of general artificial intelligence.

What is ZeroSearch and where does it come from?

ZeroSearch is a new reinforcement learning-based technique designed to train language models without relying on real external search engines during the training process. This innovation comes from Alibaba's Tongyi lab and aims to solve two common problems in training AI models that use web searches: the high cost of search-API usage and the unpredictable quality of the retrieved documents.

Until now, developing advanced assistants, chatbots, or recommendation engines required sending tens of thousands of queries to search engines like Google through paid services, increasing the cost and limiting scalability, especially for companies with tight budgets.

ZeroSearch changes the rules of the game by betting on a system in which the LLM itself learns to simulate the operation of a search engine, generating relevant or even noisy (irrelevant) documents in response to queries, thus allowing training without external interaction.

How does ZeroSearch work? A detailed technical explanation

At the heart of ZeroSearch is a reinforcement learning (RL) framework that eliminates the need for actual web searches during training. Let's look at this process step by step, based on Alibaba's approach and the extensive published analyses of the technique.

1. Lightweight supervised tuning to simulate searches

Everything starts from a lightweight supervised fine-tuning (SFT) phase. In it, the LLM is trained to behave as an information retrieval module: through this tuning, it learns to generate documents responsive to queries, mimicking the textual style and type of content that a real search engine would offer. During this initial phase, interaction trajectories between the model and a real search engine are collected, producing records of queries and the documents retrieved for them.

Successful paths, that is, those that lead to the correct answer, are labeled as positive (useful documents), while those that result in errors or incorrect answers are marked as negative (noisy documents). This differentiation will later help the model understand and reproduce the dynamics of a realistic search, including relevant and less useful results.
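
To make this concrete, here is a minimal sketch of how such labeled trajectories could be turned into fine-tuning examples for the simulator. The Trajectory data shape, the exact-match rule, and the prompt wording are illustrative assumptions, not Alibaba's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    query: str
    documents: list[str]   # documents returned by the real search engine
    final_answer: str
    gold_answer: str

def build_sft_examples(trajectories: list[Trajectory]) -> list[dict]:
    """Turn collected search trajectories into (prompt, completion) pairs."""
    examples = []
    for traj in trajectories:
        # Trajectories that reached the correct answer supply "useful" documents;
        # the rest supply "noisy" ones.
        correct = traj.final_answer.strip().lower() == traj.gold_answer.strip().lower()
        label = "useful" if correct else "noisy"
        for doc in traj.documents:
            prompt = (
                f"You are a search engine. Generate a {label} document "
                f"for the query: {traj.query}"
            )
            examples.append({"prompt": prompt, "completion": doc})
    return examples
```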

2. Reinforcement learning with curriculum-based simulation

After supervised tuning, the model moves into the reinforcement learning phase, where good behaviors are reinforced and errors are penalized. Here, the fine-tuned simulation LLM acts as the search engine, responding to queries issued by the policy model and returning documents that may be useful or noisy.

The difficulty increases progressively, following a curriculum strategy that gradually degrades the quality of the generated documents, so that the system first learns in controlled environments and, as it progresses, is confronted with increasingly noisy or complex examples. This approach helps the model develop robust search and reasoning capabilities under realistic conditions.
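
As an illustration of the curriculum idea, the sketch below increases the probability of requesting a noisy document as training advances. The linear schedule, the prompt text, and the simulator.generate interface are assumptions made for the example, not the exact schedule or API used by ZeroSearch.

```python
import random

def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.5) -> float:
    """Linearly increase the share of noisy documents over training (illustrative schedule)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return p_start + frac * (p_end - p_start)

def simulate_search(simulator, query: str, step: int, total_steps: int) -> str:
    """Ask the simulator LLM for a document whose quality follows the curriculum."""
    label = "noisy" if random.random() < noise_probability(step, total_steps) else "useful"
    # `simulator` is assumed to expose a generate(prompt) -> str method.
    return simulator.generate(
        f"You are a search engine. Generate a {label} document for: {query}"
    )
```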

3. Design of rewards and evaluation metrics

To guide learning, ZeroSearch uses a reward function based on the F1 score, which balances precision and recall by taking into account the word overlap between the predicted answer and the correct one. The goal is to maximize the accuracy of the final answers the model is able to generate, without worrying excessively about formatting, since LLMs typically produce naturally well-formatted text.
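
A word-overlap F1 reward of the kind described fits in a few lines; the whitespace tokenization below is a deliberate simplification.

```python
from collections import Counter

def f1_reward(prediction: str, gold: str) -> float:
    """Word-overlap F1 between the predicted and the correct answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_reward("Paris is the capital", "Paris"))  # -> 0.4
```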

4. Multi-turn interaction and reasoning templates

During training, interaction templates are used that divide the process into three stages: internal reasoning (delimited by tags such as <think>...</think>), issuing the search query (<search>...</search>), and generating the answer (<answer>...</answer>). This way, the model improves both its ability to formulate relevant queries and its ability to provide well-founded answers.
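
For illustration, this small sketch parses a model turn that follows the three-tag template; how the training loop then acts on the extracted query or answer is omitted.

```python
import re
from typing import Optional

def extract(tag: str, text: str) -> Optional[str]:
    """Return the content of <tag>...</tag>, or None if the tag is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None

def parse_turn(text: str) -> dict:
    return {tag: extract(tag, text) for tag in ("think", "search", "answer")}

print(parse_turn(
    "<think>I need the release year.</think>"
    "<search>ZeroSearch release date</search>"
))
# {'think': 'I need the release year.', 'search': 'ZeroSearch release date', 'answer': None}
```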

5. Compatibility and scalability

ZeroSearch supports major language model families, such as Qwen-2.5 and LLaMA-3.2, in both their base and instruction-tuned variants. It can also be combined with various reinforcement learning algorithms (PPO, GRPO, among others), facilitating its adoption in a variety of development environments.
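
As a pointer to what one of the algorithms named above does, the following sketch computes GRPO-style group-relative advantages from the rewards of several rollouts for the same query. It is a generic illustration of that algorithm family, not ZeroSearch's implementation.

```python
def group_relative_advantages(rewards: list) -> list:
    """GRPO-style advantages: normalize each rollout's reward by the group statistics."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

print(group_relative_advantages([0.2, 0.5, 0.8]))  # -> roughly [-1.22, 0.0, 1.22]
```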

Real-world data: How much does ZeroSearch save and how does it perform?

Experiments conducted by Alibaba and reported in specialized publications and repositories show that ZeroSearch achieves performance comparable to, and even superior to, that obtained with real commercial search engines. The cost savings are particularly notable:

  • Performing 64,000 queries using the Google Search API costs around $586.70 (approx. €540).
  • The same query volume, generated and managed with a 14-billion-parameter LLM under ZeroSearch, reduces the cost to just $70.80 (about €65).
  • This difference amounts to roughly 88% savings in training costs (see the quick check after this list), eliminating dependency on external APIs and allowing for greater scalability.
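
The savings figure follows directly from the two quoted costs:

```python
# Costs as quoted above, in USD, for roughly 64,000 training-time queries.
google_api_cost = 586.70   # Google Search API
zerosearch_cost = 70.80    # 14B simulator LLM served locally

savings = (google_api_cost - zerosearch_cost) / google_api_cost
print(f"{savings:.0%}")    # -> 88%
```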

On the other hand, the quality results are impressive: experiments show that a 7B-parameter retrieval module matches the performance of systems based on Google Search, while with 14B parameters the model even outperforms it in question-answering tasks, on both single-hop and more complex multi-hop datasets.

Key advantages and impact on the artificial intelligence industry

The arrival of ZeroSearch represents a radical shift in the way companies and developers can approach the training of advanced models:

  • Drastic reduction of the economic barrier: Facilitates access to advanced AI techniques for SMEs, startups, and independent developers who were previously held back by the cost of commercial APIs.
  • Greater control over training: By generating simulated documents, teams can define exactly what information the model receives, adjusting the difficulty and quality to suit their needs.
  • Boosting technical autonomy: Minimizes dependence on large foreign technology platforms, promoting the local development of customized AI solutions.
  • Adaptability and modularity: ZeroSearch can be deployed on a variety of models and tailored to different workflows and business requirements.

Differences from previous strategies: RAG, real searches and simulations

Before ZeroSearch, the predominant solution for providing up-to-date and accurate information to LLMs was the use of Retrieval-Augmented Generation (RAG), where the model queries external sources using real-world searches. However, this presents some obvious problems:

  • High cost: Continued use of APIs can skyrocket budgets.
  • Variable quality: Retrieved documents can be very inconsistent depending on the searches and the API itself.
  • Legal and privacy limitations: Relying on third-party services involves legal and privacy risks, especially when training with sensitive information.

ZeroSearch eliminates the need to continually refer to external sources, allowing the model to learn to search “within itself” as it simulates the experience of interacting with a search engine.
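
To make the contrast concrete, the sketch below shows the single change at training time: the call to a paid external search API is replaced by a call to a locally hosted simulator LLM. Both client objects and their methods are hypothetical stand-ins.

```python
def retrieve_with_real_search(query: str, search_client) -> list:
    """RAG-style training: every query hits an external, billed search API."""
    return search_client.search(query, num_results=5)

def retrieve_with_zerosearch(query: str, simulator, noisy: bool = False) -> list:
    """ZeroSearch-style training: a locally hosted LLM generates the documents."""
    label = "noisy" if noisy else "useful"
    prompt = f"You are a search engine. Generate 5 {label} documents for the query: {query}"
    return simulator.generate(prompt).split("\n\n")
```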

Impact and real-life applications: from Quark to the democratization of AI

Alibaba has already integrated ZeroSearch into commercial products. Its Quark application, powered by Qwen models, has seen notable improvements in reasoning and accurate responses to complex queries thanks to this technique. But perhaps most significantly, ZeroSearch opens the door for smaller companies to design their own advanced models without the need for expensive external infrastructure.

The research community has access to the code repository, datasets, and pre-trained models on both GitHub and Hugging Face, which is fostering global adoption and experimentation.

What will the future of AI training look like thanks to ZeroSearch?

As these techniques mature, we will see the proliferation of intelligent assistants with advanced search capabilities without relying on Google, Bing, or the like. This opens up new opportunities in education, business, and research, while potentially eroding the dominance of major search engines in the artificial intelligence sector.

For Spain and Europe, this represents the possibility of autonomous growth, reduced technological dependence and costs, and greater strategic control over critical information systems.

The rise of ZeroSearch marks the beginning of a new era in which training AI models will cease to be a luxury available to a select few and become an accessible, scalable, and increasingly sophisticated tool. By teaching AI to search without leaving its own environment, Alibaba has taken a giant step towards developing self-sufficient, efficient systems that adapt to any need. It's no longer just about reducing costs, but about reinventing the rules of the game for the entire artificial intelligence industry.
