OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework

[KDD 2025] OneSearch-V2: Internalizing LLM Reasoning for High-Speed Generative Search

Abstract

OneSearch-V2 is a generative search framework for e-commerce that improves item retrieval by integrating latent reasoning and self-distillation. It uses a thought-augmented query understanding module and a novel reinforcement learning mechanism, Token-Position Marginal Advantage GRPO (TPMA-GRPO), to achieve state-of-the-art results (+3.98% CTR, +3.45% GMV) on the Kuaishou Mall platform.

Executive Summary

TL;DR: OneSearch-V2 marks a significant evolution in Generative Retrieval (GR) by showing that large language model (LLM) reasoning can be distilled directly into a search model's parameters. Using a keyword-based Chain-of-Thought (CoT) and an asymmetric self-distillation pipeline, the system captures user intent and complex query semantics without adding online inference latency: at serving time the model reads only the query and generates no reasoning tokens.

Background: Deployed at scale at Kuaishou Technology, OneSearch-V2 is a state-of-the-art industrial system that replaces the traditional multi-stage retrieval and ranking pipeline with a unified generative framework, with a particular focus on how e-commerce platforms handle long-tail and ambiguous queries.

Problem & Motivation: The "Shallow Matching" Trap

Current Generative Retrieval models represent items as Semantic IDs (SIDs). While efficient, these models often overfit to historical user logs, and the obvious remedy, explicit LLM reasoning, is too expensive to run online. This leaves two critical problems:

Semantic Blindness: A query like "relieve fatigue, no supplements" might incorrectly retrieve vitamins because of high historical co-occurrence, failing to "reason" through the negation.
The Inference Bottleneck: While LLMs can resolve such queries with Chain-of-Thought (CoT) reasoning, generating 100+ tokens of "thought" for every search query is too slow for real-time systems.

OneSearch-V2 asks: Can we give a model the "intuition" provided by reasoning without making it "speak" its thoughts?

Methodology: The Core Innovations

1. Keyword-based CoT & Intent Calibration

Rather than generating verbose sentences, the authors extract keyword-based CoTs along four fields (Intent, Category, Attribute, Topic). This condenses the logic behind a query into a compact, high-density semantic anchor.
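A minimal sketch of how a keyword-based CoT might be represented and serialized as extra input for the teacher. The four field names follow the list above; the `KeywordCoT` class, the `[COT]` separator, the prompt layout, and the example keywords are all hypothetical illustrations, not the paper's actual template.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordCoT:
    """Hypothetical container for the four keyword fields named in the post."""
    intent: list[str] = field(default_factory=list)
    category: list[str] = field(default_factory=list)
    attribute: list[str] = field(default_factory=list)
    topic: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        # Serialize only non-empty fields as "Field: kw1, kw2" segments,
        # giving a short, dense string instead of a free-form sentence.
        parts = []
        for name in ("intent", "category", "attribute", "topic"):
            keywords = getattr(self, name)
            if keywords:
                parts.append(f"{name.capitalize()}: {', '.join(keywords)}")
        return " | ".join(parts)


def teacher_input(query: str, cot: KeywordCoT) -> str:
    # Teacher pass: query plus its keyword CoT ("[COT]" is an assumed separator).
    return f"{query} [COT] {cot.to_text()}"


def student_input(query: str) -> str:
    # Student pass: the raw query only, exactly what is available online.
    return query


query = "relieve fatigue, no supplements"
cot = KeywordCoT(  # illustrative keywords, not taken from the paper
    intent=["relieve fatigue"],
    category=["massage device"],
    attribute=["non-supplement"],
)
print(teacher_input(query, cot))
# relieve fatigue, no supplements [COT] Intent: relieve fatigue | Category: massage device | Attribute: non-supplement
print(student_input(query))
```

Keeping the CoT as a handful of typed keywords rather than a paragraph keeps the teacher's extra input short, which matters when the teacher pass has to run for every training example.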

2. Reasoning-Internalized Self-Distillation

This is the "magic" of OneSearch-V2.

Teacher: Receives the query + the Keyword-based CoTs.
Student: Receives only the query.
Alignment: The student is trained to match the teacher's output logit distribution. Since they share the same weights (Self-Distillation), the model learns to "internalize" the information provided by the keywords.
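The alignment step can be sketched as a KL divergence between the teacher's and the student's next-token distributions, $\mathcal{L} = \tau^2\,\mathrm{KL}(p_{\text{teacher}} \,\|\, p_{\text{student}})$, averaged over the SID token positions. This is a minimal sketch assuming a forward-KL objective with temperature $\tau$; the post only says the student matches the teacher's logit distribution, so the exact divergence, temperature, and weighting in OneSearch-V2 may differ. Pure Python keeps it self-contained; the logits below are made up.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for one next-token distribution.

    Both logit vectors come from the SAME model: the teacher pass sees
    query + keyword CoT, the student pass sees the query alone. In a real
    training framework the teacher pass would be detached (stop-gradient)
    so that only the student side is pulled toward the teacher.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # tau^2 scaling is the common knowledge-distillation convention.
    return (temperature ** 2) * kl_divergence(p_teacher, p_student)


def sequence_distillation_loss(teacher_seq, student_seq, temperature=2.0):
    # Average the per-position loss over the SID tokens of one item.
    losses = [
        self_distillation_loss(t, s, temperature)
        for t, s in zip(teacher_seq, student_seq)
    ]
    return sum(losses) / len(losses)


# Made-up logits: 2 SID positions x 3 candidate tokens each.
teacher_seq = [[2.0, 0.5, -1.0], [0.3, 1.5, 0.0]]
student_seq = [[1.0, 1.0, -0.5], [0.2, 0.9, 0.4]]
print(sequence_distillation_loss(teacher_seq, student_seq))
```

The loss is zero only when the query-only pass already reproduces the keyword-informed distribution, which is one way to read "internalizing" the keywords.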

[Figure: Overall Architecture]

3. TPMA-GRPO: Hierarchical Credit Assignment

Standard GRPO-style Reinforcement Learning (RL) applies the same sequence-level advantage to every token in a sequence. However, SIDs are hierarchical (coarse category $\to$ fine attribute). OneSearch-V2 introduces Token-Position Marginal Advantage (TPMA):

Prefix Gating: If the first token (category) is wrong, the gradient for subsequent tokens is zeroed out.
Positional Weighting: Early tokens (structural features) are given higher optimization priority.
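A minimal sketch of how a sequence-level GRPO advantage could be spread over SID token positions with prefix gating and positional weighting. The group normalization is the standard GRPO recipe; the geometric `decay` weights, applying the gate after any wrong position (the post describes the first-token case), and the function names are assumptions for illustration, not TPMA's published formula.

```python
import statistics


def group_relative_advantages(rewards):
    # GRPO-style advantage: normalize each sampled sequence's reward
    # against the mean and std of its group of samples for the same query.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


def tpma_style_token_advantages(generated_sid, target_sid, seq_advantage, decay=0.5):
    """Spread one sequence-level advantage over SID token positions.

    Hypothetical rules, loosely following the post's description:
    - prefix gating: once a token disagrees with the target SID, every
      later token gets zero advantage (no gradient);
    - positional weighting: position k gets weight decay**k, so early,
      coarse tokens receive more of the update than fine ones.
    """
    advantages = []
    prefix_ok = True
    for k, (gen_tok, tgt_tok) in enumerate(zip(generated_sid, target_sid)):
        weight = decay ** k
        advantages.append(seq_advantage * weight if prefix_ok else 0.0)
        if gen_tok != tgt_tok:
            # The wrong token still gets its own weighted signal,
            # but all tokens after it are gated out.
            prefix_ok = False
    return advantages


# Hypothetical group of 4 sampled SIDs for one query; sample 2 earned a
# partial reward (category right, attribute wrong).
rewards = [1.0, 0.0, 0.5, 0.0]
advantages = group_relative_advantages(rewards)
target = ["c12", "a7", "t3"]   # hypothetical coarse-to-fine SID tokens
sampled = ["c12", "a9", "t3"]  # second (attribute) token is wrong
print(tpma_style_token_advantages(sampled, target, advantages[2]))
```

Gating stops later tokens from being credited or blamed for an outcome already decided by a wrong coarse token, while the decay concentrates the update on the structural prefix.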

Experiments & Results

Deployed on Kuaishou's platform and its very large user base, the system produced the following online gains:

Conversion: Buyer conversion rate climbed by 3.05%, and GMV increased by 3.45%.
Robustness: Long-tail queries (the hardest to serve) saw the largest CTR gain (+5.37%), consistent with the reasoning module helping most where historical logs offer little signal.

[Figure: Experimental Results Comparison]

The ablation studies showed that the self-distilled model (S), which runs without keywords, outperformed the teacher model (T), which runs with keywords, in some scenarios. This suggests that the distillation process acts as a regularizer and strengthens the model's internal latent space.

Critical Analysis & Conclusion

Takeaway: OneSearch-V2 successfully bridges the gap between the "system 2" (slow, deliberate) reasoning of LLMs and the "system 1" (fast, intuitive) requirements of industrial search.

Limitations: Despite its success, the system still relies on a three-stage SFT/RL training pipeline that can be complex to maintain. Future work on "agentic search" might simplify this into a more continuous online learning loop.

Future Outlook: The concept of "internalizing reasoning" is a powerful paradigm shift. Similar techniques may carry over to computer vision and robotics, where real-time "intuition" is required for complex decision-making.
