[Industrial SOTA] OneRanker: Bridging the Gap Between Generative Retrieval and Business Value
Abstract

OneRanker is an end-to-end generative advertising recommendation framework that unifies candidate generation and ranking into a single model. By integrating value-aware multi-task decoupling and a coarse-to-fine target awareness mechanism, it achieves SOTA performance on industrial datasets, notably improving GMV-Normal by +1.34% in Tencent’s Weixin Channels.

Executive Summary

TL;DR: OneRanker, from Tencent's ad-tech team, moves beyond the traditional "generate-then-rank" separation. It introduces a unified model that uses Task Tokens and Fake Item Tokens to make the generation process aware of ranking objectives and business value. The architecture targets the long-standing optimization tension between user interest and platform profit, and the reported result is a +1.34% GMV-Normal lift in production on Weixin Channels.

Positioning: This work is an evolution of the "One-Model" paradigm (following GPR and OneRec), focusing specifically on solving the "Target-Agnostic" and "Objective Misalignment" flaws in earlier generative recommenders.

The "Generative" Bottleneck: Why Simple Generation Fails in Ads

In industrial advertising, the objective is not only to find what a user likes (interest) but also to surface what delivers the most value (eCPM/GMV). Existing generative models have two fatal flaws:

  1. Target-Agnostic Generation: When the model generates candidate IDs, the user embedding is often static. It doesn't "look" at specific candidates until the ranking stage, leading to high-value items being filtered out too early.
  2. Optimization Tension: Trying to optimize for Clicks (coverage) and GMV (value) in one shared space usually leads to a "tug-of-war" where neither objective is fully reached.

Methodology: The OneRanker Architecture

OneRanker decomposes the recommendation process into three collaborative steps within a single transformer-based framework.

[Figure: Model Architecture]

1. Value-Aware Multi-Task Decoupling

Instead of a single output head, OneRanker uses a sequence of Task Tokens. These tokens act as specialized "queries." A Causal Mask orders them so that each task token attends only to itself and the tasks before it, which lets value-aware tasks build on interest-based ones (e.g., Impression → Click → Conversion → Value) and progressively refines the model's view of user intent.
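The effect of a causal mask over task tokens can be illustrated with a toy example. This is a minimal sketch rather than the paper's implementation: the task list mirrors the example chain in the text, and `build_task_mask` and `masked_softmax` are hypothetical helper names introduced here.

```python
import math

# Task order follows the example chain in the text; the exact token set is an assumption.
TASKS = ["impression", "click", "conversion", "value"]

def build_task_mask(n):
    """Lower-triangular mask: token i may attend to tokens 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions; masked positions get weight 0."""
    allowed = [s for s, m in zip(scores, mask_row) if m]
    peak = max(allowed)
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

mask = build_task_mask(len(TASKS))
# Uniform raw scores, so the weights show only the effect of the mask.
weights = [masked_softmax([0.0] * len(TASKS), row) for row in mask]
for task, row in zip(TASKS, weights):
    print(f"{task:>10}: {[round(w, 2) for w in row]}")
```

In this toy setup the "value" token spreads attention over all four tasks, while the "impression" token sees only itself, which is the one-way information flow the text describes.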

2. Coarse-to-Fine Target Awareness

How do you make a generator aware of what it hasn't generated yet?

  • Coarse-Grained: OneRanker introduces Fake Item Tokens—cluster centers of the entire item space. By attending to these centers during generation, the model implicitly senses the "neighborhood" of potential candidates.
  • Fine-Grained: The Ranking Decoder in Step 3 uses cross-attention between candidates and task tokens, ensuring the final score is explicitly aligned with the item's specific features.
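The coarse-grained idea can be sketched with plain k-means, since the article later describes Fake Item Tokens as K-means cluster centers. Everything else here is an assumption for illustration: the 2-D toy embeddings, the deterministic initialization, and the `kmeans` helper are not taken from the paper.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns k cluster centers."""
    # Deterministic init for the sketch: evenly spaced points from the input.
    step = len(points) // k
    centers = [list(points[i * step]) for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            buckets[nearest].append(p)
        for c, members in enumerate(buckets):
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers

# Toy 2-D item embeddings forming two obvious groups (made-up data).
item_embeddings = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
                   (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

# Each center stands in for a "fake item" that summarizes one region of item space.
fake_item_tokens = kmeans(item_embeddings, k=2)
print(fake_item_tokens)
```

A generator that attends to these few centers gets a compressed view of where candidates live without attending to every item, which is the trade-off the coarse-grained step appears to make.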

3. Dual-Side Consistency

To prevent "semantic drift" between the generation and ranking stages, OneRanker employs:

  • Input Side: The ranker reuses Key/Value states from previous steps (Pass-through).
  • Output Side: A Distributional Consistency (DC) Loss is used. This treats the ranker as a "teacher," forcing the generator to "anticipate" the ranker’s preferences during the retrieval phase.
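One plausible way to realize a teacher-student consistency term is a KL divergence between the ranker's and the generator's score distributions over the same candidates. The summary does not give the exact form of the DC Loss, so the KL choice, the function names, and the toy logits below are all assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(ranker_logits, generator_logits):
    """KL(teacher || student) over one shared candidate list (assumed form)."""
    teacher = softmax(ranker_logits)      # ranker output, treated as a fixed target
    student = softmax(generator_logits)   # generator output, the side being trained
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

ranker_scores = [2.0, 0.5, -1.0]
# Generator agrees with the ranker: the loss vanishes.
aligned = consistency_loss(ranker_scores, [2.0, 0.5, -1.0])
# Generator prefers the opposite ordering: the loss is positive.
misaligned = consistency_loss(ranker_scores, [-1.0, 0.5, 2.0])
print(aligned, misaligned)
```

In a real training loop the teacher distribution would typically be excluded from gradient computation so that only the generator moves toward the ranker.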

Experimental Validation

The performance gains reported are substantial for an industrial setting.

[Table: Performance Comparison]

As shown in the table above, OneRanker improved HR@1 by 44.7% compared to GPR. This suggests that "Target Awareness" is not just a marginal improvement but a fundamental necessity for generative models to match the precision of traditional discriminative rankers.
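For readers unfamiliar with the metric, HR@K (hit rate at K) is usually the fraction of test cases whose ground-truth item appears among the top K results. The sketch below assumes that standard definition; the paper's evaluation protocol may differ, and the data is made up.

```python
def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of cases where the target item is within the top-k ranked items."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)

# Four toy requests, each with a ranked candidate list and one ground-truth item.
ranked_lists = [["a", "b", "c"],
                ["b", "a", "c"],
                ["c", "b", "a"],
                ["a", "c", "b"]]
targets = ["a", "a", "a", "b"]

hr1 = hit_rate_at_k(ranked_lists, targets, 1)  # only the first request hits
hr2 = hit_rate_at_k(ranked_lists, targets, 2)  # the first two requests hit
print(hr1, hr2)
```

HR@1 is the strictest version of the metric, since the target has to be ranked first, so gains there reflect precision at the very top of the list.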

Ablation Insight: The Power of Fake Items

Removing the Fake-Item-Token mechanism (Target) led to a 4.5% drop in HR@5. This confirms that even "coarse" awareness of the item space significantly improves the quality of the generated multi-interest paths.

Critical Analysis & Conclusion

OneRanker addresses the "Blind Generation" problem. By forcing the model to consider the item distribution and business value during the retrieval stage, it avoids much of the irreversible information loss characteristic of the traditional cascaded pipeline, where a high-value item dropped at retrieval can never be recovered by the ranker.

Limitations:

  • The reliance on K-means cluster centers (Fake Items) assumes a relatively stable item semantic space. In environments with extreme "cold-start" item velocity, these clusters might need frequent updates.
  • The computational overhead of multi-stage decoders in a single pass requires a highly optimized inference engine like HSTU.

Future Outlook: The success of the Distributional Consistency Loss points toward a future where "Retrieval" and "Ranking" are no longer two different tasks but merely two different resolutions of the same generative process.
