OneRanker is an end-to-end generative advertising recommendation framework that unifies candidate generation and ranking into a single model. By integrating value-aware multi-task decoupling and a coarse-to-fine target awareness mechanism, it achieves SOTA performance on industrial datasets, notably improving GMV-Normal by +1.34% in Tencent’s Weixin Channels.
Executive Summary
TL;DR: OneRanker is a breakthrough from Tencent's ad-tech team that moves beyond the traditional "generate-then-rank" separation. It introduces a unified model that uses Task Tokens and Fake Item Tokens to make the generation process aware of ranking objectives and business value. This architecture addresses the long-standing optimization tension between user interest and platform profit, yielding a +1.34% GMV-Normal lift in production.
Positioning: This work is an evolution of the "One-Model" paradigm (following GPR and OneRec), focusing specifically on solving the "Target-Agnostic" and "Objective Misalignment" flaws in earlier generative recommenders.
The "Generative" Bottleneck: Why Simple Generation Fails in Ads
In the world of industrial advertising, the objective isn't just to find what a user likes (Interest), but what provides the most value (eCPM/GMV). Existing generative models have two fatal flaws:
- Target-Agnostic Generation: When the model generates candidate IDs, the user embedding is often static. It doesn't "look" at specific candidates until the ranking stage, leading to high-value items being filtered out too early.
- Optimization Tension: Trying to optimize for Clicks (coverage) and GMV (value) in one shared space usually leads to a "tug-of-war" where neither objective is fully reached.
Methodology: The OneRanker Architecture
OneRanker decomposes the recommendation process into three collaborative steps within a single transformer-based framework.

1. Value-Aware Multi-Task Decoupling
Instead of a single output head, OneRanker uses a sequence of Task Tokens. These tokens act as specialized "queries." A causal mask over the task sequence lets each value-aware task attend to the interest-based tasks that precede it (e.g., Impression → Click → Conversion → Value). User intent is therefore refined progressively, moving from interest toward value.
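The masking idea can be illustrated with a minimal sketch. It assumes a single attention head with no learned projections and uses the four-task order from the example above. The names `TASKS`, `causal_task_mask` and `masked_self_attention` are hypothetical and are not taken from the paper. The one property shown is that a task token's output depends only on itself and on earlier tasks.

```python
import numpy as np

# Hypothetical task order, following the Impression -> Click -> Conversion -> Value example.
TASKS = ["impression", "click", "conversion", "value"]

def causal_task_mask(n_tasks: int) -> np.ndarray:
    """Boolean mask: True at (i, j) means task i may attend to task j (only j <= i)."""
    return np.tril(np.ones((n_tasks, n_tasks), dtype=bool))

def masked_self_attention(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention over task-token states x of shape (n, d).

    Queries, keys and values are all x itself (no learned projections), an
    assumption made only to keep the sketch small.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)        # block attention to later tasks
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)                        # exp(-inf) == 0 for masked slots
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
task_tokens = rng.normal(size=(len(TASKS), 8))
mask = causal_task_mask(len(TASKS))
refined = masked_self_attention(task_tokens, mask)
```

The "impression" token can only see itself, so it stays unchanged. The "value" token mixes information from all three earlier tasks. Perturbing a later token never changes the outputs of earlier ones.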
2. Coarse-to-Fine Target Awareness
How do you make a generator aware of what it hasn't generated yet?
- Coarse-Grained: OneRanker introduces Fake Item Tokens—cluster centers of the entire item space. By attending to these centers during generation, the model implicitly senses the "neighborhood" of potential candidates.
- Fine-Grained: In the final ranking stage, the Ranking Decoder applies cross-attention between the generated candidates and the task tokens. Each final score is therefore explicitly conditioned on that candidate's own features.
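The two levels of target awareness can be sketched under several assumptions:

- Fake Item Tokens are plain Lloyd's k-means centers of item embeddings.
- Attention is single-head, with no learned projections.
- The coarse signal is added residually to the generator's query.
- The fine-grained score is a dot product between a candidate and its task-attended context.

The function names (`kmeans`, `attend`) and the toy sizes are hypothetical, not the paper's.

```python
import numpy as np

def kmeans(items: np.ndarray, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns (k, d) centers used here as 'fake item tokens'."""
    rng = np.random.default_rng(seed)
    centers = items[rng.choice(len(items), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each item to its nearest center (squared Euclidean distance).
        dists = ((items[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = items[labels == c]
            if len(members):                  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers

def attend(queries: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Unmasked scaled dot-product attention."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ values

rng = np.random.default_rng(1)
d = 16
item_embs = rng.normal(size=(1000, d))        # toy item catalogue
user_state = rng.normal(size=(1, d))          # toy generator query

# Coarse: the query attends to cluster centers, summarizing candidate
# "neighborhoods" before any specific item is generated (residual add assumed).
fake_item_tokens = kmeans(item_embs, k=32)
coarse_state = user_state + attend(user_state, fake_item_tokens, fake_item_tokens)

# Fine: each candidate cross-attends to task tokens (here conditioned on the
# coarse state by broadcasting), then is scored against its own embedding.
task_tokens = rng.normal(size=(4, d)) + coarse_state
candidates = item_embs[:50]
cand_ctx = attend(candidates, task_tokens, task_tokens)   # (50, d)
scores = (cand_ctx * candidates).sum(axis=-1)             # one score per candidate
```

This split reflects a cost trade-off. The coarse step attends to only `k` centers, so it stays cheap during generation. The fine step pays per-candidate cost, but only for the short list that generation produced.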
3. Dual-Side Consistency
To prevent "semantic drift" between the generation and ranking stages, OneRanker employs:
- Input Side: The ranker reuses Key/Value states from previous steps (Pass-through).
- Output Side: A Distributional Consistency (DC) Loss is used. This treats the ranker as a "teacher," forcing the generator to "anticipate" the ranker’s preferences during the retrieval phase.
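Both sides of the consistency mechanism can be sketched together, as an illustration only:

- Key/Value states for the shared context are computed once and reused by both the generator query and the ranker queries.
- The DC loss is written as a KL divergence from the ranker's (teacher's) distribution to the generator's (student's) distribution over the same candidates.

The summary does not give the exact form of the DC loss, so the KL direction, the dot-product logits and every helper name (`project_kv`, `log_softmax`, `dc_loss`) are assumptions.

```python
import numpy as np

def project_kv(context: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Compute key/value states for the shared context once (the 'pass-through' cache)."""
    return context @ w_k, context @ w_v

def attend(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def log_softmax(x: np.ndarray) -> np.ndarray:
    shifted = x - x.max()
    return shifted - np.log(np.exp(shifted).sum())

def dc_loss(generator_logits: np.ndarray, ranker_logits: np.ndarray) -> float:
    """KL(p_ranker || p_generator) over one candidate set.

    The ranker is the teacher; in a real training loop its logits would be
    detached (stop-gradient) so only the generator is pulled toward it.
    """
    log_p_t = log_softmax(ranker_logits)
    log_p_s = log_softmax(generator_logits)
    return float((np.exp(log_p_t) * (log_p_t - log_p_s)).sum())

rng = np.random.default_rng(2)
d = 8
context = rng.normal(size=(20, d))                     # toy user-history tokens
w_k, w_v = rng.normal(size=(d, d)), rng.normal(size=(d, d))
k_cache, v_cache = project_kv(context, w_k, w_v)       # computed once

gen_state = attend(rng.normal(size=(1, d)), k_cache, v_cache)
rank_state = attend(rng.normal(size=(5, d)), k_cache, v_cache)  # reuses the cache

candidate_embs = rng.normal(size=(5, d))
generator_logits = (candidate_embs @ gen_state.T).ravel()   # (5,)
ranker_logits = (rank_state * candidate_embs).sum(axis=-1)  # (5,)
loss = dc_loss(generator_logits, ranker_logits)
```

The loss is zero only when the generator's candidate distribution matches the ranker's. Minimizing it therefore pushes retrieval toward what ranking would prefer.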
Experimental Validation
The performance gains reported are substantial for an industrial setting.

As shown in the table above, OneRanker improved HR@1 by 44.7% compared to GPR. This suggests that "Target Awareness" is not just a marginal improvement but a fundamental necessity for generative models to match the precision of traditional discriminative rankers.
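The reported gains are stated in terms of hit rate at K (HR@K). A small sketch of how such metrics and relative lifts are usually computed may help. It assumes one ground-truth item per request and assumes the reported "+44.7%" is a relative lift; the summary does not say which. The helper names and the toy lists are hypothetical, not data from the paper.

```python
from typing import Sequence

def hit_rate_at_k(ranked_lists: Sequence[Sequence[str]],
                  targets: Sequence[str], k: int) -> float:
    """Fraction of requests whose ground-truth item appears in the top-k results."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def relative_lift(new: float, base: float) -> float:
    """Relative improvement of `new` over `base` (0.5 means +50%)."""
    return (new - base) / base

# Toy example: three requests, each with target item "a".
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
hr1 = hit_rate_at_k(ranked, targets, 1)
hr2 = hit_rate_at_k(ranked, targets, 2)
```

The distinction matters when reading lift numbers. Moving HR@1 from 0.20 to 0.30 is a +50% relative lift but only +10 points absolute.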
Ablation Insight: The Power of Fake Items
Removing the Fake-Item-Token mechanism (Target) led to a 4.5% drop in HR@5. This confirms that even "coarse" awareness of the item space significantly improves the quality of the generated multi-interest paths.
Critical Analysis & Conclusion
OneRanker successfully addresses the "Blind Generation" problem. It forces the model to consider the item distribution and business value during the retrieval stage. This avoids the irreversible information loss characteristic of traditional cascaded pipelines, where an item filtered out at retrieval can never be recovered by the ranker.
Limitations:
- The reliance on K-means cluster centers (Fake Items) assumes a relatively stable item semantic space. In environments with extreme "cold-start" item velocity, these clusters might need frequent updates.
- Running multiple decoder stages in a single pass adds computational overhead. Serving it requires highly optimized inference, for example an efficient sequential-transduction backbone such as HSTU.
Future Outlook: The success of the Distributional Consistency Loss points toward a future where "Retrieval" and "Ranking" are no longer two different tasks but merely two different resolutions of the same generative process.
