Generative Recommendation for Large-Scale Advertising

[Kuaishou 2026] GR4AD: Redefining Large-Scale Advertising via Generative Recommendation

Abstract

The paper presents GR4AD (Generative Recommendation for ADvertising), a production-oriented generative recommendation system deployed at Kuaishou. It introduces a co-designed architecture built around Unified Advertisement Semantic IDs (UA-SID) and a Lazy Autoregressive (LazyAR) decoder, and reports a 4.2% ad revenue increase while serving 400 million users in real time within a 100ms latency budget.

TL;DR

Kuaishou has successfully transitioned its massive advertising stack from traditional DLRMs to a generative paradigm. GR4AD introduces a "recommendation-native" generative design that moves past simple "next-token prediction." By leveraging UA-SID for multi-modal ad representation, a LazyAR decoder for high-throughput inference, and RSPO for value-aware reinforcement learning, the system achieved a 4.2% revenue increase and 10.17% CVR boost while maintaining sub-100ms latency for 400 million users.

Problem & Motivation: The "LLM Gap" in Advertising

While Generative Recommendation (GenRec) is the new frontier, the industry has struggled to deploy it in real-time advertising. Standard LLM recipes fail here for three reasons:

Semantic Collision: Ads are more than just "text." Identical videos might target different conversion goals (e.g., "purchase" vs. "app install"), creating collisions in standard semantic ID spaces.
Point-wise vs. List-wise: LLMs learn to predict the next token, but ad platforms need to optimize the eCPM of a ranked list.
The Inference Tax: Autoregressive decoding is notoriously slow. Serving hundreds of candidates per request under a 100ms budget is a nightmare for standard Transformer decoders.

Methodology: The Core Innovations

1. UA-SID: Beyond Basic Embeddings

To solve the tokenization problem, GR4AD uses Unified Advertisement Semantic IDs. They fine-tune a Multi-modal LLM (MLLM) on ad-specific instructions and use Co-occurrence Learning to inject collaborative filtering signals into the IDs.

To reduce collisions, they use Multi-Granularity-Multi-Resolution (MGMR) RQ-Kmeans. This allows earlier levels of the ID to capture high-level semantics while final levels use hash-based numeric mapping for business-specific signals.

[Figure: Unified Advertisement Semantic ID Architecture]
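
To make the residual-quantization idea above concrete, here is a minimal sketch in plain Python. It is not the authors' MGMR RQ-Kmeans: it only shows generic residual k-means (each level clusters what the previous levels failed to explain) followed by a hash-based final level. The function names, the codebook sizes, the `conversion_goals` key, and the random embeddings standing in for MLLM outputs are all assumptions made for illustration.

```python
import random
import zlib


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def residual_kmeans_ids(embeddings, codebook_sizes):
    """Quantize level by level: each level clusters the residual left by
    the previous levels. Returns (ids, codebooks)."""
    residuals = [list(e) for e in embeddings]
    ids = [[] for _ in embeddings]
    codebooks = []
    for level, k in enumerate(codebook_sizes):
        centroids, assign = kmeans(residuals, k, seed=level)
        codebooks.append(centroids)
        for i, a in enumerate(assign):
            ids[i].append(a)
            residuals[i] = [r - c for r, c in zip(residuals[i], centroids[a])]
    return ids, codebooks


def append_hash_level(ids, business_keys, num_buckets):
    """Append a final, hash-based level derived from a business key
    (hypothetical stand-in for the business-specific signal)."""
    return [sid + [zlib.crc32(key.encode("utf-8")) % num_buckets]
            for sid, key in zip(ids, business_keys)]


# Toy data: random vectors stand in for multi-modal ad embeddings.
rng = random.Random(42)
ad_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(120)]
conversion_goals = ["purchase" if i % 2 == 0 else "app_install" for i in range(120)]

sids, codebooks = residual_kmeans_ids(ad_embeddings, codebook_sizes=[8, 4])
full_ids = append_hash_level(sids, conversion_goals, num_buckets=64)
print(full_ids[0], full_ids[1])
```

The semantic levels come from the embedding alone, so two ads sharing a creative would share those tokens; the appended level depends only on the business key, which is one way a final token could separate ads that differ only in conversion goal.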

2. LazyAR: Engineering the Latency Breakthrough

The "Aha!" moment in this paper is the LazyAR (Lazy Autoregressive) Decoder. The authors noticed that the first token of a Semantic ID carries the most weight, while later tokens become cheaper to compute once strict autoregression is relaxed.

Instead of running full autoregression through all $L$ decoder layers, LazyAR computes the first $K$ layers in parallel (shared across all beams) and runs only the final $L - K$ layers autoregressively. This roughly doubles inference throughput (QPS) without hurting recommendation quality.

[Figure: Comparison of Vanilla AR vs LazyAR]
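
A back-of-the-envelope cost model helps show where the saving described above could come from. The sketch below only counts layer evaluations, under the assumption that the first $K$ layers run once per position and are reused by every beam. It ignores attention cost, KV caching, and the fact that the first decoding step has a single beam. The layer count, split point, SID length, and beam width are made-up values, not figures from the paper.

```python
def vanilla_ar_layer_evals(num_layers, sid_length, beam_width):
    """Every beam runs the full layer stack at every decoding step."""
    return num_layers * sid_length * beam_width


def lazy_ar_layer_evals(num_layers, shared_layers, sid_length, beam_width):
    """The first `shared_layers` layers run once per position and are shared
    by all beams; only the remaining layers run per beam per step."""
    if not 0 <= shared_layers <= num_layers:
        raise ValueError("shared_layers must be between 0 and num_layers")
    shared = shared_layers * sid_length
    per_beam = (num_layers - shared_layers) * sid_length * beam_width
    return shared + per_beam


# Hypothetical configuration: L = 8 layers, K = 4 shared, 3-token SIDs, 64 beams.
vanilla = vanilla_ar_layer_evals(num_layers=8, sid_length=3, beam_width=64)
lazy = lazy_ar_layer_evals(num_layers=8, shared_layers=4, sid_length=3, beam_width=64)
print(vanilla, lazy, round(vanilla / lazy, 2))  # 1536 780 1.97
```

Under this simplified count the ratio approaches $L / (L - K)$ as the beam width grows, so sharing half of the layers would give close to a 2x reduction. Whether the production model uses such a split is not stated here.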

3. RSPO: Aligning with Business Value

Standard Cross-Entropy isn't enough. GR4AD introduces RSPO (Ranking-Guided Softmax Preference Optimization). It’s an RL-based approach that treats the candidate list as a whole, optimizing for an upper bound of NDCG, specifically weighted by business value (eCPM). This ensures the model isn't just "predicting what's next" but "predicting what's valuable."
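
The RSPO loss itself is not spelled out in this summary, so the sketch below does not attempt to reproduce it. It only illustrates the target metric that the paragraph above describes: NDCG where each candidate's gain is its business value. The function names and the eCPM numbers are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: 0-based rank r is discounted by log2(r + 2)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ecpm_weighted_ndcg(model_scores, ecpm):
    """NDCG of the ranking induced by `model_scores`, using eCPM as the gain."""
    # Order candidates by model score, highest first.
    order = sorted(range(len(model_scores)), key=lambda i: model_scores[i], reverse=True)
    achieved = dcg([ecpm[i] for i in order])
    ideal = dcg(sorted(ecpm, reverse=True))
    return achieved / ideal if ideal > 0 else 0.0


ecpm = [5.0, 1.0, 3.0]  # hypothetical eCPM per candidate
aligned = ecpm_weighted_ndcg([0.9, 0.1, 0.5], ecpm)     # ranks by value
misaligned = ecpm_weighted_ndcg([0.1, 0.9, 0.5], ecpm)  # puts the cheapest ad first
print(aligned, round(misaligned, 4))
```

A ranking that places the highest-eCPM ad first scores 1.0, while one that promotes the lowest-value ad is penalized even if every item in the list is individually plausible. That list-level penalty is what a token-level cross-entropy loss cannot express.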

Experiments & Results: Scaling Laws are Real

The results from Kuaishou’s production environment are definitive:

Revenue Growth: Up to +4.32% relative to GR-Base.
Efficiency: LazyAR plus serving optimizations reached a 117% QPS improvement over vanilla generative setups.
Scaling Laws: The authors observed a linear relationship between revenue and both model size and inference beam width.

[Figure: Scaling Laws for GR4AD]

Critical Insight & Conclusion

GR4AD proves that the future of recommendation is generative, but only if we stop treating recommenders as "text completion engines." The success here stems from architectural relaxation (LazyAR) and metric alignment (RSPO).

Takeaway: If you are scaling GenRec, don't just scale the parameters. Scale the inference-time search (beam width) and optimize the sharing of hidden states across beams.
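
To see why beam width matters as an inference-time knob, consider a minimal beam search over Semantic ID tokens. This is a generic textbook sketch, not GR4AD's serving code, and it does not model hidden-state sharing. The `TOY_MODEL` table and its probabilities are invented so that greedy decoding (beam width 1) misses the best two-token ID.

```python
import heapq
import math


def beam_search(step_log_probs, sid_length, beam_width):
    """Keep the `beam_width` highest-scoring prefixes at each step.
    `step_log_probs(prefix)` returns {token: log_prob} for the next token."""
    beams = [(0.0, ())]  # (cumulative log-probability, token prefix)
    for _ in range(sid_length):
        candidates = []
        for score, prefix in beams:
            for token, log_p in step_log_probs(prefix).items():
                candidates.append((score + log_p, prefix + (token,)))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams


# Hypothetical next-token distributions for a 2-token Semantic ID.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"x": 0.9, "y": 0.1},
}


def toy_step_log_probs(prefix):
    return {token: math.log(p) for token, p in TOY_MODEL[prefix].items()}


greedy = beam_search(toy_step_log_probs, sid_length=2, beam_width=1)
wide = beam_search(toy_step_log_probs, sid_length=2, beam_width=2)
print(greedy[0][1], round(math.exp(greedy[0][0]), 2))  # ('A', 'x') 0.3
print(wide[0][1], round(math.exp(wide[0][0]), 2))      # ('B', 'x') 0.36
```

With a single beam the search commits to the locally best first token and ends at probability 0.30; with two beams it recovers the 0.36 candidate. Each extra beam also means extra decoder work per step, which is why sharing computation across beams becomes important as the width grows.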

Limitations

The LazyAR design is specific to short-sequence generation (Semantic IDs) and likely won't translate to long-form LLM text generation.
The reliance on a Reward Model for RL signal means the upper bound of performance is still capped by the quality of the teacher reward model.
