[CVPR 2024] AgentIR: Why Your AI Agent Needs a "Reasoning-Aware" Retriever
Abstract

The paper introduces AgentIR, a retrieval framework specifically designed for Deep Research agents. It features Reasoning-Aware Retrieval which embeds the agent's internal reasoning traces alongside queries, and DR-Synth, a synthesis method for creating multi-turn training data. AgentIR-4B achieves a state-of-the-art 68% accuracy on BrowseComp-Plus, significantly outperforming conventional retrievers.

TL;DR

Deep Research agents are not like human users: they "think" before they search. AgentIR is a retrieval paradigm that stops ignoring these thoughts. By embedding an agent's internal reasoning trace alongside its search query, AgentIR-4B achieves an 18-point absolute accuracy gain on complex multi-hop tasks, while also reducing the total number of search steps required.

The Problem: The "Silent" Agent

When a human searches for "backroom studio early 2010s euphoric," a retriever sees an ambiguous string of words. However, a Deep Research agent issues that query because it has already spent 7 turns narrowing down a specific Grammy-winning composer.

Current retrieval systems treat agents like humans: they look only at the final query string. This ignores a goldmine of context. Existing methods like HyDE try to fix this by having an LLM generate a hypothetical document and embedding that instead of the bare query, but these "external" guesses often lead to even more confusion (e.g., misinterpreting a specific backroom as a studio named "Backroom Studio").

Methodology: Reasoning as a First-Class Citizen

AgentIR introduces two core innovations to bridge this gap:

1. Reasoning-Aware Retrieval

Instead of embedding just the query, AgentIR embeds the pair (reasoning trace, query). This allows the retriever to understand:

  • Intent: What the agent is actually looking for.
  • Prior Success: What was already found in previous turns.
  • Hypotheses: The agent's parametric knowledge about likely targets.
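To make the idea concrete, here is a minimal sketch of embedding the (reasoning trace, query) pair instead of the query alone. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a trained dense encoder, the `Reasoning:` / `Query:` template is a hypothetical input format, and the two documents are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a dense encoder: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_retriever_input(query: str, reasoning: str = "") -> str:
    """Join reasoning trace and query into one string (hypothetical template)."""
    if not reasoning:
        return f"Query: {query}"
    return f"Reasoning: {reasoning}\nQuery: {query}"


def rank(docs, query, reasoning=""):
    """Order documents by similarity to the embedded (reasoning, query) input."""
    q_vec = embed(build_retriever_input(query, reasoning))
    return sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)


# Made-up documents echoing the "backroom studio" ambiguity described earlier.
docs = [
    "Backroom Studio is a rehearsal space for rock bands.",
    "The Grammy-winning composer recorded a euphoric trance album in a backroom in 2011.",
]
query = "backroom studio early 2010s euphoric"
reasoning = (
    "I have narrowed the target to a Grammy-winning composer; "
    "now I need the album they recorded around 2011."
)

query_only_top = rank(docs, query)[0]
reasoning_aware_top = rank(docs, query, reasoning)[0]
```

In this toy setup the query alone ranks the literal "Backroom Studio" page first, while adding the reasoning trace moves the composer document to the top. A real system would replace `embed` with a learned encoder trained on such pairs.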

2. DR-Synth: Manufacturing Data for Agents

We lack datasets where agents "solve" multi-turn problems. The authors created DR-Synth, which takes standard QA data and runs agent rollouts. They use an LLM "Oracle" to rerank results, creating gold-standard labels for these intermediate sub-queries.
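The data-generation loop described above could be sketched roughly as follows. The summary only states that agent rollouts are run on QA data and that an LLM oracle reranks the results to produce labels; everything else here is an assumption: the `Turn` and `TrainingExample` types, the choice of the top-ranked document as the positive and the next few as negatives, and the toy `toy_rollout`, `toy_retrieve`, and `toy_oracle` stubs that stand in for a real agent, retriever, and LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    reasoning: str  # the agent's reasoning trace at this turn
    query: str      # the sub-query it issued


@dataclass
class TrainingExample:
    reasoning: str
    query: str
    positive: str
    negatives: List[str]


def synthesize(question: str, answer: str,
               rollout: Callable[[str], List[Turn]],
               retrieve: Callable[[str], List[str]],
               oracle_score: Callable[[str, str, Turn, str], float],
               num_negatives: int = 2) -> List[TrainingExample]:
    """Turn one QA pair into per-turn retrieval training examples."""
    examples = []
    for turn in rollout(question):
        candidates = retrieve(turn.query)
        if len(candidates) < 2:
            continue  # need at least one positive and one negative
        # The oracle sees the question and gold answer, so it can judge
        # which candidate actually helps at this intermediate step.
        ranked = sorted(candidates,
                        key=lambda d: oracle_score(question, answer, turn, d),
                        reverse=True)
        examples.append(TrainingExample(turn.reasoning, turn.query,
                                        ranked[0], ranked[1:1 + num_negatives]))
    return examples


# --- Toy stand-ins (hypothetical) so the sketch runs end to end ---
CORPUS = {
    "prize winner 2011": ["Alice Example won the prize in 2011.",
                          "The prize was founded in 1950."],
    "recording location album": ["The album was recorded in Lisbon.",
                                 "Albums are often recorded in studios.",
                                 "Portugal has many recording studios."],
}


def toy_rollout(question: str) -> List[Turn]:
    return [Turn("Find the prize winner first.", "prize winner 2011"),
            Turn("Winner known; now find where the album was recorded.",
                 "recording location album")]


def toy_retrieve(query: str) -> List[str]:
    return CORPUS.get(query, [])


def toy_oracle(question: str, answer: str, turn: Turn, doc: str) -> float:
    # Stand-in for the LLM oracle: count words shared with the gold answer.
    answer_words = set(answer.lower().split())
    return sum(w.strip(".,") in answer_words for w in doc.lower().split())


examples = synthesize("Where did the 2011 prize winner record their album?",
                      "Alice Example Lisbon",
                      toy_rollout, toy_retrieve, toy_oracle)
```

Each resulting example pairs an intermediate (reasoning, query) with a labeled positive and negatives, which is the shape of data a contrastive retriever objective would typically consume.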

Figure 1: Contrast between conventional retrieval and the AgentIR approach.

Why It Works: "Forgetting" is a Feature

The most profound insight in this paper is about history curation. The authors tested whether embedding the entire conversation history helped. Surprisingly, it didn't.

The latest reasoning trace is effectively a "compressed summary" of the history. It keeps what is relevant and filters out incorrect hypotheses or failed paths from earlier turns. By only embedding the current thought, the retriever avoids the noise of the agent's past mistakes.
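The difference between the two history strategies can be illustrated with a small sketch. The function name `retriever_input`, the `Reasoning:` / `Query:` template, and the two-turn trajectory are all hypothetical; the point is only to show what each strategy feeds to the retriever.

```python
from typing import List, Tuple


def retriever_input(turns: List[Tuple[str, str]],
                    use_full_history: bool = False) -> str:
    """Build the text to embed from a list of (reasoning, query) turns.

    turns are ordered oldest first. By default only the latest reasoning
    trace is kept; use_full_history=True concatenates every trace instead.
    """
    if not turns:
        raise ValueError("at least one turn is required")
    selected = turns if use_full_history else turns[-1:]
    reasoning = " ".join(r for r, _ in selected)
    latest_query = turns[-1][1]
    return f"Reasoning: {reasoning}\nQuery: {latest_query}"


# Made-up trajectory: the first turn follows a lead that is later abandoned.
turns = [
    ("Maybe the composer is a pianist from Vienna.",
     "vienna pianist grammy"),
    ("The pianist lead was wrong; the target is a trance producer who won a Grammy.",
     "backroom studio early 2010s euphoric"),
]

latest_only = retriever_input(turns)
full_history = retriever_input(turns, use_full_history=True)
```

With the full history, the abandoned "Vienna" hypothesis is still part of the embedded text and can pull retrieval toward irrelevant documents. The latest trace already records that the lead was wrong and carries only the current hypothesis.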

Table 1: AgentIR-4B dominates even much larger models across different agent backbones (Tongyi, OSS-120B, GLM).

Experimental Breakthroughs

  • SOTA Performance: AgentIR-4B hit 68% accuracy on BrowseComp-Plus. For context, BM25 (the old standard) hits only 37%.
  • Efficiency: The agent identifies the correct answer in fewer turns because the search results are significantly more precise (Recall jumped from 59% to 78%).
  • Zero-Shot Generalization: Even though AgentIR was trained using one specific agent (Tongyi-DR), it transferred to other models like GLM-4.7 without any extra tuning.
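Because the figures above mix percentages with "jumps," it helps to separate absolute gains (percentage points) from relative gains. The short calculation below uses only the numbers quoted in this summary; the helper name `gains` is introduced here for illustration.

```python
def gains(before: float, after: float):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100
    return absolute, relative


# Accuracy on BrowseComp-Plus: BM25 (37%) vs. AgentIR-4B (68%).
acc_points, acc_relative = gains(37, 68)

# Recall figures quoted in the efficiency bullet: 59% to 78%.
recall_points, recall_relative = gains(59, 78)

print(f"Accuracy: +{acc_points} points ({acc_relative:.1f}% relative)")
print(f"Recall:   +{recall_points} points ({recall_relative:.1f}% relative)")
```

Against BM25 the accuracy gap is 31 points, and the recall change is 19 points.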

Conclusion & Future Outlook

AgentIR marks a shift in how we think about Information Retrieval. As agents become the primary "users" of the internet, retrievers must be designed to speak the language of "thoughts," not just keywords.

The concept of Context Engineering—strategically choosing what part of an agent's history to show the retriever—is the next frontier. AgentIR proves that the agent's own reasoning is the most efficient filter we have.


For more details, check out the AgentIR Project Page.
