ParamMem: Augmenting Language Agents with Parametric Reflective Memory

[ICLR 2025] ParamMem: Breaking the Reflection Loop with Parametric Memory

Abstract

The paper introduces ParamMem, a parametric memory module for language agents that encodes cross-sample reflection patterns directly into model parameters. By integrating this module into a new framework called ParamAgent, the authors achieve state-of-the-art results in programming, math, and multi-hop QA by significantly increasing reflective diversity.

TL;DR

Self-reflection is the hallmark of modern language agents, yet it often fails when the agent gets stuck in a loop of repetitive, unhelpful critiques. ParamMem solves this by moving memory from external "banks" into model parameters. By fine-tuning a lightweight module to learn patterns of failure and correction, the proposed ParamAgent boosts coding and math performance by providing a more diverse set of "second opinions" during problem-solving.

The "Reflective Diversity" Gap

The reported Pearson correlation between the diversity of an agent's reflections and its ultimate task success is 0.76. Current agents are "narrow-minded": when they fail a task, their self-critiques are often just rephrasings of the same wrong idea.
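To make the two quantities concrete, here is a minimal sketch of how reflective diversity and its correlation with success could be measured. The diversity metric (mean pairwise Jaccard distance over word sets) and the function names are assumptions chosen for illustration; the post does not say which diversity measure the paper uses.

```python
from itertools import combinations
from math import sqrt


def jaccard_distance(a: str, b: str) -> float:
    """Return 1 - |intersection| / |union| over lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def reflection_diversity(reflections: list[str]) -> float:
    """Mean pairwise Jaccard distance (an illustrative metric, not the paper's)."""
    pairs = list(combinations(reflections, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Toy strings only: a rephrasing loop scores 0.0, unrelated critiques score 1.0.
stuck = ["check the loop bound", "check the loop bound"]
varied = ["check the loop bound", "handle empty input"]
print(reflection_diversity(stuck), reflection_diversity(varied))
```

In practice one would compute `reflection_diversity` per task, pair it with a success indicator or pass rate, and feed both lists to `pearson`.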

Previous SOTA methods like DoT-bank tried to fix this by retrieving similar, previously solved cases. However, retrieval depends on embedding similarity, which often "collapses" into low-quality matches. ParamMem shifts the paradigm from retrieving diversity to generating it.
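One way to picture the retrieval problem is a plain top-k lookup by cosine similarity. The sketch below is not DoT-bank's implementation; the bank contents, the two-dimensional embeddings, and the function names are hypothetical and chosen only to show how the nearest neighbours can all describe the same failure mode.

```python
from math import sqrt


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_top_k(query: list[float],
                   bank: list[tuple[str, list[float]]],
                   k: int) -> list[str]:
    """Return the ids of the k bank entries most similar to the query."""
    ranked = sorted(bank, key=lambda item: cosine(query, item[1]), reverse=True)
    return [case_id for case_id, _ in ranked[:k]]


# Hypothetical memory bank: two near-duplicate cases and one different one.
bank = [
    ("off_by_one_a", [1.0, 0.0]),
    ("off_by_one_b", [0.9, 0.1]),
    ("empty_input", [0.0, 1.0]),
]
hits = retrieve_top_k([1.0, 0.05], bank, k=2)
print(hits)  # both hits are off-by-one cases; "empty_input" is never surfaced
```

Because ranking is driven purely by similarity to the query, raising `k` tends to add more of the same kind of case before it adds a different one.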

Methodology: The ParamMem Architecture

ParamMem isn't a giant new model; it's a lightweight plug-in (often an 8B model) that works alongside a larger "Actor" LLM.

Pattern Internalization: The module is fine-tuned on auxiliary data (buggy code/derivations paired with diverse reflections).
Diverse Sampling: During inference, ParamMem uses temperature-controlled sampling to generate multiple "global-level" diagnostic hints.
Hybrid Memory: ParamAgent-plus unifies three types of memory:
- Episodic Memory: What happened in the current trial?
- Cross-Sample Memory: What worked for similar problems?
- Parametric Memory: What are the general patterns of errors in this domain?
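The two inference-time ideas above can be sketched in a few lines: temperature scaling to flatten the sampling distribution, and a container that merges the three memory types into one context for the Actor. This is a minimal illustration under assumed names (`HybridMemory`, `build_context`) and an assumed prompt layout; it is not the paper's code.

```python
import math
from dataclasses import dataclass, field


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Higher temperature flattens the distribution, so samples vary more."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; larger means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


@dataclass
class HybridMemory:
    """Hypothetical container for the three memory types listed above."""
    episodic: list[str] = field(default_factory=list)      # current trial
    cross_sample: list[str] = field(default_factory=list)  # similar problems
    parametric: list[str] = field(default_factory=list)    # sampled hints

    def build_context(self) -> str:
        """Concatenate the non-empty memories into one prompt section."""
        sections = [
            ("Episodic memory (current trial)", self.episodic),
            ("Cross-sample memory (similar problems)", self.cross_sample),
            ("Parametric memory (domain-level error patterns)", self.parametric),
        ]
        lines: list[str] = []
        for title, items in sections:
            if items:
                lines.append(title + ":")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 2.0)
print(entropy(sharp) < entropy(flat))  # True: higher temperature, more spread

memory = HybridMemory(episodic=["test 3 failed"], parametric=["check boundaries"])
print(memory.build_context())
```

Keeping the three sources in separate fields makes it easy to ablate one of them, which is how a plain variant and a "plus" variant could share the same code path.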

Framework Comparison Figure: Comparison of memory mechanisms across Reflexion, DoT-bank, and ParamAgent.

Why It Works: Expanding the Hypothesis Space

When an LLM fails a programming task, it usually doesn't know why. ParamMem acts like a senior architect who has seen thousands of bugs. By providing diverse reflections, it expands the "hypothesis space": if one reflection doesn't lead to a fix, a genuinely different one still might, whereas a rephrasing of the first adds nothing.
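A back-of-the-envelope calculation shows why distinct reflections matter more than repeated ones. The model below is an idealization that the post does not state: each distinct reflection is assumed to lead to a fix independently with the same probability `p_single`, and repeated reflections are assumed to add no new chances.

```python
def p_at_least_one_fix(p_single: float, n_distinct: int) -> float:
    """Chance that at least one of n independent, distinct reflections helps."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n_distinct < 0:
        raise ValueError("n_distinct must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_distinct


# Five rephrasings of one idea count as a single distinct reflection.
repeated = p_at_least_one_fix(0.2, n_distinct=1)
# Five genuinely different ideas count as five.
diverse = p_at_least_one_fix(0.2, n_distinct=5)
print(round(repeated, 3), round(diverse, 3))  # 0.2 0.672
```

Real reflections are correlated, so the true gain is smaller than this bound suggests, but the direction of the effect is the same.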

Experimental Results Table: Dramatic performance gains across Llama, Mistral, and Qwen backbones.

Key Insights from Experiments

Weak-to-Strong Transfer: An 8B ParamMem module can improve a 70B or even 80B Actor. This suggests that "knowing how to reflect" is a skill that can be decoupled from "knowing how to solve."
Sample Efficiency: You don't need millions of examples. Just 500 diverse samples are enough to fine-tune a ParamMem module that outperforms traditional retrieval systems trained on 8000+ samples.
Self-Improvement: The module can be trained on data generated by the base model itself (Self-Teaching), gradually raising its own performance ceiling without needing GPT-4 labels.

Conclusion: The Future of Agentic Memory

The success of ParamMem signals a shift toward Hybrid Memory Systems. The limitations of pure retrieval (RAG) for complex reasoning are becoming clear. By "baking" experiences into parameters, agents become more intuitive and less reliant on finding a "perfect match" in a database.

Takeaway: If you want your agent to stop repeating mistakes, don't just give it a library of past cases; give it a specialized module that understands the logic of failure.
