[CVPR 2025 candidate] OpenEarth-Agent: Moving from Tool Calling to Tool Creation for Open-Environment Earth Observation

OpenEarth-Agent is a novel multi-agent framework designed for autonomous Earth Observation (EO) in open environments. It transitions from traditional static tool-calling to a dynamic "tool-creation" paradigm, achieving state-of-the-art performance on the new 596-case OpenEarth-Bench.

TL;DR

The field of Remote Sensing (RS) and Earth Observation (EO) is shifting away from simple image classification toward complex, autonomous scientific workflows. OpenEarth-Agent is the first framework that doesn't just "use" tools—it "creates" them. By synthesizing specialized code on the fly to handle heterogeneous data, it achieves a full-pipeline EO capability (acquisition → extraction → analysis) across seven domains, significantly outperforming static tool-calling agents.

Current Status: A breakthrough in "Open-World" autonomy for Geospatial AI.


1. The Bottleneck: The "Closed-World" Tool Trap

Most current RS agents (like Earth-Agent or ThinkGeo) operate on a Fixed Toolset Logic. They have a library of APIs (e.g., calculate_NDVI(), segment_building()). If the data has a weird offset, a missing band, or requires a novel sensor calibration that wasn't pre-programmed, the agent breaks.

Why this fails in the real world:

  • Data Diversity: Sensors like MODIS, Sentinel, and SAR have different scaling factors and invalid-value masks (see the sketch after this list).
  • Logical Rigidity: Predefined tools often apply "naive" processing (like min-max stretching) that destroys the physical meaning of multispectral data.
  • Narrow Scope: Most agents focus on only one part of the pipeline (e.g., just object detection) rather than the whole scientific inquiry.
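
To make the data-diversity point concrete, here is a minimal Python sketch (mine, not the paper's) of why sensor-blind processing fails: a MODIS-style band ships as scaled integers with a fill value, so a naive min-max stretch over raw digital numbers lets the fill value dominate the statistics, while sensor-aware code masks it and unscales to physical reflectance. The fill value and scale factor below are illustrative assumptions.

```python
import numpy as np

# Illustrative constants for a MODIS-like surface-reflectance band:
# scaled int16 digital numbers, fill value -28672, scale factor 1e-4
# (typical of MOD09-style products; treat them as assumptions here).
FILL_VALUE = -28672
SCALE_FACTOR = 1e-4

def naive_minmax_stretch(band: np.ndarray) -> np.ndarray:
    """What a rigid, pre-programmed tool often does: stretch raw digital
    numbers to [0, 1], letting fill values corrupt the statistics."""
    band = band.astype(np.float64)
    return (band - band.min()) / (band.max() - band.min())

def sensor_aware_reflectance(band: np.ndarray) -> np.ma.MaskedArray:
    """Mask the fill value first, then apply the sensor's scale factor,
    preserving physically meaningful reflectance values."""
    masked = np.ma.masked_equal(band.astype(np.float64), FILL_VALUE)
    return masked * SCALE_FACTOR

raw = np.array([[1200, 3400], [FILL_VALUE, 2500]], dtype=np.int16)
print(naive_minmax_stretch(raw))      # fill value drags the minimum down
print(sensor_aware_reflectance(raw))  # fill pixel masked, reflectance unscaled
```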

2. Methodology: The "Tool-Maker" Architecture

OpenEarth-Agent introduces a collaborative multi-agent system that mimics a human scientist's workflow.

A. Iterative Data Exploration

Instead of assuming the data format, the Data Summary Agent writes "probing scripts" to check metadata (projections, bands, resolutions) in real-time. If the script fails, it refines it. This ensures the agent "grounds" its logic in the actual file it is looking at.
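
A probing script of this kind might look like the following minimal rasterio sketch (my illustration, not code from the paper): it reads only the metadata needed to ground later processing decisions, without loading pixel data.

```python
import rasterio

def probe_raster(path: str) -> dict:
    """Inspect a raster's metadata so the agent can ground its plan in
    the actual file: CRS, band count, dtypes, nodata, resolution."""
    with rasterio.open(path) as src:
        return {
            "crs": str(src.crs),
            "bands": src.count,
            "dtypes": src.dtypes,
            "nodata": src.nodata,
            "resolution": src.res,
            "bounds": tuple(src.bounds),
        }

# Example: probe_raster("scene.tif") might reveal a missing nodata tag or
# an unexpected CRS that the agent must handle before computing indices.
```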

B. Adaptive Tool Creation

Instead of searching for a tool, the Coding Agent writes one. For example, if a task requires a "Topographic Vegetation Diversity Index (TVDI)", the agent synthesizes the mathematical logic based on its integrated knowledge base and software libraries (gdal, rasterio).
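
The paper does not publish the code the agent emits, and it does not give the TVDI formula, so here is a sketch of what a synthesized tool might look like for a standard index like NDVI instead (band indices are assumptions that the Data Summary Agent's probing step would supply):

```python
import numpy as np
import rasterio

def compute_ndvi(path: str, red_band: int, nir_band: int) -> np.ma.MaskedArray:
    """NDVI = (NIR - Red) / (NIR + Red), computed with nodata masked.
    Band indices are 1-based and discovered by metadata probing rather
    than hard-coded for one sensor."""
    with rasterio.open(path) as src:
        red = src.read(red_band).astype(np.float64)
        nir = src.read(nir_band).astype(np.float64)
        mask = np.zeros(red.shape, dtype=bool)
        if src.nodata is not None:
            mask |= (red == src.nodata) | (nir == src.nodata)
    denom = nir + red
    mask |= denom == 0  # guard against division by zero on empty pixels
    ndvi = (nir - red) / np.where(mask, 1.0, denom)
    return np.ma.MaskedArray(ndvi, mask=mask)
```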

(Figure: Overall Architecture of OpenEarth-Agent)

C. The Feedback Loop (The "Checking Agent")

Results are verified against geoscientific rules. If an NDVI calculated by the agent results in values outside the [-1, 1] range, the Checking Agent flags a logical error, triggering a refactoring of the tool.
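
A minimal version of such a geoscientific sanity check (my sketch; the paper's Checking Agent presumably implements richer rules) could be:

```python
import numpy as np

def check_index_range(values: np.ndarray, low: float = -1.0,
                      high: float = 1.0) -> list:
    """Flag violations of a physical value range, e.g. NDVI must lie in
    [-1, 1]. A non-empty result would trigger the Coding Agent to
    refactor its tool."""
    finite = values[np.isfinite(values)]
    errors = []
    if finite.size == 0:
        errors.append("no finite values produced")
    elif finite.min() < low or finite.max() > high:
        errors.append(
            f"values outside [{low}, {high}]: "
            f"min={finite.min():.3f}, max={finite.max():.3f}"
        )
    return errors

# e.g. check_index_range(ndvi) -> ["values outside [-1.0, 1.0]: ..."]
```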


3. OpenEarth-Bench: A New Standard

The authors argue that current benchmarks are too easy because they hand the necessary tools to the agent. They propose OpenEarth-Bench, featuring:

  • 596 real-world cases across seven domains, including Urban, Agriculture, and Water.
  • Full-Pipeline requirement: from GEE (Google Earth Engine) data acquisition to spatio-temporal trend analysis (a minimal acquisition sketch follows this list).
  • Minimalist Tools: The agent is only given 6 essential foundation models; everything else must be created from scratch.
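
For context, the acquisition end of that pipeline typically starts with an Earth Engine query. A minimal sketch with the official earthengine-api (illustrative, not benchmark code; the dataset, location, and dates are my assumptions):

```python
import ee

ee.Initialize()  # assumes prior `earthengine authenticate`

# Hypothetical acquisition step: median Sentinel-2 surface-reflectance
# composite over a point of interest for one growing season.
point = ee.Geometry.Point(116.39, 39.91)
composite = (
    ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
    .filterBounds(point)
    .filterDate("2023-04-01", "2023-10-01")
    .median()
)
print(composite.bandNames().getInfo())
```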

(Figure: OpenEarth-Bench Overview)


4. Key Results & Insights

Breaking the "Tool Calling" Ceiling

The most striking result comes from the cross-benchmark evaluation on Earth-Bench. With its "creation" strategy and only 6 tools, OpenEarth-Agent comes within a few points of Earth-Agent equipped with 104 predefined tools (59.92% vs. 63.16%), and when given the same 104-tool library in a hybrid setup it outperforms it outright (67.61%).

| Agent | Setup | Accuracy |
| :--- | :--- | :--- |
| Earth-Agent (Traditional) | 104 Tools | 63.16% |
| OpenEarth-Agent (Ours) | 6 Tools (Tool Creation) | 59.92% |
| OpenEarth-Agent (Ours) | 104 Tools (Hybrid) | 67.61% |

Superior Robustness

The paper highlights that human-engineered tools often "hard-code" sensor parameters. OpenEarth-Agent's created tools were more robust to data anomalies (such as no-data values or cloud cover) because the agent inspects the data distribution before writing the processing code.
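
As an illustration of "inspect the distribution first" (again my sketch, not the authors' code): computing stretch bounds from valid-pixel percentiles instead of raw min/max keeps nodata pixels and cloud-saturated outliers from dictating the scaling.

```python
import numpy as np

def robust_stretch(band: np.ndarray, nodata: float = None,
                   lo_pct: float = 2.0, hi_pct: float = 98.0) -> np.ndarray:
    """Stretch to [0, 1] using percentiles of valid pixels only, so a few
    nodata or cloud-saturated pixels cannot dominate the statistics."""
    band = band.astype(np.float64)
    valid = band[np.isfinite(band)]
    if nodata is not None:
        valid = valid[valid != nodata]
    lo, hi = np.percentile(valid, [lo_pct, hi_pct])
    return np.clip((band - lo) / max(hi - lo, 1e-12), 0.0, 1.0)
```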

(Figure: Performance across different LLM backbones)


5. Critical Analysis: The Cost of Autonomy

While the "Tool Creation" paradigm is powerful, it has two major drawbacks:

  1. Latency: Creating, testing, and verifying tools requires multiple LLM calls. This is significantly slower than calling a pre-compiled API.
  2. Computational Footprint: The high volume of inference calls raises concerns about the carbon footprint of such autonomous systems.

Future Outlook: The authors propose a "Tool Caching" mechanism where successfully verified tools are archived for future use, blending the efficiency of tool-calling with the flexibility of tool-creation.
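
The paper leaves tool caching as future work; one plausible shape for it (entirely my sketch, with hypothetical names) is a registry keyed by a task signature, consulted before any new synthesis:

```python
import hashlib

class ToolCache:
    """Hypothetical cache of verified, agent-created tools: reuse when a
    matching task signature exists, fall back to synthesis otherwise."""

    def __init__(self):
        self._tools = {}  # signature -> verified tool source code

    @staticmethod
    def signature(task: str, sensor: str) -> str:
        return hashlib.sha256(f"{task}|{sensor}".encode()).hexdigest()

    def get_or_create(self, task: str, sensor: str, synthesize) -> str:
        key = self.signature(task, sensor)
        if key not in self._tools:
            self._tools[key] = synthesize(task, sensor)  # slow LLM path
        return self._tools[key]  # fast cached path on later calls

# cache.get_or_create("NDVI", "Sentinel-2", coding_agent.synthesize) would
# pay the LLM cost once, then reuse the verified tool thereafter.
```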


Implementation Takeaway

For developers building agents in specialized verticals (Legal, Medical, Bio-informatics), OpenEarth-Agent proves that Dynamic Code Generation is no longer a "nice-to-have" but a requirement for handling "Long-Tail" data edge cases that static APIs simply cannot anticipate.
