This paper introduces Mozi, a dual-layer LLM agent architecture for autonomous drug discovery that integrates governed multi-agent orchestration with structured, stateful "Skill Graphs." By bridging the Model Context Protocol (MCP) with domain-specific pipelines, Mozi achieves SOTA performance on the PharmaBench benchmark and successfully executes end-to-end therapeutic design tasks.
TL;DR
Mozi is a new agentic framework that transforms LLMs from "fragile conversationalists" into reliable "co-scientists." It introduces a Dual-Layer Architecture that separates high-level strategic reasoning (Control Plane) from the rigorous, state-dependent execution of drug discovery pipelines (Workflow Plane). By enforcing hard-coded tool governance and stateful skill graphs, Mozi eliminates the "hallucination drift" common in long-horizon scientific tasks.
The Problem: Why LLM Agents Fail in the Lab
Current AI agents suffer from two critical bottlenecks in pharmaceutical research:
- Unconstrained Tool Governance: General agents often invoke expensive computational tools with invalid parameters or without proper clearance.
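- Long-Horizon Reliability: In a pipeline spanning Target Identification through Lead Optimization, errors compound multiplicatively. At a 5% error rate per step, ten independent steps leave only about a 60% chance of an error-free run, and a single early mistake can render the final candidate scientifically invalid.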
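To make the compounding claim concrete, here is a small back-of-the-envelope sketch. It assumes every step fails independently with the same 5% probability, which is a simplification introduced here rather than a model taken from the paper; the function name is hypothetical.

```python
def pipeline_success_probability(per_step_error: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps
    that each fail with probability per_step_error (an assumption)."""
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return (1.0 - per_step_error) ** n_steps


for n in (1, 5, 10, 20):
    p = pipeline_success_probability(0.05, n)
    print(f"{n:2d} steps: {p:.3f}")
# Prints:
#  1 steps: 0.950
#  5 steps: 0.774
# 10 steps: 0.599
# 20 steps: 0.358
```

Under these assumptions, a twenty-step pipeline finishes without any error only about a third of the time, which is why per-step validation matters more as the horizon grows.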
Existing state-of-the-art agents such as Biomni often function as "islands of intelligence": they excel at single tasks but lack the interoperability and auditability required for regulated drug R&D.
Methodology: Governed Autonomy via Dual-Layer Design
The core innovation of Mozi lies in its separation of Logic and Integrity.
Layer A: The Control Plane (The Brain)
Instead of a simple ReAct loop, Layer A implements a Supervisor-Worker hierarchy. The Supervisor manages a "minimal planning" strategy, while specialized Workers (Research vs. Computation) are isolated via Role-Based Access Control (RBAC). This prevents a "Research Worker" from accidentally triggering a 10-hour GPU-intensive docking simulation.
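The role isolation described above can be sketched as a permission check that runs before any tool is dispatched. This is a minimal illustration, not Mozi's actual implementation: the tool names, the `Worker` class, and the permission table are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    RESEARCH = "research"
    COMPUTATION = "computation"


# Hypothetical tool names and role assignments, for illustration only.
TOOL_PERMISSIONS = {
    "literature_search": frozenset({Role.RESEARCH}),
    "fetch_structure": frozenset({Role.RESEARCH, Role.COMPUTATION}),
    "run_docking": frozenset({Role.COMPUTATION}),
}


class ToolAccessDenied(PermissionError):
    """Raised when a worker's role is not cleared for a tool."""


@dataclass(frozen=True)
class Worker:
    name: str
    role: Role

    def call_tool(self, tool: str, **params) -> str:
        allowed = TOOL_PERMISSIONS.get(tool)
        if allowed is None:
            raise KeyError(f"unknown tool: {tool}")
        if self.role not in allowed:
            raise ToolAccessDenied(f"{self.role.value} worker may not call {tool}")
        # A real system would dispatch to the tool here; this stub only reports it.
        return f"{tool} dispatched by {self.name} with {params}"


researcher = Worker("researcher", Role.RESEARCH)
print(researcher.call_tool("literature_search", query="LRRK2 inhibitors"))
try:
    researcher.call_tool("run_docking", target="LRRK2")
except ToolAccessDenied as err:
    print(f"blocked: {err}")
```

The check lives in code rather than in the prompt, so a worker cannot talk its way past it. That is the sense in which governance is "hard-coded".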
Layer B: The Workflow Plane (The Skeleton)
Scientific protocols are materialized as Composable Skill Graphs. These are not just sequences of tools; they are stateful Directed Acyclic Graphs (DAGs) that enforce:
- Data Contracts: Ensuring a protein structure is "cleaned" (via PDBFixer) before it ever touches a docking engine.
- HITL Checkpoints: Strategic pauses where human experts must validate a target or a scaffold before the agent proceeds.
Figure 1: The Workflow Plane (Layer B) captures the canonical small-molecule discovery pipeline.
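The two guarantees of the Workflow Plane, data contracts and HITL checkpoints, can be sketched with a small DAG executor. This is an illustrative sketch under stated assumptions: the `Skill` class, the state keys, and the placeholder skills are hypothetical, and the lambdas stand in for real tools such as PDBFixer or a docking engine. Only `graphlib.TopologicalSorter` is a real standard-library API (Python 3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable


class ContractViolation(RuntimeError):
    """A skill's required inputs are missing from the pipeline state."""


class CheckpointRejected(RuntimeError):
    """A human reviewer declined to approve a checkpoint."""


@dataclass(frozen=True)
class Skill:
    name: str
    run: Callable[[dict], dict]        # returns new keys to merge into the state
    requires: frozenset = frozenset()  # data contract: keys that must already exist
    depends_on: tuple = ()             # upstream skills (DAG edges)
    needs_approval: bool = False       # HITL checkpoint before this skill runs


def execute(skills, state, approve):
    """Run skills in topological order, enforcing contracts and checkpoints."""
    by_name = {s.name: s for s in skills}
    graph = {s.name: set(s.depends_on) for s in skills}
    for name in TopologicalSorter(graph).static_order():
        skill = by_name[name]
        missing = skill.requires - set(state)
        if missing:
            raise ContractViolation(f"{name} is missing {sorted(missing)}")
        if skill.needs_approval and not approve(name, state):
            raise CheckpointRejected(name)
        state = {**state, **skill.run(state)}
    return state


# Placeholder skills (hypothetical); real ones would call actual tools.
fetch = Skill("fetch", lambda s: {"raw_structure": f"raw:{s['target']}"},
              requires=frozenset({"target"}))
clean = Skill("clean",
              lambda s: {"clean_structure": s["raw_structure"].replace("raw", "clean")},
              requires=frozenset({"raw_structure"}), depends_on=("fetch",))
dock = Skill("dock", lambda s: {"docked": s["clean_structure"]},
             requires=frozenset({"clean_structure"}), depends_on=("clean",),
             needs_approval=True)

result = execute([dock, clean, fetch], {"target": "LRRK2"},
                 approve=lambda name, state: True)
print(result["docked"])  # clean:LRRK2
```

If `dock` were wired directly to `fetch`, the executor would raise `ContractViolation` before the docking stub ever ran, which mirrors the "cleaned before docking" rule. In practice the `approve` callback would block on a human decision instead of returning `True`.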
Experiments: SOTA on PharmaBench
The authors introduce PharmaBench, a benchmark of 88 complex tasks, on which Mozi is reported to outperform previous baselines such as Biomni:
- Quantitative Success: On regression tasks (ADMET property prediction and drug-target interaction, DTI), Mozi demonstrated superior tool selection and parameter precision.
- Qualitative Mastery: In a 28-task expert-level "Humanity's Last Exam" subset, Mozi powered by DeepSeek-V3.2 surpassed even proprietary models like Gemini-2.5-Pro.
Table 1: Mozi vs. Biomni: accuracy gains across MCQ, Classification, and Regression tasks.
Case Study: Parkinson’s Disease & LRRK2
In a real-world stress test, Mozi was tasked with finding inhibitors for the LRRK2 kinase.
- Target ID: It autonomously selected the 8TXZ cryo-EM structure.
- Screening: It screened 377,760 compounds using LigUnity.
- Corrective Evolution: When early leads showed hERG toxicity (potential heart risk), the Lead Optimization module autonomously navigated the chemical space to find a safer scaffold.
- Result: The final candidate achieved a docking score of -8.924 kcal/mol, comparable to the Phase-II clinical drug DNL-201.
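The trade-off in the corrective step, keeping binding strength while dropping compounds with predicted hERG liability, can be illustrated with a toxicity-gated selection. This is a simplified stand-in, not the paper's Lead Optimization module: the `Candidate` fields, the 0.5 risk threshold, and all values below are made-up toy numbers, not results from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    docking_score: float  # kcal/mol; more negative means stronger predicted binding
    herg_risk: float      # predicted probability of hERG liability, in [0, 1]


def select_lead(candidates, max_herg_risk: float = 0.5) -> Optional[Candidate]:
    """Return the best-scoring candidate whose predicted hERG risk is acceptable."""
    safe = [c for c in candidates if c.herg_risk <= max_herg_risk]
    return min(safe, key=lambda c: c.docking_score) if safe else None


# Toy values for illustration only.
pool = [
    Candidate("scaffold_A", docking_score=-9.5, herg_risk=0.8),
    Candidate("scaffold_B", docking_score=-8.9, herg_risk=0.2),
    Candidate("scaffold_C", docking_score=-7.1, herg_risk=0.1),
]
lead = select_lead(pool)
print(lead.name)  # scaffold_B
```

The strongest binder (`scaffold_A`) is rejected on safety grounds, so the selection settles on a slightly weaker but safer scaffold. A real optimization loop would also generate new candidates rather than only filter a fixed pool.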
Critical Analysis & Conclusion
Mozi represents a shift from "AI agents as assistants" to "AI agents as infrastructure." By using the Model Context Protocol (MCP), it federates a universe of tools (UniProt, PDB, AutoDock) into a unified fabric.
Limitations: The system still relies on in silico surrogate models. A predicted docking score of -8.9 kcal/mol is not a measured binding affinity, so the candidate still requires wet-lab validation. Future work should integrate Uncertainty Quantification (UQ) so that human experts can see how confident the system is in its toxicity filters.
Takeaway: Mozi makes the case that for AI to advance science, it needs not just more parameters but better governance.
