This paper introduces Cerebra, an interactive multi-agent AI system for multimodal dementia characterization and risk assessment. By coordinating specialized agents for EHR data, clinical notes, and medical imaging, Cerebra synthesizes heterogeneous clinical data into an interpretable dashboard and reports state-of-the-art AUROC scores (up to 0.86 for diagnosis) across multi-institutional datasets.
TL;DR
Dementia care is a puzzle composed of fragmented EHR data, narrative physician notes, and complex imaging. Cerebra is a new agentic AI system that mimics a multidisciplinary medical board. By delegating tasks to specialized agents and synthesizing their findings via a "Super Agent," it reaches a state-of-the-art AUROC of 0.86 for diagnosis and improves clinician accuracy by 17.5 percentage points (from 65.8% to 83.3%).
From Black-Box Predictions to a Virtual Medical Board
In neurology, diagnosing Alzheimer’s Disease and Related Dementias (AD/ADRD) is notoriously difficult. Clinicians must weigh vascular comorbidities in the EHR, functional complaints in clinical notes, and structural atrophy in brain MRIs.
Existing AI models typically treat this as a "fusion" problem: they flatten all data into a single vector and output a risk score. This "Black Box" approach falls short in three ways:
- Missing Data: If a patient doesn't have an MRI, the whole model often breaks.
- Lack of Justification: A single score doesn't tell a doctor why the patient is at risk.
- Static Logic: The models don't learn from a specific doctor's feedback over time.
The Core Mechanism: Agentic Orchestration
Cerebra solves this by architecting the AI as a Multi-Agent Team. Instead of one giant model, the system uses:
- The Super Agent: The "Chairman" who coordinates the workflow.
- Modality Agents: Specialized "Experts" for EHR, Imaging (MRI/OCT), and Clinical Notes.
- The Data Agent: A Text-to-SQL specialist that retrieves patient records without human coding.
Figure 1: The Cerebra architecture showing the Super Agent orchestrating data flow between specialized modality experts.
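The division of labor above can be sketched in a few lines of Python. This is a minimal illustration, not Cerebra's implementation: the `Finding` type, the three toy agents, the dictionary-backed `data_agent` (standing in for the Text-to-SQL step), and all risk values are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Finding:
    """One modality agent's output: a risk estimate plus its evidence."""
    agent: str
    risk: float      # illustrative score in [0, 1]
    evidence: str


# A modality agent maps a patient record to a Finding,
# or returns None when its modality is missing from the record.
ModalityAgent = Callable[[dict], Optional[Finding]]


def data_agent(patient_id: str, store: Dict[str, dict]) -> dict:
    """Stand-in for the Text-to-SQL Data Agent: a plain dictionary lookup."""
    return store.get(patient_id, {})


def ehr_agent(record: dict) -> Optional[Finding]:
    ehr = record.get("ehr")
    if ehr is None:
        return None
    # Toy rule standing in for a trained EHR model.
    risk = 0.7 if ehr.get("hypertension") else 0.3
    return Finding("ehr", risk, f"hypertension={ehr.get('hypertension')}")


def notes_agent(record: dict) -> Optional[Finding]:
    notes = record.get("notes")
    if not notes:
        return None
    # Toy keyword check standing in for a clinical-notes model.
    risk = 0.8 if "memory" in notes.lower() else 0.2
    return Finding("notes", risk, notes[:60])


def imaging_agent(record: dict) -> Optional[Finding]:
    mri = record.get("mri")
    if mri is None:
        return None
    # Toy flag standing in for an imaging model.
    risk = 0.9 if mri.get("hippocampal_atrophy") else 0.2
    return Finding("imaging", risk, "hippocampal atrophy flag")


class SuperAgent:
    """Coordinator: fetches the record, then polls each modality agent."""

    def __init__(self, agents: List[ModalityAgent]):
        self.agents = agents

    def assess(self, patient_id: str, store: Dict[str, dict]) -> List[Finding]:
        record = data_agent(patient_id, store)
        findings = [agent(record) for agent in self.agents]
        # Agents whose modality is absent simply contribute nothing.
        return [f for f in findings if f is not None]


store = {"p1": {"ehr": {"hypertension": True},
                "notes": "Reports memory lapses at home."}}
team = SuperAgent([ehr_agent, notes_agent, imaging_agent])
for finding in team.assess("p1", store):
    print(finding)
```

Patient `p1` has no MRI, so only the EHR and notes agents report. Modeling each expert as an independent callable is what lets a coordinator skip a missing modality instead of failing on an incomplete input vector.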
The "Propose-and-Critique" Fusion
Instead of simple averaging, Cerebra uses a debate mechanism. The agent with the highest risk score (e.g., the MRI agent seeing hippocampal atrophy) "proposes" an assessment. The other agents then review this evidence against their own data (e.g., the EHR agent checking for conflicting metabolic factors), leading to a consensus that is grounded in multi-source evidence.
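The debate loop can be sketched as follows. The summary does not specify how consensus is computed, so the rule here is an assumption chosen for illustration: reviewers whose own risk is within a hypothetical `tolerance` of the proposal endorse it, while each challenger pulls the final score toward its own estimate. All names and numbers are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Assessment:
    agent: str
    risk: float      # illustrative score in [0, 1]
    evidence: str


@dataclass
class Critique:
    agent: str
    supports: bool
    comment: str


def propose(assessments: List[Assessment]) -> Assessment:
    """The agent reporting the highest risk makes the proposal."""
    return max(assessments, key=lambda a: a.risk)


def critique(proposal: Assessment, reviewer: Assessment,
             tolerance: float) -> Critique:
    """A reviewer supports the proposal if its own risk is close enough."""
    gap = proposal.risk - reviewer.risk
    supports = gap <= tolerance
    stance = "consistent with" if supports else "conflicts with"
    return Critique(reviewer.agent, supports,
                    f"{reviewer.evidence} {stance} {proposal.agent} proposal")


def propose_and_critique(
    assessments: List[Assessment], tolerance: float = 0.25
) -> Tuple[Assessment, List[Critique], float]:
    if not assessments:
        raise ValueError("at least one assessment is required")
    proposal = propose(assessments)
    reviewers = [a for a in assessments if a is not proposal]
    critiques = [critique(proposal, r, tolerance) for r in reviewers]
    # Assumed consensus rule: supporters endorse the proposed risk as-is;
    # each challenger lowers it by its share of the disagreement.
    consensus = proposal.risk
    for reviewer, review in zip(reviewers, critiques):
        if not review.supports:
            consensus -= (proposal.risk - reviewer.risk) / len(assessments)
    return proposal, critiques, consensus


assessments = [
    Assessment("mri", 0.9, "hippocampal atrophy"),
    Assessment("ehr", 0.5, "well-controlled metabolic panel"),
    Assessment("notes", 0.8, "reported memory complaints"),
]
proposal, critiques, consensus = propose_and_critique(assessments)
print(proposal.agent, round(consensus, 3))
for c in critiques:
    print(c)
```

In this toy run the MRI agent proposes, the notes agent supports, and the EHR agent challenges, so the consensus lands below the proposed 0.9 while the critiques record why.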
Proven Performance Across Four Health Systems
The researchers didn't just test this in a lab. They deployed it across four major systems (NYU, UF, INPC, LI) involving 3 million patients.
Key Results:
- Better than standalone LLMs: While GPT-4o and MedGemma struggled with the class imbalance of real-world clinical data, Cerebra achieved a 0.80 AUROC for 3-year risk prediction.
- Graceful Degradation: If an MRI is missing, Cerebra doesn't fail; it dynamically re-weights findings from available notes and EHR data.
- Physician-in-the-loop: The Dynamic Medical Notebook allows doctors to correct the AI's reasoning, effectively "training" the system's memory for future cases.
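One simple way to picture the graceful-degradation behavior is weight renormalization over whichever modalities are present. The sketch below is an assumption for illustration only: the summary does not describe Cerebra's actual re-weighting, and the `BASE_WEIGHTS` values and function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical prior weights per modality; not taken from the paper.
BASE_WEIGHTS: Dict[str, float] = {"ehr": 0.3, "notes": 0.3, "imaging": 0.4}


def fuse_available(
    risks: Dict[str, Optional[float]],
    base_weights: Dict[str, float] = BASE_WEIGHTS,
) -> Tuple[float, Dict[str, float]]:
    """Fuse per-modality risks, renormalizing weights over present modalities."""
    available = {m: r for m, r in risks.items() if r is not None}
    if not available:
        raise ValueError("no modality produced a risk estimate")
    total = sum(base_weights[m] for m in available)
    # Weights of missing modalities are redistributed proportionally.
    weights = {m: base_weights[m] / total for m in available}
    fused = sum(weights[m] * available[m] for m in available)
    return fused, weights


# The MRI is missing, so EHR and notes share the full weight.
fused, weights = fuse_available({"ehr": 0.6, "notes": 0.8, "imaging": None})
print(round(fused, 3), weights)
```

With the imaging score absent, the two remaining modalities each receive a weight of 0.5 and the fused risk is 0.7, rather than the pipeline raising an error on the missing input.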
Figure 2: Clinician reader study results showing significant gains in sensitivity and diagnostic confidence when assisted by Cerebra.
A Clinician-Centered Dashboard
The true value of Cerebra lies in its dashboard. Rather than a number, it provides a conversational interface. A physician can ask, "Why is this patient high risk?" Cerebra points back to the MRI highlights or specific tokens in nursing notes. This transparency increased clinician accuracy from 65.8% to 83.3% in the reader study.
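For readers comparing figures, the reader-study gain can be expressed two ways. The short calculation below uses only the two accuracies quoted above; the variable names are illustrative.

```python
# Reader-study accuracies quoted in the text, in percent.
accuracy_before = 65.8
accuracy_after = 83.3

# Absolute gain is measured in percentage points.
absolute_gain_pp = accuracy_after - accuracy_before

# Relative gain is the absolute gain as a share of the starting accuracy.
relative_gain_pct = 100 * absolute_gain_pp / accuracy_before

print(f"absolute gain: {absolute_gain_pp:.1f} percentage points")
print(f"relative gain: {relative_gain_pct:.1f}%")
```

The absolute gain is 17.5 percentage points, which corresponds to a relative improvement of roughly 26.6% over the starting accuracy.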
Figure 3: The clinician-facing dashboard integrating risk trajectories, evidence snippets, and recommendations.
Conclusion and Future Outlook
Cerebra represents a shift from "AI-as-a-Tool" to "AI-as-a-Collaborator." By leveraging an agentic structure, it handles the messy, incomplete reality of medical data while providing the "Explainability" that healthcare demands.
Limitations: The model still relies on general-purpose LLMs for the "glue" reasoning. Future versions could integrate domain-specific foundation models (like Med-PaLM) to further reduce hallucination risks and provide more granular treatment guidance.
