ProcureGym is a data-driven multi-agent Markov Game framework designed to simulate China's National Volume-Based Drug Procurement (NVBP). Leveraging real-world data from 7 rounds of procurement covering 325 drugs and 2,267 firms, it evaluates Reinforcement Learning (RL), Large Language Model (LLM), and Rule-based agents, with RL agents (MAPPO/IPPO) achieving a state-of-the-art winner prediction accuracy of 74.81%.
TL;DR
Researchers from Fudan and Tongji University have unveiled ProcureGym, the first comprehensive data-driven simulation platform for China's National Volume-Based Drug Procurement (NVBP). By treating the multi-billion-dollar bidding process as a Markov Game, the study demonstrates that AI agents—specifically Reinforcement Learning (RL) models—can predict winning outcomes with nearly 75% accuracy. This work bridges the gap between theoretical game theory and real-world policy impact, offering a high-fidelity "sandbox" for economic strategy.
The "Billion-Dollar" Bidding Problem
Since 2018, China's NVBP has saved over 500 billion CNY. For pharmaceutical firms, however, it is a high-stakes prisoner's dilemma played under incomplete information: firms must cut prices to win a guaranteed government-procured purchase volume, but cutting too deep erodes profit margins.
Existing models typically fall into two traps:
- Analytical Models: Too simplified to handle thousands of diverse firms.
- Traditional ABMs: Rely on static rules that can't "learn" or adapt to new policies.
ProcureGym addresses this by building a simulator grounded in real-world data from 325 drugs and 2,267 firms, allowing AI agents to navigate the non-linear dynamics of competitive bidding.
Methodology: High-Fidelity Markov Games
The core of ProcureGym is the formalization of bidding as a Markov Game. The state space includes 10 critical dimensions, such as the maximum valid bidding price ($P_{max}$), procurement ratio ($\rho$), and internal production costs ($C_i$).
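To make the formalization concrete, here is a minimal sketch of how a per-firm observation and a valid-bid projection might look. All names (`FirmObservation`, `clip_bid`) are illustrative assumptions, not ProcureGym's actual API; only a subset of the 10 state dimensions is shown.

```python
from dataclasses import dataclass

@dataclass
class FirmObservation:
    # Illustrative subset of the 10-dimensional state (names hypothetical).
    p_max: float   # maximum valid bidding price P_max
    rho: float     # procurement ratio (share of demand guaranteed to winners)
    cost: float    # the firm's internal production cost C_i

def clip_bid(raw_bid: float, obs: FirmObservation) -> float:
    """Project a raw agent action onto the feasible bid range [C_i, P_max]:
    bidding below cost is irrational, bidding above P_max is invalid."""
    return max(obs.cost, min(raw_bid, obs.p_max))
```

In a Markov Game, each firm observes such a state, submits a bid as its action, and receives profit as its reward; the joint bids then determine the next state of the market.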
The Agent Spectrum
The platform supports a unified interface for three distinct "agent brains":
- RL-based (MAPPO/IPPO): These agents treat profit as a reward signal, iteratively refining their bids to find the sweet spot between winning and margin.
- LLM-based (Qwen/GPT): Utilizing a Perception-Memory-Decision-Reflection loop, these agents use natural language to "think" through the competition.
- Rule-based: Heuristic strategies representing "old-school" business logic.
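A unified interface of this kind typically reduces to a single `act` method that every agent brain implements. The sketch below is hypothetical (class and field names are assumptions, not ProcureGym's actual API), with a rule-based agent as the simplest instance:

```python
from abc import ABC, abstractmethod

class BiddingAgent(ABC):
    """Hypothetical unified interface shared by RL, LLM, and rule-based agents."""

    @abstractmethod
    def act(self, observation: dict) -> float:
        """Return a bid price given the current market observation."""

class RuleBasedAgent(BiddingAgent):
    """'Old-school' heuristic: bid a fixed markup over cost, capped at P_max."""

    def __init__(self, markup: float = 1.2):
        self.markup = markup

    def act(self, observation: dict) -> float:
        return min(observation["cost"] * self.markup, observation["p_max"])
```

RL and LLM agents would plug into the same slot: an RL agent maps the observation through a learned policy network, while an LLM agent serializes it into a prompt and parses the bid from the model's response.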
Figure 1: Overview of the ProcureGym Framework, showcasing the data integration and multi-agent interaction layers.
Experiments: RL vs. LLM vs. Reality
The results provide a fascinating look at machine "rationality."
- Prediction Accuracy: MAPPO achieved the highest explanatory power ($R^2 = 0.79$) and the best winner alignment (74.81% accuracy), significantly outperforming human-crafted rules (64%).
- Profit Optimization: RL agents didn't just copy history; they improved upon it, learning strategies that secured higher profits than historical benchmarks.
- LLM "Homo Silicus": While slightly less accurate in pricing than RL, LLMs provided interpretability. High-cost firms were observed to protect margins, while low-cost firms aggressively chased market share—logic confirmed by LLM reasoning logs.
Figure 2: Evaluation of NVBP Simulation. Note the strong Spearman correlation ($0.85-0.88$) between predicted and actual prices.
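The winner-alignment metric reported above reduces to a per-market match rate between simulated and historical winners. A minimal sketch (the function name and the sample data are illustrative, not from the paper):

```python
def winner_alignment(predicted: list, actual: list) -> float:
    """Share of drug markets where the simulated winner matches history."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must cover the same markets")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

Applied across all simulated procurement rounds, this is the statistic on which MAPPO reaches roughly 75% versus 64% for rule-based baselines.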
Counterfactual Analysis: Stress-Testing the Policy
One of ProcureGym's most powerful features is its ability to perform sensitivity analysis.
- Price Link Strategy: Increasing the maximum bidding price ($P_{max}$) naturally leads to higher bids, and with them higher profits across the board.
- Demand Dominance: The simulations identified market demand ($Q_e$) as the single most influential driver of firm profitability, even more so than cost reductions or procurement ratios.
Figure 3: Sensitivity Analysis of Policy and Market parameters. RL methods (blue/orange) show the most stability across varying procurement volumes.
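A counterfactual sweep of this kind amounts to re-running the simulator over a grid of policy parameters and comparing outcomes. The toy sketch below uses a stand-in profit model (every function and constant here is a hypothetical placeholder; a real run would invoke the full multi-agent simulator):

```python
def simulate_profit(p_max: float, demand: float, cost: float = 3.0) -> float:
    """Stand-in outcome model: assume the equilibrium bid scales with the
    price cap and winners capture the full demand Q_e."""
    bid = 0.9 * p_max                 # hypothetical equilibrium bid
    return max(bid - cost, 0.0) * demand

def sweep(p_max_grid: list, demand: float) -> dict:
    """Stress-test the policy by evaluating profit at each candidate P_max."""
    return {p: simulate_profit(p, demand) for p in p_max_grid}
```

Even in this toy version, the qualitative pattern from the paper appears: raising $P_{max}$ lifts profits, and scaling demand $Q_e$ shifts every point of the curve, which is why demand dominates the sensitivity ranking.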
Conclusion and Future Horizons
ProcureGym represents a significant step forward in Computational Economics. It demonstrates that AI can do more than generate text or images; it can model the pulse of a national economy.
Takeaways for the Industry:
- For Firms: RL-based strategy optimization can provide a competitive edge in "volume-for-price" environments.
- For Regulators: The platform allows for the "pre-testing" of bidding rules, revealing potential unintended consequences before they affect millions of patients.
Limitations: Currently, the model focuses on firm behavior. Future iterations incorporating the decision-making logic of hospitals and government oversight would create a truly holistic "Digital Twin" of the healthcare ecosystem.
