Can LLMs contribute meaningfully to novel scientific discovery?

Are LLM ideas actually more novel than human ideas?

Yes, according to a large-scale head-to-head comparison. Researchers recruited over 100 NLP experts to write novel research ideas and to blindly review both human-written and LLM-generated ideas. Reviewers judged the LLM ideas significantly more novel (p<0.05, meaning a gap this large would be unlikely if the two groups were in fact equally novel) [1]. However, the same reviewers rated the LLM ideas slightly weaker on feasibility: they came across as more creative but harder to execute. This tradeoff matters because novelty alone doesn't guarantee a usable discovery.
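To make the p<0.05 threshold concrete, here is a minimal sketch of a one-sided permutation test on two sets of reviewer scores. The scores are invented for illustration and are not data from [1], and the permutation test is a stand-in that is not necessarily the test the study used. The function name `permutation_p_value` is hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(a) > mean(b).

    Estimates how often a random relabelling of the pooled scores
    produces a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical 1-10 novelty ratings, invented for illustration only.
llm_scores = [6, 7, 5, 6, 7, 6, 5, 7, 6, 6]
human_scores = [5, 4, 5, 6, 4, 5, 5, 4, 6, 5]

p = permutation_p_value(llm_scores, human_scores)
print(f"p = {p:.4f}; significant at 0.05: {p < 0.05}")
```

A small p-value here says only that the gap is hard to explain by random labelling of the scores. It says nothing about how large or how useful the difference is, which is why the feasibility ratings matter alongside it.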

Can LLMs actually find new things in real data?

Yes, and a concrete example comes from astronomy. Researchers used an LLM to interpret unusual celestial sources that machine learning algorithms had flagged as anomalies in infrared light curves and spectral energy distributions from the NEOWISE survey. After validating the approach on known rare variable sources, they applied it to previously unclassified objects and identified dozens with high scientific potential. The LLM also proposed follow-up observation plans [3]. This shows LLMs can bridge the 'final mile' between data-driven anomaly detection and physical interpretation, a step that often stumps individual experts because of the breadth of modern astrophysics.
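The two-stage shape of that workflow (flag anomalies first, then hand each one to an LLM for interpretation) can be sketched as follows. This is an illustration under loud assumptions, not the pipeline from [3]: a simple median/MAD outlier rule stands in for the machine learning anomaly detectors, the amplitudes and source IDs are invented, the prompt wording is hypothetical, and no LLM is actually called.

```python
from statistics import median

def robust_z_scores(values):
    """Score each value by its distance from the median in scaled MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    return [(v - med) / (1.4826 * mad) for v in values]

def build_interpretation_prompt(source_id, amplitude, score):
    """Turn a flagged source into a plain-text question for an LLM (hypothetical wording)."""
    return (
        f"Source {source_id} has an infrared variability amplitude of "
        f"{amplitude:.2f} mag (robust z-score {score:.1f}). "
        "Suggest plausible physical classes and a follow-up observation."
    )

# Hypothetical light-curve amplitudes in magnitudes, invented for illustration.
amplitudes = {
    "src-001": 0.11, "src-002": 0.09, "src-003": 0.12,
    "src-004": 1.85, "src-005": 0.10, "src-006": 0.13,
}

scores = robust_z_scores(list(amplitudes.values()))
# Stage 1: keep only sources whose score exceeds an assumed cutoff of 3.5.
flagged = [
    (sid, amp, z)
    for (sid, amp), z in zip(amplitudes.items(), scores)
    if abs(z) > 3.5
]
# Stage 2: prepare one interpretation request per flagged source.
prompts = [build_interpretation_prompt(*item) for item in flagged]
for prompt in prompts:
    print(prompt)
```

The point of the split is that the statistical stage only says a source is unusual, while the interpretation stage is where an LLM's breadth of knowledge can suggest what the source might physically be.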

What's the catch? Where do LLMs still fall short?

The main catch is reliability. The same study that found LLM ideas more novel also identified 'failures of LLM self-evaluation' and a 'lack of diversity in generation', meaning the models often can't tell which of their own ideas are good and tend to produce similar-sounding ideas [1]. In biology, attempts to use LLMs for filtering predictions and generating hypotheses have been 'impeded by issues such as hallucinations and the lack of structured knowledge grounding' [4]. To address this, researchers built a collaborative system called HypoChainer that combines LLMs with knowledge graphs and human expertise, showing that grounding LLMs in structured data such as knowledge graphs can make their outputs more reliable for hypothesis-driven discovery [4]. So LLMs work best as part of a team, not as solo inventors.
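One way to picture knowledge-graph grounding is as a check on each step of an LLM-proposed reasoning chain. The sketch below is a toy illustration, not HypoChainer's actual implementation: the graph, the entity names, the hypotheses, and the functions `unsupported_links` and `triage` are all invented placeholders.

```python
# A toy knowledge graph as (subject, relation, object) triples.
# All entities and relations are invented placeholders, not real biology.
KNOWLEDGE_GRAPH = {
    ("gene_A", "regulates", "protein_B"),
    ("protein_B", "participates_in", "pathway_C"),
    ("pathway_C", "associated_with", "disease_D"),
}

def unsupported_links(chain, graph):
    """Return the triples in a hypothesis chain that the graph does not contain."""
    return [triple for triple in chain if triple not in graph]

def triage(hypotheses, graph):
    """Split hypotheses into fully grounded ones and ones needing expert review."""
    grounded, needs_review = [], []
    for name, chain in hypotheses.items():
        missing = unsupported_links(chain, graph)
        if missing:
            needs_review.append((name, missing))
        else:
            grounded.append(name)
    return grounded, needs_review

# Hypothetical LLM-proposed reasoning chains.
hypotheses = {
    "h1": [("gene_A", "regulates", "protein_B"),
           ("protein_B", "participates_in", "pathway_C")],
    "h2": [("gene_A", "regulates", "protein_X"),
           ("protein_X", "associated_with", "disease_D")],
}

grounded, needs_review = triage(hypotheses, KNOWLEDGE_GRAPH)
print("grounded:", grounded)
for name, missing in needs_review:
    print(f"{name} needs review; unsupported links: {missing}")
```

An unsupported link is not automatically a hallucination, since it could also be a genuinely new claim the graph does not yet cover. That is why this sketch routes such chains to a human expert instead of discarding them.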

Sources used in this answer

[1] Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

LLM-generated research ideas were judged as significantly more novel than human expert ideas (p<0.05) but slightly weaker on feasibility, based on blind reviews by over 100 NLP researchers.

2024 · Chenglei Si, Diyi Yang, Tatsunori Hashimoto · arXiv.org

[2] Large language models and their role in modern scientific discoveries

LLMs accelerate scientific research by efficiently processing big data, but raise fundamental questions about whether the results constitute new knowledge and what scientific creativity means in the era of big computing.

2024 · V. Yu. Filimonov · Philosophical Problems of IT & Cyberspace (PhilIT&C)

[3] Closing the Final Mile in Data-Driven Discovery: Interpreting Uncharted Celestial Sources with Large Language Models Across Multimodal Data

An LLM framework successfully interpreted anomalous celestial sources from NEOWISE data, identifying dozens of previously unclassified objects with high scientific potential and generating AI-proposed follow-up observation plans.

2025 · Yanxia Zhang, Zihan Kang, Jingyi Zhang, Jinghang Shi, Changhua Li

[4] HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery

A collaborative system (HypoChainer) that combines LLMs with knowledge graphs and human expertise improved hypothesis-driven discovery in biology, overcoming LLM hallucinations and lack of structured grounding.

2026 · Haoran Jiang, Shaohan Shi, Yunjie Yao, Chang Jiang, Quan Li · IEEE Transactions on Visualization and Computer Graphics

[5] Editorial: Harnessing the Power of Large Language Model-Based Chatbots for Scientific Discovery

Editorial perspective highlighting the potential of LLM-based chatbots for scientific discovery, particularly in chemistry and drug design, while noting the need for careful integration with existing methods.

2023 · Kenneth M. Merz Jr., Guo-Wei Wei, Feng Zhu · Journal of Chemical Information and Modeling
