Can fine-tuning reliably fix LLM hallucinations?

Does fine-tuning actually reduce hallucinations? Often it makes them worse.

The short answer is that standard fine-tuning frequently increases hallucinations rather than fixing them. A 2024 controlled experiment on closed-book question answering showed that when fine-tuning examples introduce factual knowledge the model did not acquire during pre-training, the model learns those examples slowly, and its tendency to hallucinate rises linearly as it eventually learns them [3]. In other words, the more unfamiliar facts the model is made to fit during fine-tuning, the more it fabricates answers. The effect shows up in applied settings too: a 2025 evaluation of biomedical LLMs found that fine-tuned models hallucinated more than their general-purpose counterparts, especially on tasks outside narrow medical knowledge [1]. The general-purpose Llama-3-8B-Instruct scored 64.3% on NEJM case challenges, while the fine-tuned OpenBioLLM-8B scored only 30%, and the fine-tuned model was also more prone to making things up [1].

Why does this happen? The core issue is that fine-tuning teaches the model to generate responses that may not be grounded in its pre-existing knowledge. When the model encounters a query that touches on the new, imperfectly learned facts, it tends to produce outputs that mirror the errors in its fine-tuning data [6]. A 2025 study demonstrated that unfamiliar examples in the fine-tuning data are the primary drivers of hallucination patterns — the model's made-up answers often directly reflect the incorrect responses associated with those unfamiliar examples [6]. This means that if your fine-tuning data contains any inaccuracies or introduces concepts the base model doesn't truly understand, you are essentially training the model to hallucinate.

Are there cases where fine-tuning does reduce hallucinations? Yes, with the right approach.

Fine-tuning can reduce hallucinations, but only when it is carefully designed to target the problem directly. A 2024 study proposed a data organization method called WHW (What, How, Why) that adds detailed task descriptions and restrictions to fine-tuning data. This approach reduced hallucinations by 73% compared to standard prompt-based fine-tuning, while also improving F1 scores by 11% on role-setting tasks [2]. The key was providing explicit constraints that prevented the model from generating unsupported content.
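To make the idea of attaching task descriptions and restrictions to each training record concrete, here is a minimal sketch. The `TaskSpec` fields, the section wording, and the `build_whw_example` helper are assumptions made for illustration; the summary above does not give the paper's actual template, so this should not be read as the WHW format itself.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical container for the task description attached to each record."""
    what: str   # what the task is
    how: str    # how the model should carry it out
    why: str    # why the task exists / what the output is used for
    restrictions: list[str] = field(default_factory=list)  # explicit constraints


def build_whw_example(spec: TaskSpec, user_input: str, target: str) -> dict:
    """Return one fine-tuning record whose prompt carries the task description."""
    lines = [
        f"What: {spec.what}",
        f"How: {spec.how}",
        f"Why: {spec.why}",
    ]
    if spec.restrictions:
        lines.append("Restrictions:")
        lines.extend(f"- {r}" for r in spec.restrictions)
    lines.append(f"Input: {user_input}")
    return {"prompt": "\n".join(lines), "completion": target}


spec = TaskSpec(
    what="Answer questions as a customer-support agent.",
    how="Use only the facts given in the input.",
    why="Unsupported claims mislead customers.",
    restrictions=["Do not invent product names or prices."],
)
record = build_whw_example(spec, "Is the warranty two years?", "Yes, two years.")
print(record["prompt"])
```

The point of the sketch is only that the constraints travel with every example, so the model sees them during training rather than only at inference time.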

Another promising direction is using fine-tuning to teach the model to say "I don't know" instead of fabricating answers. A 2025 study showed that by modifying how unfamiliar fine-tuning examples are supervised — for instance, training the model to refuse to answer when it lacks knowledge — you can significantly reduce hallucinations [6]. This approach was validated across multiple fine-tuning methods (supervised fine-tuning, reinforcement learning, and reward model training) on standard benchmarks like TriviaQA and MMLU [6].
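One way to picture this change in supervision is a preprocessing pass that swaps the target of unfamiliar examples for a refusal. The sketch below assumes a per-question familiarity score already exists (for instance, the fraction of sampled base-model answers that match the reference); that score, the 0.5 threshold, and the `relabel_unfamiliar` helper are illustrative assumptions, not the cited study's procedure.

```python
REFUSAL = "I don't know."


def relabel_unfamiliar(examples, familiarity, threshold=0.5):
    """Replace the target of unfamiliar examples with a refusal.

    examples:    list of {"question": str, "answer": str}
    familiarity: dict mapping question -> score in [0, 1] (assumed to be
                 computed beforehand; how it is computed is up to you)
    """
    relabeled = []
    for ex in examples:
        # Questions with no score are treated as unfamiliar.
        score = familiarity.get(ex["question"], 0.0)
        answer = ex["answer"] if score >= threshold else REFUSAL
        relabeled.append({"question": ex["question"], "answer": answer})
    return relabeled


data = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Obscure 2023 statistic?", "answer": "42.7"},
]
scores = {"Capital of France?": 0.9, "Obscure 2023 statistic?": 0.1}
for row in relabel_unfamiliar(data, scores):
    print(row)
```

The original list is left unmodified, so the same data can be relabeled at several thresholds and compared.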

Reinforcement learning with hallucination-specific rewards also shows promise. A 2026 study used an Entity Hallucination Index (EHI) as a reward signal to fine-tune summarization models, penalizing fabricated entities. Models fine-tuned this way achieved lower hallucination rates without losing informativeness, and even generalized better to out-of-domain tasks [8]. Similarly, a 2024 approach called Hallucination Aware Tuning (HAT) first trains a detection model to identify hallucinations, then uses those detections to create a preference dataset for Direct Preference Optimization (DPO) fine-tuning, resulting in LLMs with reduced hallucination rates and improved answer quality [7].
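The two ideas in this paragraph, an entity-based reward and a detector-driven preference dataset, can be sketched together. The block below is a stand-in under stated assumptions: it does not implement the paper's EHI or the HAT detector. Instead it uses a crude proxy (capitalized words and numbers in the summary that also appear in the source), and the `prompt`/`chosen`/`rejected` keys follow a common preference-data convention rather than anything specified in the sources.

```python
import re


def extract_entities(text):
    """Crude entity proxy: capitalized words and numbers.
    A real system would use an NER model instead."""
    return set(re.findall(r"\b(?:[A-Z][a-z]+|\d+(?:\.\d+)?)\b", text))


def entity_support_reward(source, summary):
    """Fraction of summary 'entities' that also occur in the source (1.0 is best)."""
    entities = extract_entities(summary)
    if not entities:
        return 1.0  # nothing that could have been fabricated
    source_tokens = set(re.findall(r"\w+(?:\.\d+)?", source.lower()))
    supported = sum(1 for e in entities if e.lower() in source_tokens)
    return supported / len(entities)


def build_preference_pair(source, candidates):
    """Pick the best- and worst-scoring candidates as a chosen/rejected pair.
    Returns None when the candidates cannot be told apart."""
    if len(candidates) < 2:
        return None
    scored = sorted(candidates, key=lambda c: entity_support_reward(source, c))
    worst, best = scored[0], scored[-1]
    if entity_support_reward(source, worst) == entity_support_reward(source, best):
        return None
    return {"prompt": source, "chosen": best, "rejected": worst}


source = "Acme hired 40 people in Berlin."
pair = build_preference_pair(
    source,
    ["Acme hired 40 people in Paris.", "Acme hired 40 people in Berlin."],
)
print(pair)
```

A scalar like `entity_support_reward` could serve as (part of) an RL reward, while pairs like the one above are the kind of input preference-optimization methods such as DPO consume.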

What works better than plain fine-tuning? Retrieval-augmented generation and hybrid strategies.

Given the risks of fine-tuning, many researchers now recommend retrieval-augmented generation (RAG) as a more reliable alternative. A 2025 study comparing biomedical fine-tuned models to general-purpose models concluded that RAG "may offer a more effective strategy for clinical adaptation" [1]. RAG gives the model access to an external knowledge base at inference time, so it does not need to memorize facts during fine-tuning, which reduces the incentive to hallucinate.
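The retrieval step can be illustrated without any model call. The sketch below ranks a small in-memory document list by word overlap with the query and assembles a prompt that restricts the model to the retrieved passages. Word overlap is a deliberate simplification standing in for a vector search, and the function names and prompt wording are assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d)), d) for d in documents]
    scored = [item for item in scored if item[0] > 0]   # drop non-matching docs
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in scored[:k]]


def build_grounded_prompt(query, documents, k=2):
    """Assemble a prompt that tells the model to answer only from retrieved passages."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    if not context:
        context = "(no relevant passages found)"
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say you do not know.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}"
    )


docs = [
    "Glaucoma damages the optic nerve.",
    "RAG retrieves passages at inference time.",
    "Bananas are yellow.",
]
print(build_grounded_prompt("What does glaucoma damage?", docs))
```

Because the facts arrive in the prompt, updating the knowledge base changes the model's answers without any retraining.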

A 2026 study on environmental decision-making found that fine-tuned models achieved only modest gains (+1% precision) on standardized tasks but showed limited adaptability (-3%) in complex agentic workflows, while state-of-the-art generalist models outperformed them by 10% on interdisciplinary tasks [4]. The authors recommended a layered strategy: selective fine-tuning for stable, regulatory tasks, combined with RAG-based agentic workflows for dynamic, data-intensive decisions [4].

Even in specialized domains like glaucoma detection, fine-tuning alone was not the deciding factor. A 2025 study used GPT-4o with a vision API to generate referral letters from OCT images, achieving 91% accuracy and 100% recall, but this relied on the model's strong general capabilities plus structured clinical data, not on fine-tuning [5]. The takeaway: fine-tuning can be part of the solution, but it works best when paired with external knowledge retrieval, careful data curation, and hallucination-specific training signals.

Sources used in this answer

[1] Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks

Biomedical fine-tuned LLMs generally underperformed general-purpose models on clinical tasks and showed a higher tendency to hallucinate; e.g., OpenBioLLM-8B scored 30% vs. Llama-3-8B-Instruct's 64.3% on NEJM case challenges.

2025 · Felix J. Dorfner, Amin Dada, Felix Busch, Marcus R. Makowski, Tianyu Han, Daniel Truhn, Jens Kleesiek, Madhumita Sushil, Lisa C. Adams, Keno K. Bressem · Journal of the American Medical Informatics Association (JAMIA)

[2] WHW: An Efficient Data Organization Method for Fine-tuning Large Language Models

A data organization method (WHW) adding task descriptions reduced LLM hallucinations by 73% compared to prompt fine-tuning, while improving F1 by 11% on role-setting tasks.

2024 · Lubao Wang, Huaqi Zhang, Haiming Shao, Mingxuan Wu, Wei Ren · 2024 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS)

[3] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

Fine-tuning on new factual knowledge linearly increases hallucination tendency; models struggle to acquire new facts through fine-tuning, and each new fact learned increases hallucination risk.

2024 · Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig · Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

[4] Leveraging LLMs for Environmental Complexity: Structured Fine-Tuning Data Sets and Deployment Strategies

Fine-tuned models achieved only +1% precision gain on standardized tasks but -3% in agentic workflows; generalist models outperformed by 10% on interdisciplinary tasks.

2026 · Chuke Chen, Nan Li, Jianchuan Qi, Huimin Chang, Wenjie Shi, Jinliang Xie, Jiayi Yuan, Hang Yang, Jing Guo, Changqing Xu, Ming Xu · Environmental Science & Technology

[5] Glaucoma Detection for Automated Referral System: Using OCT Data and Fine-tuning LLM Models

GPT-4o with vision API achieved 91% accuracy and 100% recall for glaucoma referral letter generation from OCT data, without relying on fine-tuning.

2025 · Mohammad Norouzifard, Azadeh Samaeili, Jason Turuwhenua · Annual International Conference of the IEEE Engineering in Medicine and Biology Society

[6] Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Unfamiliar fine-tuning examples control how models hallucinate; modifying supervision of these examples can teach models to say 'I don't know' and reduce hallucinations.

2025 · Katie Kang, Eric Wallace, Claire J. Tomlin, Aviral Kumar, Sergey Levine · Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

[7] RAG-HAT: A Hallucination-Aware Tuning Pipeline for LLM in Retrieval-Augmented Generation

Hallucination Aware Tuning (HAT) uses detection models and DPO fine-tuning to reduce hallucination rates and improve answer quality in RAG systems.

2024 · Juntong Song, Xingguang Wang, Juno Zhu, Yuanhao Wu, Xuxin Cheng, Randy Zhong, Cheng Niu · Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

[8] Fine-Tuning Large Language Models Using Entity Hallucination Index for Text Summarization

Fine-tuning summarization models using Entity Hallucination Index (EHI) as a reward signal reduced hallucination rates without compromising informativeness and improved out-of-domain generalization.

2026 · Praveenkumar K, Rakesh Chandra Balabantaray, Kali Prasad Vittala, Muktikanta Sahu · Journal of Visualized Experiments (JoVE)
