Does fine-tuning actually reduce hallucinations? Often it makes them worse.
The short answer is that standard fine-tuning frequently increases hallucinations rather than fixing them. A 2024 controlled experiment on closed-book question answering showed that when fine-tuning introduces factual knowledge not seen during pre-training, the model learns those new facts slowly, and as it eventually learns them, its tendency to hallucinate increases linearly [3]. In other words, the more unfamiliar facts the model absorbs during fine-tuning, the more it fabricates answers. The effect also appears in applied settings: a 2025 evaluation of biomedical LLMs found that fine-tuned models hallucinated more than their general-purpose counterparts, especially on tasks outside narrow medical knowledge [1]. On NEJM case challenges, the general-purpose Llama-3-8B-Instruct scored 64.3%, while the fine-tuned OpenBioLLM-8B scored only 30%, and the fine-tuned model was also more prone to making things up [1].
Why does this happen? The core issue is that fine-tuning can teach the model to generate responses that are not grounded in its pre-existing knowledge. A 2025 study found that unfamiliar examples in the fine-tuning data are the primary drivers of hallucination patterns: when the model faces a query it does not know, its fabricated answer tends to mirror the responses attached to those unfamiliar fine-tuning examples [6]. So if your fine-tuning data contains inaccuracies or introduces facts the base model does not already know, you risk training the model to hallucinate.
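Both findings point to a practical step: audit the fine-tuning set for examples the base model does not already know before training on them. Below is a minimal sketch, assuming you can sample several answers from the base model per question. `sample_answers` and `fake_sampler` are hypothetical stand-ins for real model calls, exact-match scoring is a simplification, and treating an example as unfamiliar only when no sample is correct is an illustrative threshold, not the exact procedure used in [3] or [6].

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]
Sampler = Callable[[str, int], List[str]]  # hypothetical: (question, n) -> n sampled answers


def normalize(text: str) -> str:
    return text.strip().lower()


def known_fraction(question: str, gold: str, sample_answers: Sampler, n_samples: int = 8) -> float:
    """Fraction of sampled base-model answers that exactly match the gold answer."""
    answers = sample_answers(question, n_samples)
    if not answers:
        return 0.0
    hits = sum(normalize(a) == normalize(gold) for a in answers)
    return hits / len(answers)


def split_by_familiarity(examples: List[Example], sample_answers: Sampler,
                         threshold: float = 0.0) -> Tuple[List[Example], List[Example]]:
    """Examples whose known fraction exceeds `threshold` are familiar; the rest are unfamiliar."""
    familiar, unfamiliar = [], []
    for ex in examples:
        p = known_fraction(ex["question"], ex["answer"], sample_answers)
        (familiar if p > threshold else unfamiliar).append(ex)
    return familiar, unfamiliar


# Toy base model for illustration: it "knows" one fact and is unsure about everything else.
BASE_MODEL_KNOWS = {"What is the capital of France?": "Paris"}


def fake_sampler(question: str, n: int) -> List[str]:
    return [BASE_MODEL_KNOWS.get(question, "unsure")] * n


data = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "In what year was the fictional Zorblax engine released?", "answer": "1999"},
]
familiar, unfamiliar = split_by_familiarity(data, fake_sampler)
print(len(familiar), len(unfamiliar))  # 1 1
```

Examples that land in `unfamiliar` are candidates for removal, relabeling, or separate treatment rather than plain supervised training.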
Are there cases where fine-tuning does reduce hallucinations? Yes, with the right approach.
Fine-tuning can reduce hallucinations, but only when it is carefully designed to target the problem directly. A 2024 study proposed a data organization method called WHW (What, How, Why) that adds detailed task descriptions and restrictions to fine-tuning data. This approach reduced hallucinations by 73% compared to standard prompt-based fine-tuning, while also improving F1 by 11% on role-setting tasks [2]. The key was providing explicit constraints that discouraged the model from generating unsupported content.
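To make the idea concrete, here is a minimal sketch of a fine-tuning record whose instruction spells out the task, the method, the purpose, and explicit restrictions. The mapping of What/How/Why onto these fields, the field names (`instruction`, `input`, `output`), and the example content are assumptions for illustration, not the schema published with WHW [2].

```python
import json
from typing import Dict, List


def build_whw_record(what: str, how: str, why: str, restrictions: List[str],
                     user_input: str, target: str) -> Dict[str, str]:
    """Assemble one fine-tuning example with an explicit task description and restrictions.
    Field names and layout are illustrative, not the WHW paper's schema."""
    instruction = "\n".join([
        f"Task (what): {what}",
        f"Method (how): {how}",
        f"Purpose (why): {why}",
        "Restrictions:",
        *[f"- {r}" for r in restrictions],
    ])
    return {"instruction": instruction, "input": user_input, "output": target}


record = build_whw_record(
    what="Answer questions as a customer-support agent for a software product.",
    how="Use only facts stated in the input and quote the relevant passage.",
    why="Users act on these answers, so unsupported claims cause harm.",
    restrictions=[
        "If the input does not contain the answer, reply 'I don't know.'",
        "Do not invent version numbers or features.",
    ],
    user_input="Docs: Version 2.1 adds CSV export.\nQ: Does 2.1 support PDF export?",
    target="I don't know. The docs only mention CSV export for version 2.1.",
)
print(json.dumps(record, indent=2))
```

The restriction list is what carries the "do not generate unsupported content" signal, and the target output demonstrates following it.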
Another promising direction is using fine-tuning to teach the model to say "I don't know" instead of fabricating answers. A 2025 study showed that by modifying how unfamiliar fine-tuning examples are supervised — for instance, training the model to refuse to answer when it lacks knowledge — you can significantly reduce hallucinations [6]. This approach was validated across multiple fine-tuning methods (supervised fine-tuning, reinforcement learning, and reward model training) on standard benchmarks like TriviaQA and MMLU [6].
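A minimal sketch of the relabeling idea for the supervised fine-tuning case, assuming each example already carries a familiarity estimate (for instance, the fraction of correct base-model samples). The 0.25 threshold and the abstention string are hypothetical choices; the reinforcement learning and reward-model variants studied in [6] are not shown.

```python
from dataclasses import dataclass, replace
from typing import List

IDK = "I don't know."  # abstention target; the exact wording is an assumption


@dataclass(frozen=True)
class Example:
    question: str
    answer: str
    familiarity: float  # estimated P(base model answers correctly), in [0, 1]


def relabel_unfamiliar(examples: List[Example], threshold: float = 0.25) -> List[Example]:
    """Keep targets for familiar examples; swap in an abstention for unfamiliar ones,
    so fine-tuning teaches the model to refuse rather than guess on unknown facts."""
    return [ex if ex.familiarity >= threshold else replace(ex, answer=IDK) for ex in examples]


data = [
    Example("What is the capital of France?", "Paris", familiarity=0.9),
    Example("In what year was the fictional Zorblax engine released?", "1999", familiarity=0.0),
]
relabeled = relabel_unfamiliar(data)
for ex in relabeled:
    print(ex.question, "->", ex.answer)
# What is the capital of France? -> Paris
# In what year was the fictional Zorblax engine released? -> I don't know.
```

The original examples are left untouched (`replace` returns new frozen instances), so the relabeled set can be compared against the raw one.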
Reinforcement learning with hallucination-specific rewards also shows promise. A 2026 study used an Entity Hallucination Index (EHI) as a reward signal to fine-tune summarization models, penalizing fabricated entities. Models fine-tuned this way achieved lower hallucination rates without losing informativeness, and even generalized better to out-of-domain tasks [8]. Similarly, a 2024 approach called Hallucination Aware Tuning (HAT), built for retrieval-augmented generation pipelines, first trains a detection model to identify hallucinations, then uses those detections to create a preference dataset for Direct Preference Optimization (DPO) fine-tuning, resulting in LLMs with reduced hallucination rates and improved answer quality [7].
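Both approaches turn a hallucination measurement into a training signal. The sketch below is a toy version, assuming a regex in place of a real entity recognizer: `entity_support_score` is a hypothetical proxy, not the EHI definition from [8], and it stands in for HAT's learned hallucination detector [7]. The score could serve as a reward in RL fine-tuning (the EHI idea) or to pick the chosen and rejected responses of a DPO preference pair (the HAT idea). `dpo_loss` is the standard per-pair DPO objective, given summed log-probabilities of each response under the policy and under a frozen reference model.

```python
import math
import re
from typing import Dict, List, Set

# Crude stand-in for a named-entity recognizer: capitalized word runs and numbers.
ENTITY_RE = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b|\b\d+(?:\.\d+)?\b")


def extract_entities(text: str) -> Set[str]:
    return set(ENTITY_RE.findall(text))


def entity_support_score(summary: str, source: str) -> float:
    """1.0 when every summary entity appears in the source; lower as more are unsupported.
    A simplified, hypothetical proxy, not the EHI formula from [8]."""
    entities = extract_entities(summary)
    if not entities:
        return 1.0
    unsupported = [e for e in entities if e not in source]
    return 1.0 - len(unsupported) / len(entities)


def preference_pair(prompt: str, candidates: List[str], source: str) -> Dict[str, str]:
    """Rank candidates by the score; the best becomes 'chosen', the worst 'rejected'."""
    ranked = sorted(candidates, key=lambda c: entity_support_score(c, source))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)), computed stably."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == softplus(-m) == max(-m, 0) + log1p(exp(-|m|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


source = "Acme Corp reported 12 million in revenue from its Berlin office."
faithful = "Acme reported revenue of 12 million in Berlin."
fabricated = "Acme reported revenue of 15 million in Munich."

print(entity_support_score(faithful, source))    # 1.0
print(entity_support_score(fabricated, source))  # ~0.33 ("15" and "Munich" are unsupported)
pair = preference_pair("Summarize the report.", [fabricated, faithful], source)
print(pair["chosen"] == faithful)  # True
```

In a real pipeline the log-probabilities come from the model being tuned and its frozen copy, and the loss is averaged over a batch of pairs.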
What works better than plain fine-tuning? Retrieval-augmented generation and hybrid strategies.
Given the risks of fine-tuning, many researchers now recommend retrieval-augmented generation (RAG) as a more reliable alternative. A 2025 study comparing biomedical fine-tuned models to general-purpose models concluded that RAG 'may offer a more effective strategy for clinical adaptation' [1]. RAG gives the model access to an external knowledge base at inference time, so new facts are supplied in the prompt rather than memorized through fine-tuning, which avoids training on the unfamiliar facts linked to hallucination above.
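A minimal sketch of the RAG pattern, assuming a toy word-overlap retriever in place of a real search index. The call to the LLM itself is left out, and the prompt wording and documents are illustrative, not taken from [1].

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query and return the top k.
    A toy stand-in for BM25 or embedding search."""
    q = Counter(tokenize(query))

    def overlap(doc: str) -> int:
        d = Counter(tokenize(doc))
        return sum(min(count, d[term]) for term, count in q.items())

    return sorted(docs, key=overlap, reverse=True)[:k]


def build_grounded_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Put retrieved passages in the prompt and ask the model to answer only from them."""
    passages = retrieve(query, docs, k)
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(passages, start=1))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


knowledge_base = [  # toy documents for illustration
    "Ward B visiting hours are 2 pm to 8 pm.",
    "The pharmacy is on the ground floor.",
    "Parking permits are issued at reception.",
]
prompt = build_grounded_prompt("When are visiting hours on Ward B?", knowledge_base, k=1)
print(prompt)
```

Updating `knowledge_base` changes what the model can answer without any retraining, which is the property that makes RAG attractive when the underlying facts change often.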
A 2024 study on environmental decision-making found that fine-tuned models achieved only modest gains (+1% precision) on standardized tasks and adapted poorly to complex agentic workflows, where their performance dropped by 3%, while state-of-the-art generalist models outperformed them by 10% on interdisciplinary tasks [4]. The authors recommended a layered strategy: selective fine-tuning for stable, regulatory tasks, combined with RAG-based agentic workflows for dynamic, data-intensive decisions [4].
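The layered strategy amounts to a routing decision. A minimal sketch, assuming tasks arrive labeled with a type: the task-type names and stub handlers are hypothetical, and a real deployment would need its own criteria for which tasks count as stable.

```python
from typing import Callable, Tuple

# Hypothetical task types treated as stable and standardized (e.g., fixed regulatory checks).
STABLE_TASK_TYPES = {"permit_checklist", "emission_limit_lookup"}

Handler = Callable[[str], str]


def route(task_type: str, query: str, fine_tuned: Handler, rag_agent: Handler) -> Tuple[str, str]:
    """Send stable, standardized tasks to the fine-tuned model; send everything else
    (dynamic, data-intensive, or interdisciplinary) to a retrieval-augmented agent."""
    if task_type in STABLE_TASK_TYPES:
        return "fine_tuned", fine_tuned(query)
    return "rag_agent", rag_agent(query)


# Stub handlers standing in for real model calls.
def fine_tuned_model(query: str) -> str:
    return f"[fine-tuned] {query}"


def rag_workflow(query: str) -> str:
    return f"[rag] {query}"


print(route("permit_checklist", "Does site 7 need a discharge permit?",
            fine_tuned_model, rag_workflow)[0])  # fine_tuned
print(route("flood_risk_forecast", "Project flood risk from this week's rainfall data.",
            fine_tuned_model, rag_workflow)[0])  # rag_agent
```

Keeping the stable set explicit makes it easy to audit which decisions rely on fine-tuned knowledge and which rely on retrieved data.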
Even in specialized domains such as glaucoma detection, fine-tuning was not what delivered the result. A 2025 study used GPT-4o through its vision API to generate referral letters from OCT images, achieving 91% accuracy and 100% recall; this relied on the model's strong general capabilities plus structured clinical data, not on fine-tuning [5]. The takeaway: fine-tuning can be part of the solution, but it works best when paired with external knowledge retrieval, careful data curation, and hallucination-specific training signals.
Sources used in this answer
[1] Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks
Biomedical fine-tuned LLMs generally underperformed general-purpose models on clinical tasks and showed a higher tendency to hallucinate; e.g., OpenBioLLM-8B scored 30% vs. Llama-3-8B-Instruct's 64.3% on NEJM case challenges.
[2] WHW: An Efficient Data Organization Method for Fine-tuning Large Language Models
A data organization method (WHW) adding task descriptions reduced LLM hallucinations by 73% compared to prompt fine-tuning, while improving F1 by 11% on role-setting tasks.
[3] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Fine-tuning on new factual knowledge linearly increases hallucination tendency; models struggle to acquire new facts through fine-tuning, and each new fact learned increases hallucination risk.
[4] Leveraging LLMs for Environmental Complexity: Structured Fine-Tuning Data Sets and Deployment Strategies
Fine-tuned models gained only 1% precision on standardized tasks and lost 3% in agentic workflows; generalist models outperformed them by 10% on interdisciplinary tasks.
[5] Glaucoma Detection for Automated Referral System: Using OCT Data and Fine-tuning LLM Models
GPT-4o with vision API achieved 91% accuracy and 100% recall for glaucoma referral letter generation from OCT data, without relying on fine-tuning.
[6] Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Unfamiliar fine-tuning examples control how models hallucinate; modifying supervision of these examples can teach models to say 'I don't know' and reduce hallucinations.
[7] RAG-HAT: A Hallucination-Aware Tuning Pipeline for LLM in Retrieval-Augmented Generation
Hallucination Aware Tuning (HAT) uses detection models and DPO fine-tuning to reduce hallucination rates and improve answer quality in RAG systems.
[8] Fine-Tuning Large Language Models Using Entity Hallucination Index for Text Summarization
Fine-tuning summarization models using Entity Hallucination Index (EHI) as a reward signal reduced hallucination rates without compromising informativeness and improved out-of-domain generalization.
