Is prompt engineering a legitimate scientific discipline?

What exactly is prompt engineering, and why does it matter?

Prompt engineering is the practice of designing and refining the instructions (prompts) given to a large language model (LLM) such as ChatGPT so that its outputs are useful, accurate, and reliable. It matters because LLMs are powerful but unpredictable: without a well-crafted prompt, they can produce irrelevant, biased, or even fabricated information. For example, a prompt engineering strategy called 'ChemPrompt' was used to guide ChatGPT in extracting synthesis conditions from chemistry papers on metal-organic frameworks (MOFs), achieving precision, recall, and F1 scores of 90-99% [1]. In other words, nearly everything the system extracted was correct (high precision) and it missed few of the relevant data points (high recall), which turned a hallucination-prone chatbot into a dependable text-mining assistant for that task. In healthcare, prompt engineering has been described as an 'important emerging skill' for medical professionals: a published tutorial offers practical recommendations for interacting with LLMs [2], and other authors argue that it should become a core competency in medical education [3].
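To make the three metrics concrete, here is a minimal sketch of how precision, recall, and F1 are computed from raw extraction counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the ChemPrompt study [1].

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from extraction counts.

    tp: extracted values that are correct (true positives)
    fp: extracted values that are wrong (false positives)
    fn: relevant values the system failed to extract (false negatives)
    """
    # Precision: share of extracted values that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of relevant values that were actually extracted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (NOT from the cited paper).
precision, recall, f1 = precision_recall_f1(tp=95, fp=5, fn=3)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays high only when both precision and recall are high, which is why a 90-99% range across all three metrics is a strong result.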

What evidence shows prompt engineering is more than just guesswork?

Several studies suggest that prompt engineering can follow systematic methods that produce measurable improvements. A 2024 study introduced a method called PE2, which uses a detailed meta-prompt with step-by-step reasoning templates; it outperformed the standard 'let's think step by step' prompt by 6.3% on a math reasoning benchmark (MultiArith) and by 3.1% on another (GSM8K) [4]. These are not trivial gains: they show that a carefully engineered prompt can measurably improve an LLM's performance on complex tasks. Similarly, researchers have developed a catalog of reusable 'prompt patterns', analogous to software design patterns, that address common problems such as enforcing output formats or automating multi-step processes [10]. Prompt engineering concepts have also been introduced to software testing, with example prompts aimed at test engineers [11], and to STEM education, where a prototype tool acts as a virtual mentor, generating quizzes and explanations tailored to a student's grade level [9]. Taken together, these examples show that prompt engineering is not purely anecdotal; it has documented techniques that can be transferred across tasks.
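As an illustration of what a reusable prompt pattern might look like in practice, the sketch below implements an output-format pattern: a template that asks for a fixed JSON structure, plus a check that a response actually follows it. The function names, the prompt wording, the example passage, and the example response are all hypothetical; they are not taken from the pattern catalog [10], and no real LLM is called.

```python
import json
from typing import List


def build_extraction_prompt(passage: str, fields: List[str]) -> str:
    """Build a prompt that constrains the model to a fixed JSON output format."""
    field_list = ", ".join(f'"{name}"' for name in fields)
    return (
        "Extract the requested values from the passage below.\n"
        f"Respond with a single JSON object containing exactly these keys: {field_list}.\n"
        'If a value is not stated in the passage, use the string "N/A".\n'
        "\n"
        f"Passage: {passage}"
    )


def response_matches_format(response_text: str, fields: List[str]) -> bool:
    """Return True if the response is a JSON object with exactly the requested keys."""
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(parsed) == set(fields)


# Hypothetical field names and passage, for illustration only.
fields = ["temperature", "reaction_time"]
prompt = build_extraction_prompt("The mixture was heated at 120 C for 24 h.", fields)

# Hypothetical model responses; in real use these would come from an LLM.
good_response = '{"temperature": "120 C", "reaction_time": "24 h"}'
bad_response = "The temperature was 120 C."

print(response_matches_format(good_response, fields))
print(response_matches_format(bad_response, fields))
```

The point of the pattern is that the template and the check are independent of any single task: swapping in a different passage or field list reuses the same structure.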

What's missing for prompt engineering to be a true scientific discipline?

Despite the promising evidence, prompt engineering lacks the standardized evaluation and theoretical foundations that define a mature science. A 2024 scoping review of 114 medical prompt engineering studies found that 61% of prompt design papers did not report any non-prompt baseline for comparison, so they could not show that their prompts outperformed a simpler alternative [5]. Many studies also failed to document key details such as the exact prompt wording or the model version used, which makes results hard to replicate. Another paper proposed a systematic assessment framework (SAFE-PE) precisely because current practices are 'based on trial-and-error or task-specific benchmarks' [6]. The effects of prompt changes are also not fully understood: a hermeneutic study found that increasing prompt specificity led to 'intensified neutrality' in ChatGPT's output, suggesting that optimizing for factual accuracy may reduce the meaningfulness of the response [8]. These gaps mean that while prompt engineering has scientific elements, it is still closer to a craft: effective, but not yet governed by widely accepted evaluation standards.
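The reporting gaps described above (no baseline, undocumented prompt wording and model version) can be sketched as a small evaluation harness. Everything here is an assumption made for illustration: the record fields, the `stub_model` function, the model version string, and the test questions are hypothetical, and the stub is deliberately rigged so that the two prompts differ. The resulting numbers say nothing about any real model or study.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PromptRun:
    """One evaluated prompt, with the details needed to replicate the run."""
    label: str
    prompt_template: str  # exact wording, with {question} as the placeholder
    model_version: str
    accuracy: float


def evaluate(label: str, prompt_template: str, model_version: str,
             model_fn: Callable[[str], str],
             examples: List[Tuple[str, str]]) -> PromptRun:
    """Score a prompt template by exact-match accuracy on (question, answer) pairs."""
    correct = 0
    for question, expected in examples:
        answer = model_fn(prompt_template.format(question=question))
        if answer.strip() == expected:
            correct += 1
    return PromptRun(label, prompt_template, model_version, correct / len(examples))


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (artificial behavior, not a real model)."""
    match = re.search(r"(\d+) \+ (\d+)", prompt)
    if match is None:
        return ""
    a, b = int(match.group(1)), int(match.group(2))
    total = a + b
    # Artificial rule: without the reasoning cue, sums with an odd first operand are wrong.
    if "step by step" not in prompt and a % 2 == 1:
        total += 1
    return str(total)


# Hypothetical test set and model version string.
examples = [("2 + 3", "5"), ("7 + 1", "8"), ("4 + 4", "8"), ("9 + 6", "15")]
MODEL_VERSION = "stub-model-v0"

baseline = evaluate("baseline", "{question}", MODEL_VERSION, stub_model, examples)
engineered = evaluate("engineered", "Let's think step by step. {question}",
                      MODEL_VERSION, stub_model, examples)

for run in (baseline, engineered):
    print(run)
print("improvement over baseline:", engineered.accuracy - baseline.accuracy)
```

Reporting both `PromptRun` records side by side, including the exact template and model version, is the kind of minimum documentation that would let another group rerun the comparison.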

Sources used in this answer

ChatGPT Chemistry Assistant for Text Mining and the Prediction of MOF Synthesis

A ChemPrompt engineering workflow achieved 90-99% precision, recall, and F1 scores in extracting 26,257 synthesis parameters from ~800 MOF papers, and the resulting data trained a machine-learning model with >87% accuracy in predicting crystallization outcomes.

2023 · Zhiling Zheng, Oufan Zhang, Christian Borgs, Jennifer T Chayes, Omar M Yaghi · Journal of the American Chemical Society

Prompt Engineering as an Important Emerging Skill for Medical Professionals: Tutorial

Prompt engineering is described as a relatively new field of research and an important emerging skill for medical professionals, with practical recommendations for improving interactions with LLMs.

2023 · Bertalan Meskó · Journal of Medical Internet Research

Prompt Engineering in Healthcare

The article highlights a knowledge gap in medical education regarding prompt engineering and advocates for it as a core competency to improve patient outcomes and healthcare delivery.

2024 · Rajvardhan Patil, Thomas F. Heston, Vijay Bhuse · Electronics

Prompt Engineering a Prompt Engineer

The PE2 method, using detailed meta-prompts with step-by-step reasoning, outperformed 'let's think step by step' by 6.3% on MultiArith and 3.1% on GSM8K, and beat competitive baselines on counterfactual tasks by 6.9%.

2024 · Qinyuan Ye, Mohamed Ahmed, Reid Pryzant, Fereshte Khani · Findings of the Association for Computational Linguistics: ACL 2024

Prompt Engineering Paradigms for Medical Applications: Scoping Review

A scoping review of 114 medical prompt engineering studies found that 61% of prompt design papers did not report any non-prompt baseline, and many neglected to document key prompt engineering-specific information.

2024 · Jamil Zaghir, Marco Naguib, Mina Bjelogrlic, Aurélie Névéol, Xavier Tannier, Christian Lovis · Journal of Medical Internet Research

SAFE-PE, A Systematic Assessment Framework for Evaluating Prompt Engineering in Generative AI

The SAFE-PE framework proposes standard measures (accuracy, diversity, robustness, interpretability, fairness, ethics) to evaluate prompt quality, reliability, and reproducibility, addressing the current lack of a clear assessment framework.

2026 · Kashif Laeeq, Sherbano Saleem · Sukkur IBA Journal of Computing and Mathematical Sciences

Towards a Catalog of Prompt Patterns to Enhance the Discipline of Prompt Engineering

The paper argues that understanding of effective prompts is largely anecdotal and fragmented, and calls for a systematic, disciplined approach to prompt engineering to improve reliability in mission-critical software.

2024 · Douglas C. Schmidt, Jesse Spencer-Smith, Quchen Fu, Jules White · ACM SIGAda Ada Letters

Prompting meaning: a hermeneutic approach to optimising prompt engineering with ChatGPT

Increasing the specificity of prompts led to intensified neutrality in ChatGPT's output, suggesting that optimizing for factual accuracy may reduce the hermeneutic value (meaningfulness) of the text.

2023 · Leah Henrickson, Albert Meroño-Peñuela · AI & Society

Using Prompt Engineering to Enhance STEM Education

A prototype tool using prompt engineering was developed to generate educational content (descriptions, Q&A, quizzes) tailored to K-12 students' grade levels, acting as a virtual mentor to enhance STEM education.

2024 · Max Z Li · 2024 IEEE Integrated STEM Education Conference (ISEC)

A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

A catalog of 15+ prompt patterns (e.g., persona, chain-of-thought, output formatting) is presented as reusable solutions for common problems when conversing with LLMs, analogous to software design patterns.

2023 · Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, Douglas C. Schmidt · arXiv.org

Prompt Engineering Impacts to Software Test Architectures for Beginner to Experts

The paper introduces prompt engineering concepts for software test engineers, providing example prompts and discussing implications for improving AI-assisted testing, though it notes this is just a beginning.

2024 · Jon D. Hagar, Satoshi Masuda · ICSTW
