Can natural language processing systems truly understand sarcasm and irony?

What does it mean for a machine to 'detect' sarcasm?

When researchers say a system 'detects' sarcasm, they mean it can classify a piece of text (or a text-image pair) as sarcastic or non-sarcastic with measurable accuracy, not that it grasps the humor or intent. For instance, a 2025 model called SemIRNet achieved 88.87% accuracy and an F1 score of 86.33% on a multimodal sarcasm benchmark by fusing text, images, and common-sense knowledge from ConceptNet [1]. That is impressive, but the task is still statistical pattern matching: the model learns that certain word-image combinations (e.g., a positive caption with a negative image) are likely sarcastic. It does not 'get' the joke.
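To make the two reported metrics concrete, here is a minimal sketch of how accuracy and F1 are computed from a classifier's confusion counts. The counts are hypothetical and are not taken from the SemIRNet paper; they are chosen to show why papers report both numbers, since accuracy can look healthy while F1 on the sarcastic class is poor.

```python
def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (accuracy, F1) for the 'sarcastic' class from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Hypothetical test set: 100 posts, 20 of them truly sarcastic.
# The classifier catches only 5 of the 20 and raises no false alarms.
acc, f1 = accuracy_and_f1(tp=5, fp=0, fn=15, tn=80)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")  # accuracy=0.85, F1=0.40
```

In this made-up case the system is right 85% of the time simply because most posts are not sarcastic, yet it misses three quarters of the sarcasm.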

Similarly, a 2024 Urdu sarcasm detector built on cascaded group multi-head attention outperformed simpler attention models on a curated tweet dataset [3], and a 2023 Bangla BERT-based system reached 99.60% accuracy on a Facebook/YouTube comment dataset [7]. These numbers show that with enough training data and the right architecture, machines can become very good at spotting sarcastic patterns, but the resulting systems remain brittle. A 2025 review of 15 studies reported that even simple logistic regression with TF-IDF features achieved 72.3% accuracy and an ROC-AUC of 0.75 on a Reddit dataset, which suggests that surface-level cues (like word choice and punctuation) carry a lot of signal [6]. The catch: these systems often fail when sarcasm relies on shared cultural knowledge, tone of voice, or subtle shifts in conversational context.
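A baseline of this kind can be sketched in a few lines with scikit-learn. This is an illustration under assumptions, not the setup from the review: the eight sentences are hand-written toy examples, and the n-gram range and the token pattern (which keeps runs of punctuation such as "!!!" as tokens) are choices made here for demonstration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-written examples (hypothetical; not the Reddit data from [6]).
texts = [
    "Oh great, another Monday. Just what I needed!!!",
    "Wow, I love waiting in line for three hours.",
    "Yeah, because that worked so well last time...",
    "Sure, let's break production on a Friday. Brilliant.",
    "The meeting is moved to 3 pm tomorrow.",
    "I really enjoyed the concert last night.",
    "Thanks for sending the report so quickly.",
    "The library closes at 8 pm on weekdays.",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = sarcastic, 0 = not sarcastic

# TF-IDF over word unigrams and bigrams; the token pattern also keeps
# repeated punctuation, one of the surface cues mentioned above.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"(?u)\b\w+\b|[!?.]{2,}"),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Columns are [P(not sarcastic), P(sarcastic)] for the new sentence.
proba = model.predict_proba(["Oh wow, another outage. Brilliant!!!"])
print(proba)
```

With so little data the probabilities mean nothing in themselves. The point is that the model only ever sees weighted word and punctuation counts, which is exactly why it can score well on in-domain text and still miss sarcasm that depends on who is speaking and why.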

Why context and multiple data types boost performance — and where they still fall short

Sarcasm is notoriously context-dependent, and models that incorporate dialogue history, speaker traits, or multiple data modalities (text, audio, video) generally outperform those that analyze single sentences in isolation. A 2025 multimodal system that combined BERT text embeddings with audio pitch/tone analysis and facial emotion recognition from video improved sarcasm detection accuracy over text-only baselines [9]. Likewise, a 2022 model that integrated affective (emotional) information with dependency graphs using a Relational Graph Attention Network outperformed the prior state of the art on six benchmark datasets, with gains of up to 4.19% in accuracy and 4.33% in F1 [4]. These results show that adding context, whether from previous utterances, emotional tone, or visual cues, helps machines pick up on the incongruity that signals sarcasm.
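One simple way to combine modalities is late fusion: each modality produces its own sarcasm probability and the scores are averaged with weights. The sketch below assumes that scheme purely for illustration. The cited systems may fuse features differently, and the `late_fusion` helper, the scores, and the weights are all hypothetical.

```python
def late_fusion(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-modality sarcasm probabilities."""
    total_weight = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_weight


# Hypothetical outputs for "Nice work." said in a flat tone with an eye roll:
# the words alone look sincere, while the audio and video cues do not.
scores = {"text": 0.30, "audio": 0.80, "video": 0.90}
weights = {"text": 0.5, "audio": 0.25, "video": 0.25}

fused = late_fusion(scores, weights)
print(f"text-only={scores['text']:.2f}, fused={fused:.3f}")
```

Here the text-only score sits below a 0.5 decision threshold while the fused score rises above it, which is the kind of incongruity between words and delivery that extra modalities expose.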

Yet context introduces its own challenges. The 2025 review highlighted that inconsistent context windowing (how much prior dialogue the model considers) and cultural variation are major unresolved issues [6]. A model trained on English Reddit sarcasm cannot be expected to transfer to Libyan Arabic Facebook comments, and even a 2024 SVM model built for that dialect with handcrafted lexical and syntactic features reached only 79.15% accuracy [2]. Even within the same language, dialect matters: a Kurdish Sorani dataset (KuSarcasm) required a hybrid approach combining multilingual BERT, semantic similarity scoring, and over 100 rule-based linguistic patterns to label 16,833 entries [10]. And a 2025 ternary Bangla dataset (BanglaSarc3) with 12,089 comments balanced across sarcastic, neutral, and non-sarcastic classes was created precisely because existing resources were inadequate [11]. The takeaway: context helps, but it also multiplies the complexity, and no single model works across all languages or settings.
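Context windowing is easy to picture in code. The sketch below builds a classifier's input from the last few turns of a conversation. The `build_input` helper and the " [SEP] " separator are illustrative assumptions, and the right window size is exactly the open question the review points to.

```python
def build_input(history: list[str], target: str, window: int,
                sep: str = " [SEP] ") -> str:
    """Join the last `window` utterances of `history` with the target utterance."""
    # Guard window == 0 explicitly: history[-0:] would return the whole list.
    context = history[-window:] if window > 0 else []
    return sep.join(context + [target])


history = [
    "The deploy failed again.",
    "Third time this week.",
    "And the rollback script is broken too.",
]
target = "Fantastic. Best sprint ever."

no_context = build_input(history, target, window=0)
two_turns = build_input(history, target, window=2)
print(no_context)
print(two_turns)
```

With `window=0` the target reads as sincere praise. With `window=2` the preceding complaints make the sarcasm recoverable. Two studies that choose different windows are, in effect, solving different tasks, which makes their scores hard to compare.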

The missing piece: understanding stance and intention

A deeper limitation is that most sarcasm detectors ignore the author's underlying stance — whether they are for, against, or neutral toward the topic. A 2022 study proposed 'stance-level sarcasm detection' (SLSD) and showed that incorporating stance information via a BERT + graph attention network significantly improved performance on both a Chinese dataset and the SemEval-2018 English benchmark [12]. The authors argued that sarcasm often arises from a mismatch between stated sentiment and actual stance (e.g., saying 'Great job!' when you oppose the outcome), and that capturing this hidden stance is essential. Their model outperformed prior baselines 'by a large margin,' suggesting that current systems miss a key human reasoning step.
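The mismatch idea can be written down as a toy rule. This is not the SLSD model, which learns stance with BERT and a graph attention network. The function name and the label sets below are assumptions made for illustration.

```python
def stance_sentiment_mismatch(surface_sentiment: str, stance: str) -> bool:
    """True when the literal sentiment contradicts the author's stance.

    surface_sentiment: 'positive', 'negative', or 'neutral' (what the words say)
    stance: 'for', 'against', or 'neutral' (what the author actually thinks)
    """
    return (surface_sentiment, stance) in {
        ("positive", "against"),
        ("negative", "for"),
    }


# 'Great job!' from someone who opposes the outcome: candidate sarcasm.
print(stance_sentiment_mismatch("positive", "against"))  # True
# 'Great job!' from a supporter: most likely sincere.
print(stance_sentiment_mismatch("positive", "for"))      # False
```

The rule itself is trivial. The hard part, and the part a detector without stance information skips, is inferring the stance argument in the first place.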

This stance gap points to a fundamental limit: machines can learn to associate certain patterns with sarcasm, but they do not reason about intent. A 2025 paper on sarcastic comment generation fine-tuned GPT-2 to produce contextually appropriate sarcastic comments, evaluated with perplexity and human assessment [8], but generation is not understanding. The model can mimic sarcasm without knowing what it means. As the 2024 sentiment analysis review noted, deep learning models such as CNNs and RNNs can capture contextual nuances such as sarcasm and slang, but they remain 'black boxes': we see the output, not the reasoning [5]. Explainability tools like LIME (Local Interpretable Model-Agnostic Explanations) have been applied to Bangla sarcasm detection to show which words drive the decision [7], but that reveals correlation, not comprehension.
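The kind of evidence a word-level explanation provides can be shown with a much simpler stand-in for LIME: remove one word at a time and record how far the score drops. LIME itself fits a local surrogate model over many random perturbations, so this leave-one-out sketch is a simplification. The scorer and its cue weights are hypothetical.

```python
CUE_WEIGHTS = {"great": 0.4, "again": 0.3, "love": 0.2}  # hypothetical toy weights


def toy_score(words: list[str]) -> float:
    """Stand-in classifier: sums hypothetical cue weights, capped at 1.0."""
    return min(1.0, sum(CUE_WEIGHTS.get(w.lower().strip(".,!?"), 0.0) for w in words))


def word_importance(text: str) -> dict[str, float]:
    """Score drop when each word is removed (leave-one-out occlusion).

    Repeated words share one dictionary key, which is acceptable for a toy example.
    """
    words = text.split()
    base = toy_score(words)
    return {
        w: round(base - toy_score(words[:i] + words[i + 1:]), 3)
        for i, w in enumerate(words)
    }


importance = word_importance("Great, the server crashed again!")
print(importance)
```

The output says that "Great," and "again!" move the score and the other words do not. It says nothing about why a crashed server makes "Great" ironic, which is the gap between correlation and comprehension.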

Sources used in this answer

SemIRNet: A Semantic Irony Recognition Network for Multimodal Sarcasm Detection

SemIRNet, a multimodal model fusing text, images, and common-sense knowledge, achieved 88.87% accuracy and 86.33% F1 on a sarcasm benchmark, improving on prior best by 1.64% and 2.88%.

2025 · Jingxuan Zhou, Yuehao Wu, Yibo Zhang, Yeyubei Zhang, Yunchong Liu, Bolin Huang, Chunhong Yuan · 2025 10th International Conference on Information and Network Technologies (ICINT)

Sarcasm Detection in Libyan Arabic Dialects Using Natural Language Processing Techniques

An SVM model for Libyan Arabic sarcasm detection on 5,082 Facebook comments reached 79.15% accuracy, 79.3% precision, 79.7% recall, and 79.5% F1.

2024 · Azalden Alakrot, Aboagela Dogman, Fathi Ammer · 2024 IEEE 4th International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering (MI-STA)

A novel transformer attention-based approach for sarcasm detection

UrduSarcasmNet, using cascaded group multi-head attention, outperformed simple attention and state-of-the-art models on a curated Urdu tweet dataset.

2024 · Shumaila Khan, Iqbal Qasim, Wahab Khan, Khursheed Aurangzeb, Javed Ali Khan, Muhammad Shahid Anwar · Expert Syst. J. Knowl. Eng.

Affection Enhanced Relational Graph Attention Network for Sarcasm Detection

An Affection Enhanced Relational Graph Attention Network (ARGAT) integrating affective and dependency information improved accuracy by up to 4.19% and F1 by 4.33% over prior methods on six benchmarks.

2022 · Guowei Li, Fuqiang Lin, Wangqun Chen, Bo Liu · Applied Sciences

Natural Language Processing (NLP) for Sentiment Analysis in Social Media

A desktop review of NLP for social media sentiment analysis found that deep learning models (CNNs, RNNs) capture sarcasm and slang but remain black boxes, with ethical and cross-cultural gaps.

2024 · Thomas Joseph · International Journal of Computing and Engineering

Sarcasm Detection in Conversational Contexts: A Comprehensive Review with a Logistic Regression Baseline Study

A review of 15 sarcasm detection studies plus a logistic regression baseline on Reddit data achieved 72.3% accuracy and 0.75 ROC-AUC, highlighting context windowing and cultural bias as key challenges.

2025 · Jaipuneeth Jaishree Prabhu, Sai Preetham Rajappa Velur · Premier Journal of Science

Interpretable Bangla Sarcasm Detection using BERT and Explainable AI

A BERT-based Bangla sarcasm detector reached 99.60% accuracy on a new Facebook/YouTube dataset (BanglaSarc), far exceeding traditional ML (89.93%), with LIME explainability.

2023 · Ramisa Anan, Tasnim Sakib Apon, Zeba Tahsin Hossain, Elizabeth Antora Modhu, Sudipta Mondal, Md. Golam Rabiul Alam · CCWC

Sarcastic Comment Generation Using Natural Language Processing

A fine-tuned GPT-2 model trained on 3,000 sarcastic prompts generated contextually appropriate sarcastic comments, evaluated via perplexity and human assessment.

2025 · Laxmi Kumari, Santosh Kumar Bharti · 2025 International Conference on Artificial Intelligence and Machine Vision (AIMV)

Context-Aware Sarcasm Detection in Text Using NLP

A multimodal sarcasm detection system combining BERT/Word2Vec for text, audio pitch/tone, and video facial expressions improved accuracy over text-only models.

2025 · Mrunal Deshmukh, Nilesh Joshi, Manisha Bharati · 2025 IEEE Pune Section International Conference (PuneCon)

KuSarcasm: Automated annotation of a sarcasm dataset using hybrid NLP techniques

KuSarcasm, a Kurdish Sorani dataset of 16,833 entries, was built using mBERT, SBERT, and 100+ rule-based patterns for automatic sarcasm annotation.

2025 · Shakhawan Aghajan, Rebwar M Nabi · Data in Brief

BanglaSarc3: A benchmark dataset for Bangla sarcasm detection from social media to advance Bangla NLP

BanglaSarc3, a ternary dataset of 12,089 Facebook comments (sarcastic, neutral, non-sarcastic), provides a balanced benchmark for Bangla sarcasm detection.

2025 · Susmoy Biswas, Md Mostafizur Rahman Zahid, Mst Taposi Rabeya, Md Minhazul Abedin, Md Hasan Imam Bijoy, Md Sadekur Rahman · Data in Brief

Stance-level Sarcasm Detection with BERT and Stance-centered Graph Attention Networks

A stance-level sarcasm detection framework (BERT + stance-centered graph attention) significantly outperformed baselines on Chinese and English datasets by modeling author stance.

2022 · Yazhou Zhang, Dan Ma, Prayag Tiwari, Chen Zhang, Mehedi Masud, Mohammad Shorfuzzaman, Dawei Song · ACM Trans. Internet Techn.
