Unpacking Sarcasm: A Contextual and Transformer-Based Approach for Improved Detection
Figure 1. Figure of Abstract.
Figure 2. Preprocessing examples from different datasets.
Figure 3. Architecture of the transform model.
Figure 4. Conversation summarization with BART-Large.
Figure 5. Accuracy percentages (%) across different conditions for sarcasm detection.
Figure 6. Feature importance analysis for sarcasm detection.
Figure 7. Computational efficiency analysis: RoBERTa vs. DistilBERT.
Figure 8. A 3D comparison of Jaccard coefficients across models and conditions.
Figure 9. Confusion matrices for RoBERTa, DistilBERT, and random forest models.
Abstract
1. Introduction
- Context-aware preprocessing: A novel preprocessing pipeline integrating context summarization and metadata-based embeddings to enhance sarcasm detection accuracy and robustness.
- Comparative model analysis: A systematic evaluation of RoBERTa and DistilBERT, demonstrating 98.5% accuracy for RoBERTa and a 1.74× training speedup with DistilBERT, highlighting the trade-off between accuracy and computational efficiency.
- Computational efficiency optimization: A detailed computational analysis showing that DistilBERT reduces training time from 7.3 h to 4.2 h while maintaining competitive accuracy (96.2% vs. 98.5%), making it well suited to real-time sarcasm detection in sentiment analysis applications.
2. Literature Review
2.1. Traditional Approaches: Rule-Based and Feature-Engineered Methods
2.2. Advances in Deep Learning for Sarcasm Detection
2.3. Context-Aware and Multi-Modal Approaches
3. Problem Statement and Research Gaps
Research Gaps
- Limited contextual understanding: Existing models lack multi-turn conversation modeling and effective context summarization techniques.
- High computational cost: Transformer-based sarcasm detection remains resource-intensive, limiting real-time applications.
- No systematic model comparison: Studies rarely evaluate RoBERTa vs. DistilBERT for accuracy vs. efficiency trade-offs.
- Lack of metadata utilization: Speaker details, sentiment shifts, and contextual embeddings remain underexplored in sarcasm classification.
- Underdeveloped multi-modal approaches: Most models focus only on text, ignoring audio and image-based sarcasm cues.
4. Dataset
- $H$ is the entropy score (higher values indicate a more balanced dataset);
- $p_i$ is the proportion of samples in class $i$;
- $n$ is the number of classes.
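As a worked illustration of the entropy-based balance score, the sketch below assumes the standard Shannon form, $H = -\sum_{i=1}^{n} p_i \log_2 p_i$. The base-2 logarithm and the function name `class_balance_entropy` are assumptions made here for illustration.

```python
import math

def class_balance_entropy(proportions):
    """Shannon entropy (in bits) of a class distribution."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    # Classes with p = 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# Approximate class proportions of the News Headlines dataset (47% / 53%).
news_entropy = class_balance_entropy([0.47, 0.53])
# A perfectly balanced two-class dataset reaches the maximum of 1 bit.
balanced_entropy = class_balance_entropy([0.5, 0.5])
print(f"News Headlines: {news_entropy:.4f} bits, balanced: {balanced_entropy:.4f} bits")
```

Under this assumption, a 47:53 split scores only slightly below the two-class maximum of 1 bit.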
5. Proposed Methodology
5.1. Data Preprocessing
- Text cleaning: Removes noise such as special characters, punctuation, and stopwords. All text is standardized to lowercase, and lemmatization is applied to maintain uniformity.
- Contextual integration: Metadata, such as article descriptions, speaker details, and parent comments, are merged with the main text. Summarization techniques are used to condense lengthy conversations into concise representations.
- Balancing data: Oversampling and augmentation techniques address imbalanced class distributions in datasets.
- Tokenization: Uses the RoBERTa or DistilBERT tokenizer to split text into uniform input sequences with appropriate truncation and padding. Figure 2 shows preprocessing examples from different datasets.
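The cleaning and contextual-integration steps above can be sketched roughly as follows. This is a minimal, standard-library illustration, not the authors' pipeline: the stopword list, the `key: value` metadata layout, the `|` delimiter, and the function names are all assumptions, and lemmatization is omitted because it would require an external NLP library.

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "at"}

def clean_text(text):
    """Lowercase, replace punctuation/special characters, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

def merge_context(text, metadata):
    """Prepend metadata fields (e.g., section, author) to the cleaned text."""
    context = " ".join(f"{key}: {value}" for key, value in metadata.items())
    return f"{context} | {clean_text(text)}"

headline = "Economy soars while unemployment reaches new heights!"
example = merge_context(headline, {"section": "Business", "author": "A. Smith"})
print(example)
```

The merged string would then be passed to the model's tokenizer, which handles truncation and padding.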
5.2. Feature Engineering
- Implicit emotions: Extracted from datasets like Mustard to highlight emotional undertones in dialogues.
- Speaker-specific information: Differentiates speakers in conversational datasets to enhance context awareness.
- Metadata utilization: Article sections, authors, and subreddits are encoded as auxiliary inputs for transformer models.
- represents the embedding of the word, W.
- is the hidden state at position i.
- represents the attention weight.
5.3. Data Augmentation
- Augmentation techniques
- Impact of data augmentation
5.4. Model Selection and Fine-Tuning
- RoBERTa: A robust model optimized for contextual understanding, fine-tuned on sarcasm detection tasks using datasets like News Headlines and Mustard.
- DistilBERT: A lightweight alternative to BERT, offering faster training with comparable accuracy. It is utilized for scenarios requiring reduced computational overhead.
- n is the sequence length (number of tokens).
- d is the model’s hidden dimension.
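To make the roles of n and d concrete, the sketch below assumes the usual self-attention cost, which grows as n² · d per layer. The helper name `self_attention_cost` is hypothetical, and 768 is used only as an illustrative hidden size.

```python
def self_attention_cost(n, d):
    """Rough multiply-add count for one self-attention layer: n^2 * d."""
    return n * n * d

base = self_attention_cost(n=128, d=768)
doubled = self_attention_cost(n=256, d=768)
# Doubling the sequence length quadruples the attention cost.
print(doubled / base)
```

Under this scaling, sequence length is the dominant cost driver, which is one reason condensing long conversations before classification can help.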
5.5. Transformer Self-Attention Mechanism
- $Q$ = Query matrix;
- $K$ = Key matrix;
- $V$ = Value matrix;
- $d_k$ = Dimensionality of keys.
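For reference, the standard scaled dot-product attention built from these quantities is $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(QK^{\top}/\sqrt{d_k}\right)V$. The following is a small pure-Python sketch of that formula on toy matrices; the input values are arbitrary and the function names are chosen here for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for list-of-lists matrices."""
    d_k = len(K[0])
    output, weights = [], []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        output.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return output, weights

# Toy example: 2 tokens, d_k = 2 (values are arbitrary).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
attended, attn_weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn_weights` sums to 1, and each output row is a convex combination of the rows of `V`.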
5.6. Context Summarization
- A lower compression ratio indicates a more compact summary with retained meaning.
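A minimal sketch of the compression ratio, assuming it is defined as summary length divided by original length, with both measured in whitespace-separated tokens. The function name and the example conversation are invented for illustration.

```python
def compression_ratio(original, summary):
    """Summary length divided by original length, counted in whitespace tokens."""
    original_len = len(original.split())
    if original_len == 0:
        raise ValueError("original text is empty")
    return len(summary.split()) / original_len

# Hypothetical conversation and summary, for illustration only.
conversation = (
    "A: Did you finish the report? "
    "B: Oh sure, I had nothing better to do all weekend. "
    "A: Great, so it is done?"
)
summary = "B sarcastically implies the report ruined the weekend."
ratio = compression_ratio(conversation, summary)
print(f"compression ratio: {ratio:.2f}")
```

Under this definition the ratio falls between 0 and 1 whenever the summary is shorter than the source, and smaller values mean stronger compression.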
5.7. Algorithm: BART-Based Context Summarization for Sarcasm Detection
- Input:
  - Multi-turn conversation dataset (e.g., Reddit, Mustard).
  - Pre-trained BART model.
  - Hyperparameters: batch size (16), learning rate (2 × 10⁻⁵), epochs (5).
- Output:
  - Summarized context representation of the conversation.
  - Concatenated with original sarcastic comment for final classification.
- Step 1: Data preprocessing.
  - Extract conversation threads from the dataset.
  - Normalize text (lowercase, punctuation removal, tokenization).
  - Structure input as a sequence of utterances from different speakers.
- Step 2: BART-based summarization.
  - Encode input conversation using the BART tokenizer:
  - Pass encoded input through the BART encoder–decoder model:
- Step 3: Fine-tuning BART on sarcasm-specific data.
  - Train BART on sarcastic conversations using cross-entropy loss:
  - Update model parameters using the AdamW optimizer.
- Step 4: Integrate summarized context with the sarcasm classification model.
  - Concatenate with sarcastic comment:
  - Pass to RoBERTa/DistilBERT for sarcasm classification.
  - Predict sarcasm label:
- Step 5: Performance evaluation.
  - Compute accuracy, F1 score, and the Jaccard coefficient for sarcasm classification.
  - Evaluate the impact of summarization by comparing performance with and without BART-generated context.
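The data flow of Steps 1, 2, and 4 can be sketched as below. To keep the example self-contained, a trivial truncating function stands in for the BART summarizer; in practice a fine-tuned BART model would be supplied as the `summarize` callable. The function names, the ` </s> ` separator string, and the sample thread are all assumptions made for illustration.

```python
def build_classifier_input(utterances, comment, summarize, sep=" </s> "):
    """Summarize the conversation context and join it with the target comment."""
    # Step 1: structure the thread as one speaker-tagged sequence.
    context = " ".join(f"{speaker}: {text.strip()}" for speaker, text in utterances)
    # Step 2: condense the context (a BART model would be plugged in here).
    summary = summarize(context)
    # Step 4: concatenate summary and comment for the downstream classifier.
    return f"{summary}{sep}{comment}"

def truncating_summarizer(text, max_words=12):
    """Stand-in for BART: keeps only the first max_words words."""
    return " ".join(text.split()[:max_words])

# Hypothetical thread and comment, for illustration only.
thread = [
    ("user_a", "The update deleted all my saved settings again."),
    ("user_b", "Third time this month, apparently that is a feature."),
]
comment = "Sure, because everyone loves reconfiguring everything from scratch."
model_input = build_classifier_input(thread, comment, truncating_summarizer)
print(model_input)
```

Passing the summarizer as a parameter makes it straightforward to compare classification with and without generated context, as Step 5 requires, by swapping in a function that returns an empty string.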
5.8. Experimental Setup and Training
- Training configuration: Experiments were conducted using an 80-20 split for training and testing. Key parameters included batch size, learning rate, and weight decay.
- Evaluation metrics: Performance is measured using accuracy, precision, recall, F1 score, and the Jaccard coefficient, ensuring a holistic assessment of the models. Equations (5)–(9) define these five metrics.
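The five metrics can be computed from confusion-matrix counts as in the sketch below. It assumes the usual binary definitions with "sarcastic" as the positive class, and takes the Jaccard coefficient to be TP / (TP + FP + FN); the counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1, and Jaccard from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no samples")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "jaccard": jaccard,
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
scores = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```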
5.9. Validation
- News Headlines dataset: Incorporates metadata to improve sarcasm detection in formal communication.
- Mustard dataset: Evaluates the integration of emotional and conversational context.
- Reddit dataset: Tests the model’s ability to adapt to real-world, informal conversational data.
5.10. Comparative Analysis
6. Results and Discussion
6.1. Feature Importance Analysis for Sarcasm Detection
- $I_f$ = importance score of feature $f$;
- $w_f$ = weight assigned to feature $f$ (e.g., metadata, lexical cues);
- $\sum_{j} w_j$ = total weight sum of all features.
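One reading of these definitions is that each feature's importance is its weight divided by the total weight of all features. The sketch below implements that normalization as an assumption; the feature names and raw weights are hypothetical and are not values reported in the paper.

```python
def feature_importance(weights):
    """Normalize raw feature weights so the importance scores sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total feature weight must be positive")
    return {name: w / total for name, w in weights.items()}

# Hypothetical raw weights; these are NOT values reported in the paper.
raw_weights = {"metadata": 3.0, "lexical_cues": 2.0, "speaker_context": 5.0}
importance = feature_importance(raw_weights)
for name, score in importance.items():
    print(f"{name}: {score:.2f}")
```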
6.2. Computational Efficiency Analysis
- T is the trade-off score;
- A is the accuracy;
- S is the speedup factor;
- $\alpha$, $\beta$ are weight coefficients based on priority.
- $T_{\text{RoBERTa}}$ is the training time of the RoBERTa model.
- $T_{\text{DistilBERT}}$ is the training time of the DistilBERT model.
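The speedup factor can be checked against the training and inference times reported in the computational-efficiency table. The sketch below also shows one possible form of the trade-off score, a weighted sum of accuracy and speedup. That linear form, the equal weights, and the use of accuracy as a fraction rather than a percentage are assumptions made for illustration.

```python
def speedup(baseline_time, faster_time):
    """Ratio of the slower model's time to the faster model's time."""
    return baseline_time / faster_time

def trade_off_score(accuracy, speedup_factor, alpha=0.5, beta=0.5):
    """Weighted sum alpha * A + beta * S (assumed form; weights are illustrative)."""
    return alpha * accuracy + beta * speedup_factor

train_speedup = speedup(7.3, 4.2)  # training time in hours
infer_speedup = speedup(45, 25)    # inference time in ms/sample
print(f"training speedup: {train_speedup:.2f}x, inference speedup: {infer_speedup:.2f}x")

# RoBERTa is the baseline, so its own speedup factor is 1.0.
roberta_score = trade_off_score(accuracy=0.985, speedup_factor=1.0)
distilbert_score = trade_off_score(accuracy=0.962, speedup_factor=train_speedup)
```

The training-time ratio 7.3 / 4.2 rounds to the 1.74× speedup quoted for DistilBERT. How the two models rank on the combined score depends entirely on the chosen weights and on how accuracy is scaled.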
7. Model Optimization and Trade-Off Between False Positives and False Negatives
7.1. Performance Metrics and Model Optimization
- Accuracy: Measures overall correctness but can be misleading in imbalanced datasets.
- Precision: Measures the share of sarcastic predictions that are truly sarcastic.
- Recall: Measures the share of truly sarcastic instances that the model detects.
- F1 score: Balances precision and recall for a holistic evaluation.
- Jaccard coefficient: Evaluates similarity between predicted and actual sarcastic samples.
7.2. Trade-Off: False Positives vs. False Negatives
- Useful in sentiment analysis and opinion mining, where incorrectly tagging neutral statements as sarcastic can distort sentiment polarity.
- Helps in customer feedback analysis, ensuring neutral or positive reviews are not misclassified as sarcasm.
- Critical in social media moderation and automated content moderation, where missing sarcasm can lead to misinterpretation of toxic or harmful comments.
- Ensures comprehensive detection of sarcasm in chatbots and virtual assistants, reducing miscommunication in AI–human interactions.
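One common way to move between the two priorities above is to adjust the decision threshold applied to the classifier's predicted sarcasm probability. The sketch below illustrates that general technique; the source does not state that the authors tuned thresholds this way, and the probabilities and labels are hypothetical.

```python
def confusion_at_threshold(probabilities, labels, threshold):
    """Count false positives and false negatives at a given decision threshold."""
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probabilities, labels) if p < threshold and y == 1)
    return fp, fn

# Hypothetical sarcasm probabilities and gold labels (1 = sarcastic).
probs = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
gold = [1, 1, 0, 1, 1, 0, 0, 0]

strict = confusion_at_threshold(probs, gold, threshold=0.7)    # favors precision
lenient = confusion_at_threshold(probs, gold, threshold=0.35)  # favors recall
print(f"strict (FP, FN): {strict}, lenient (FP, FN): {lenient}")
```

On this toy data, the stricter threshold produces no false positives but misses two sarcastic comments, while the lenient one catches every sarcastic comment at the cost of one false positive.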
7.3. Model Optimization Strategy
- For formal datasets (News Headlines);
- For conversational datasets (Reddit, Mustard).
- RoBERTa achieved 98.5% accuracy, with an F1 score of 99%, optimizing precision.
- DistilBERT, optimized for efficiency, achieved an F1 score of 97.5%, with improved recall in conversational data.
8. Limitations and Future Scope
- Implicit sarcasm challenges: The model struggles with sarcasm requiring external knowledge or cultural context.
- Computational trade-offs: RoBERTa is highly accurate but resource-intensive, while DistilBERT is faster but slightly less effective in complex sarcasm cases.
- Dataset constraints: Performance may vary across different text formats, cultures, and languages, limiting generalizability.
- Lack of multi-modal features: The model relies on text only, missing audio and visual cues crucial for sarcasm detection in voice or meme-based content.
- Enhanced context awareness: Integrating knowledge graphs and external sources to improve detection of implicit sarcasm.
- Optimized real-time models: Developing lighter transformer architectures for faster and more efficient sarcasm detection.
- Multi-language expansion: Extending sarcasm detection to low-resource languages and culturally adaptive models.
- Multi-modal integration: Incorporating audio (tone, speech) and visual (gestures, memes) cues for comprehensive sarcasm classification.
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Feature | News Headlines | Mustard | Reddit (SARC) |
---|---|---|---|
Total Records | 26,709 | 1202 | ~1,300,000 |
Sarcastic Sentences | Approximately 47% | Balanced per speaker | Not specified |
Non-Sarcastic Sentences | Approximately 53% | Balanced per speaker | Not specified |
Ratio (Sarcastic:Non-Sarcastic) | 47:53 | 50:50 | Unknown |
Sarcastic Avg. Length | 8 words | 12 words | 15 words |
Non-Sarcastic Avg. Length | 6 words | 10 words | 13 words |
Main Source/Section | Politics, business, entertainment | Friends, Big Bang Theory | r/sarcasm, r/funny, r/news |
Challenges | Imbalanced class distribution | Limited dataset size for deep learning | Requires significant computational resources |
Dataset | Sample Text/Dialogue | Sarcastic | Context/Metadata |
---|---|---|---|
News Headlines | “New study shows coffee improves productivity at work!” | No | Section: Health, Author: J. Doe |
News Headlines | “Economy soars while unemployment reaches new heights!” | Yes | Section: Business, Author: A. Smith |
Mustard | “Sheldon: Oh, I love when you ignore my genius ideas.” | Yes | Scene: Lab Discussion, Emotion: Sarcasm |
Mustard | “Penny: Thank you for fixing my car, you’re amazing!” | No | Scene: Garage, Emotion: Gratitude |
Reddit (SARC) | “Sure, because everyone’s life revolves around this post.” | Yes | Subreddit: r/sarcasm, Parent: General Topic |
Reddit (SARC) | “Thanks for the advice, really helpful.” | No | Subreddit: r/advice, Parent: Help Request |
Model | Learning Rate | Batch Size | Epochs | Accuracy (%) | F1 Score (%) |
---|---|---|---|---|---|
RoBERTa | 2 × 10⁻⁵ | 16 | 10 | 98.5 | 99 |
DistilBERT | 2 × 10⁻⁵ | 16 | 10 | 96.2 | 97.5 |
Model | Training Time (h) | Inference Time (ms/Sample) | GPU Memory Usage (GB) |
---|---|---|---|
RoBERTa | 7.3 | 45 | 16 |
DistilBERT | 4.2 | 25 | 10 |
Dataset | Condition | Accuracy (%) | F1 Score (%) | Jaccard Coefficient (%) |
---|---|---|---|---|
News Headlines | With Metadata | 98.5 | 99 | 95.3 |
News Headlines | Without Metadata | 93.2 | 91.5 | 89.2 |
News Headlines | With Context Summarization | 97.8 | 98.3 | 93.7 |
Mustard | With Speaker Context | 89.2 | 90 | 88.5 |
Mustard | Without Speaker Context | 82.4 | 81.7 | 80.5 |
Mustard | With Context Summarization | 88.7 | 88 | 86.3 |
Reddit (SARC) | With Parent–Child Context | 74.8 | 75 | 72.3 |
Reddit (SARC) | Without Parent–Child Context | 67.3 | 68.2 | 65.8 |
Reddit (SARC) | With Context Summarization | 72.5 | 70.5 | 69.4 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dubey, P.; Dubey, P.; Bokoro, P.N. Unpacking Sarcasm: A Contextual and Transformer-Based Approach for Improved Detection. Computers 2025, 14, 95. https://doi.org/10.3390/computers14030095