akhil-gun/DSI_M3_NLP

Team members: Nmeso, Mekondjo, Lali, Akhil

Goal

In module 3 of the DSI, the purpose is to explore and apply Natural Language Processing (NLP). After each team member had proposed at least one idea, we decided to explore and implement a sarcasm detector.

Background

The Cambridge English dictionary defines sarcasm as "the use of remarks that clearly mean the opposite of what they say, made in order to hurt someone's feelings or to criticize something in a humorous way" [1]. The Merriam-Webster dictionary defines it as "a sharp and often satirical or ironic utterance designed to cut or give pain" [2]. Not everybody would agree with these definitions, but sarcasm usually involves positive words used to convey a negative message. How it is used and understood differs from person to person and depends heavily on culture, gender and many other factors.

Motivation

Especially for beginner learners of any language, identifying sarcasm can remain challenging. Things can be lost in translation, and people can feel hurt unintentionally. A sarcasm detector would help people recognize when something is sarcastic and not take it the wrong way. As a baseline, we decided to focus on detecting sarcasm in news headlines, as these are a widely consumed form of media. Furthermore, a detector might be especially applicable on social media such as Twitter and Facebook. In the future, it could be useful for discriminating between harmful content and witty sentences.

Datasets

The data used for the project was taken from Kaggle. There were two JSON files, each containing the 'is_sarcastic' and headlines columns. The two files were joined to create a larger dataset for the analysis. The complete dataset has 50,000 training examples, and the classes are not significantly imbalanced. The data can be found here.
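The joining step described above could look roughly like the sketch below. It is not the project's actual code: it assumes each file stores one JSON object per line, that the label field is named `is_sarcastic`, and the helper names (`load_json_lines`, `combine_datasets`, `class_balance`) are hypothetical. It uses only the Python standard library.

```python
import json
from collections import Counter
from pathlib import Path


def load_json_lines(path):
    """Read a file holding one JSON object per line into a list of dicts.

    Assumption: the files use the JSON-lines layout; adjust if they
    contain a single JSON array instead.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def combine_datasets(paths):
    """Concatenate the records of every file in `paths`, in order."""
    combined = []
    for path in paths:
        combined.extend(load_json_lines(path))
    return combined


def class_balance(records, label_key="is_sarcastic"):
    """Return the share of each label value among the records."""
    counts = Counter(record[label_key] for record in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

Calling `combine_datasets` with the paths of the two downloaded files yields a single list of records, and `class_balance` on that list gives a quick check of whether the sarcastic and non-sarcastic classes are roughly even.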

Methodology

Project flow chart

Future work

  • More datasets
  • Expand to non-headlines
  • Hyperparameter fine tuning
  • Ensemble model
  • Expand to audio
  • Multi-channel NLP
