Hate Speech Detection

This project was the final project for a Natural Language Processing course. Its goal is to show that models trained on a combined dataset generalize better than models trained on a single dataset.

Special thanks to my teammates Ethan Oh and Fernando Vera Buschmann for their valuable contributions to this project.

We selected three datasets (AHSD, MHS, and HATEX) and implemented two models for our experiments: BERT and BERTweet.

We use the pretrained BERT and BERTweet models from the Hugging Face Transformers library and fine-tune each of them on the AHSD, MHS, and HATEX datasets individually. In separate experiments, we fine-tune both models on a combined dataset built from all three. Every experiment is run for both binary and multiclass classification. For each architecture, binary classification therefore requires four fine-tuned models (three for the individual datasets and one for the combined dataset), and multiclass classification requires four more. Our goal is to show that a model fine-tuned on the combined dataset achieves better evaluation scores on each individual dataset than the model fine-tuned on that dataset alone. We used a single GPU, and fine-tuning these eight models took around 120 hours. All experiments use a batch size of 32 and run for 50 epochs.
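The combined dataset can be illustrated with a minimal, standard-library sketch. The sketch makes two hypothetical assumptions. First, every CSV has columns named text and label. Second, the labels already follow a shared scheme across datasets. The repository's own merge3into1.py may work differently. Each row is tagged with its source dataset, which keeps per-dataset evaluation possible after merging.

```python
import csv
from pathlib import Path

# Hypothetical column names: the actual CSV headers in this repository may differ.
TEXT_COL = "text"
LABEL_COL = "label"


def merge_datasets(paths, out_path):
    """Concatenate several CSV datasets into one file, tagging each row with its source.

    Assumes every input has TEXT_COL and LABEL_COL columns and that labels
    already use a common scheme across datasets. Returns the number of rows written.
    """
    rows = []
    for path in paths:
        # e.g. "AHSD" from "AHSD_binaryclass.csv"
        source = Path(path).stem.split("_")[0]
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                rows.append({TEXT_COL: row[TEXT_COL], LABEL_COL: row[LABEL_COL], "source": source})

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[TEXT_COL, LABEL_COL, "source"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, merge_datasets(["AHSD_binaryclass.csv", "MHS_binaryclass.csv", "HATEX_binaryclass.csv"], "combined_binary.csv") would write one combined file. The output file name here is hypothetical.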

In each experiment, we saved the best model found during the 50 training epochs and used it for evaluation on the test set. We evaluate all fine-tuned BERT and BERTweet models using precision, recall, F1 score, and accuracy.
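The four metrics can be computed from gold labels and predictions as in the pure-Python sketch below. For binary classification, it scores a chosen positive class. For multiclass classification, it uses macro averaging, which is the unweighted mean of per-class scores. Macro averaging is an assumption here: the authors' evaluation scripts may aggregate multiclass results differently, for example with weighted averaging.

```python
def classification_scores(y_true, y_pred, positive=1, average="binary"):
    """Return accuracy, precision, recall and F1 for a list of predictions.

    average="binary" scores only the `positive` class; average="macro" averages
    per-class precision, recall and F1 with equal weight (an assumed choice).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    def prf(cls):
        # Count true positives, false positives and false negatives for one class.
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    if average == "binary":
        precision, recall, f1 = prf(positive)
    elif average == "macro":
        classes = sorted(set(y_true) | set(y_pred))
        per_class = [prf(c) for c in classes]
        precision, recall, f1 = (sum(s[i] for s in per_class) / len(per_class) for i in range(3))
    else:
        raise ValueError(f"unknown average: {average}")
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Note that the macro F1 above averages the per-class F1 scores. It does not compute F1 from the averaged precision and recall. This matches the common definition used by libraries such as scikit-learn.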

How to Run

1. Run the BERT and BERTweet training scripts for both binary and multiclass classification on each of the four datasets: AHSD, MHS, HATEX, and Merged. Each script saves its best model.
2. Run the four evaluation scripts for BERT and BERTweet (binary and multiclass) to evaluate the saved models.
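The two steps can be scripted with a small driver like the sketch below. It builds the script names from the naming pattern in the repository's file list. It assumes, without confirmation, that each script runs from the repository root with no command-line arguments. By default it only returns the commands (dry run), because the full set of training runs takes many GPU hours.

```python
import subprocess
import sys
from itertools import product

MODELS = ["bert", "bertweet"]
DATASETS = ["AHSD", "MHS", "HATEX", "Merged"]
TASKS = ["binary", "multiclass"]


def training_scripts():
    """Training script names, following the pattern <model>_<dataset>_<task>.py."""
    return [f"{m}_{d}_{t}.py" for m, d, t in product(MODELS, DATASETS, TASKS)]


def evaluation_scripts():
    """One evaluation script per model and task, e.g. evaluation_bert_multiclass.py."""
    return [f"evaluation_{m}{'' if t == 'binary' else '_multiclass'}.py" for m, t in product(MODELS, TASKS)]


def run_all(dry_run=True):
    """Run every training script, then every evaluation script.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [[sys.executable, s] for s in training_scripts() + evaluation_scripts()]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing script
    return commands
```

Running the scripts sequentially matches the single-GPU setup described above. The training scripts run before any evaluation script, since evaluation needs the saved best models.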

More details about this work will be posted here once it is publicly available.

Files

AHSD_binaryclass.csv
AHSD_multiclass.csv
HATEX_binaryclass.csv
HATEX_multiclass.csv
MHS_binaryclass.csv
MHS_multiclass.csv
Merged_binaryclass.csv
Merged_multiclass.csv
README.md
bert_AHSD_binary.py
bert_AHSD_multiclass.py
bert_HATEX_binary.py
bert_HATEX_multiclass.py
bert_MHS_binary.py
bert_MHS_multiclass.py
bert_Merged_binary.py
bert_Merged_multiclass.py
bertweet_AHSD_binary.py
bertweet_AHSD_multiclass.py
bertweet_HATEX_binary.py
bertweet_HATEX_multiclass.py
bertweet_MHS_binary.py
bertweet_MHS_multiclass.py
bertweet_Merged_binary.py
bertweet_Merged_multiclass.py
evaluation_bert.py
evaluation_bert_multiclass.py
evaluation_bertweet.py
evaluation_bertweet_multiclass.py
merge3into1.py

Repository: alshahriarrubel/HateSpeechDetect