This repository contains code for analyzing TripAdvisor reviews of hotels in Sri Lanka. The goal is to perform sentiment analysis and clustering to extract insights from the reviews. The analysis covers various stages, including data cleaning, sentiment classification, feature extraction, and text clustering.
This work was completed as coursework for the module CM 4603 - Language Processing and Information Retrieval.
- Data Collection: Reviews were extracted from TripAdvisor for 205 hotels in Sri Lanka.
- Data Cleaning: The dataset was preprocessed for sentiment analysis and clustering tasks.
- Establishing Ground Truth:
- Feature Extraction:
- Text Classification (Sentiment Analysis): Reviews were classified based on sentiment.
- Clustering & Topic Modeling: Reviews were clustered based on hotel aspects.
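
As a rough illustration of the cleaning and feature extraction stages listed above, the sketch below tokenizes a few made-up reviews and computes TF-IDF weights using only the Python standard library. It is not the code used in this repository: the stop-word list, the sample reviews, and the helper names `clean_review` and `tfidf` are all assumptions made for the example, and the actual pipeline may use different preprocessing or a library implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "were", "in", "of", "to", "it"}


def clean_review(text):
    """Lowercase the text, replace non-letters with spaces, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty reviews
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up sample reviews (not taken from the dataset).
reviews = [
    "The room was clean and the staff were friendly!",
    "Dirty room, rude staff.",
    "Great beach view and friendly staff.",
]
tokens = [clean_review(r) for r in reviews]
vectors = tfidf(tokens)
```

Terms that occur in every review (here, `staff`) receive a weight of zero, while terms specific to one review (such as `dirty`) receive positive weights. Vectors of this kind could then be passed to a sentiment classifier or a clustering algorithm.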