Topic Modeling for Grouped Data

Overview

This project uses Latent Dirichlet Allocation (LDA) topic modeling to analyze and compare text across different segments of a dataset. It consists of three main scripts: one fits an LDA topic model, one calculates cosine similarities between the topic distributions of different subsets, and one applies author modeling to explore how topics are distributed across the full dataset. The project is particularly useful for understanding thematic structure in large text corpora and for discovering both unique and shared topics across data segments.

Project Structure

LDA Topic Modeling: Performs topic modeling on the entire dataset to identify and characterize major themes.
Cosine Similarity Analysis: Computes the similarity between topic distributions from two distinct subsets of the dataset, providing insights into their thematic overlap.
Segment-Based Author Modeling: Analyzes how different data segments contribute to the overall topic distributions, indicating how influential and how specific each segment is.
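
The cosine similarity step can be illustrated with a minimal sketch. It assumes each subset is summarized by averaging its per-document topic distributions, which is an assumption made for illustration and not necessarily how LDA-multi.py does it. The function names and the sample numbers below are hypothetical.

```python
import math


def mean_distribution(doc_topic_rows):
    """Average per-document topic distributions into one vector for a subset."""
    n_docs = len(doc_topic_rows)
    n_topics = len(doc_topic_rows[0])
    return [sum(row[k] for row in doc_topic_rows) / n_docs for k in range(n_topics)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: similarity with a zero vector is 0
    return dot / (norm_u * norm_v)


# Hypothetical topic distributions (3 topics, 2 documents per subset).
subset_a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
subset_b = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]

similarity = cosine_similarity(
    mean_distribution(subset_a), mean_distribution(subset_b)
)
print(round(similarity, 3))
```

Because topic probabilities are non-negative, the result falls between 0 and 1. Values near 1 suggest the two subsets emphasize the same topics, and values near 0 suggest little thematic overlap.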

Files

LDA.py - Script for performing LDA topic modeling on the entire dataset.
LDA-multi.py - Script for calculating cosine similarities between the topic distributions of two data subsets.
author-tm.py - Script for implementing author modeling to assess topic distributions across the dataset.
cleaned-categories.xlsx - Sample data demonstrating the expected input format.
requirements.txt - Python dependencies required by the scripts.

Repository: emilyhasson/Topic-Modeling
