ParlaMind is a project that analyzes speeches from the German Bundestag from 1949 to 2025. It applies Natural Language Processing (NLP) techniques to extract insights from political discourse, including sentiment analysis, topic modeling, and BERT-based party classification.
- Sentiment Analysis: Determines the sentiment (positive, neutral, or negative) of speeches.
- Topic Modeling: Identifies key themes and trends over time.
- Party Classification: Uses BERT to classify and attribute speeches to specific parties.
- Historical Insights: Tracks linguistic and ideological changes across decades.
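To illustrate what tracking linguistic change across decades can look like, here is a minimal sketch that counts content words per decade. It is a simple term-frequency stand-in, not ParlaMind's topic-modeling pipeline; the toy records, the stopword set, and the helper names `decade` and `term_counts_by_decade` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy records (year, speech text); real input would come from
# the project's Parquet output, whose schema is not assumed here.
speeches = [
    (1953, "Der Wiederaufbau und die Wirtschaft"),
    (1957, "Wirtschaft und Wiederaufbau des Landes"),
    (2015, "Klimaschutz und Digitalisierung"),
    (2019, "Klimaschutz ist die Aufgabe unserer Zeit"),
]

# Tiny illustrative German stopword set (an assumption, not exhaustive).
STOPWORDS = {"der", "die", "das", "und", "des", "ist"}


def decade(year: int) -> int:
    """Map a year to the first year of its decade, e.g. 1957 -> 1950."""
    return year - year % 10


def term_counts_by_decade(records):
    """Count lowercase word tokens per decade, skipping stopwords."""
    counts = defaultdict(Counter)
    for year, text in records:
        tokens = re.findall(r"\w+", text.lower())
        counts[decade(year)].update(t for t in tokens if t not in STOPWORDS)
    return counts


counts = term_counts_by_decade(speeches)
for dec in sorted(counts):
    # Most frequent terms per decade hint at shifting themes over time.
    print(dec, counts[dec].most_common(2))
```

Comparing the top terms per decade (here reconstruction and economy in the 1950s versus climate protection in the 2010s) is the crudest form of this analysis; a topic model groups co-occurring terms instead of counting them one by one.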
- Python (Core language)
- PyTorch & Hugging Face Transformers (For BERT-based NLP tasks)
- Polars & Pandas (For data processing)
- Poetry (For dependency management)
To install the dependencies, run:
poetry install
To analyze speeches, run:
poetry run python main.py
Before running, place speeches.csv and factions.csv in /data/raw/OpenDiscourse/. Both files are available from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FIKIBO. The pipeline then downloads the XML files and converts them into a Parquet file (loaded as a Polars DataFrame).

The XML download requires a Bundestag API key. Create a .secrets.toml file containing api_key = "your_api_key". You can get the current API key from https://dip.bundestag.de/%C3%BCber-dip/hilfe/api. Once the pipeline has finished, ParlaMind.parquet is located in /data/formated/parquet/.
The dataset consists of Bundestag speeches from 1949–2025, preprocessed and stored in parquet format.