Eron Ariodito Hermanto (ariodito@kth.se) Pierre Grégoire Malo Jousselin (pgmjo@kth.se)
This project is part of Lab 1 for the course ID2223 - HT2024. The goal is to build a serverless AI system that predicts air quality levels (PM2.5) on the island of Södermalm in Stockholm, Sweden, using historical and forecasted weather data. The system includes data pipelines, a machine learning model, and a dashboard for visualization. Four active sensors located throughout Södermalm are used for this project.
This lab implements:
- Feature Pipelines: Collect and preprocess air quality and weather data.
- Training Pipeline: Train a machine learning model to predict PM2.5 levels.
- Batch Inference Pipeline: Generate predictions and update a public dashboard.
- Dashboard: A web-based visualization of the predictions, served from a separate database through a React frontend.
To complete the lab, ensure you have:
- Accounts:
- Python Environment:
  - Use Conda or venv to set up a virtual environment, then install the dependencies:
    pip install -r requirements.txt
- Data Sources:
  - Air quality data: AQICN.
  - Weather data: Open-Meteo.
A GitHub Actions workflow runs on a schedule to collect the current data and run a new batch inference. It also uploads the prediction data to a database, which feeds the dashboard for visualization.
- Backfill Pipeline: Loads historical data (1+ year) and registers it in Hopsworks feature groups.
- Daily Pipeline: Fetches daily air quality and weather data and updates the feature groups in Hopsworks.
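As a rough illustration of the weather half of the daily pipeline, the sketch below only builds an Open-Meteo forecast request URL and does not send it. The coordinates are an approximation for Södermalm, and the function name `build_forecast_url` and the chosen daily variables are assumptions for this example, not necessarily what the project uses.

```python
from urllib.parse import urlencode

# Approximate coordinates for Södermalm (an assumption, not taken from the project).
SODERMALM_LAT = 59.31
SODERMALM_LON = 18.07

# Daily variables to request; this particular selection is an assumption.
DAILY_VARIABLES = [
    "temperature_2m_max",
    "precipitation_sum",
    "wind_speed_10m_max",
    "wind_direction_10m_dominant",
]


def build_forecast_url(latitude, longitude, forecast_days=7):
    """Return an Open-Meteo forecast URL for daily weather variables (hypothetical helper)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "daily": ",".join(DAILY_VARIABLES),
        "forecast_days": forecast_days,
        "timezone": "Europe/Stockholm",
    }
    # Keep commas and slashes unescaped so the URL stays readable.
    return "https://api.open-meteo.com/v1/forecast?" + urlencode(params, safe=",/")


if __name__ == "__main__":
    print(build_forecast_url(SODERMALM_LAT, SODERMALM_LON))
```

The resulting URL could then be fetched with any HTTP client and the JSON response turned into rows for the weather feature group.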
- Combines data from the air_quality and weather feature groups.
- Trains a regression model using XGBoost.
- Registers the trained model in Hopsworks for future use.
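To make the "combine, then train" step more concrete, here is a minimal sketch of the join in plain Python. It is not the project's Hopsworks code: the helper names `join_on_date` and `split_features_label`, and the column names `date` and `pm25`, are assumptions for illustration. The feature matrix and label vector it produces are the kind of input a regressor such as XGBoost would be fitted on.

```python
def join_on_date(air_quality_rows, weather_rows):
    """Inner-join two lists of dicts on their 'date' key (hypothetical helper)."""
    weather_by_date = {row["date"]: row for row in weather_rows}
    joined = []
    for aq in air_quality_rows:
        weather = weather_by_date.get(aq["date"])
        if weather is None:
            continue  # no weather record for that day, so the row is dropped
        joined.append({**weather, **aq})
    return joined


def split_features_label(rows, label="pm25"):
    """Split joined rows into a feature matrix X and a label vector y."""
    # Feature columns are ordered alphabetically so every row has the same layout.
    X = [
        [value for key, value in sorted(row.items()) if key not in (label, "date")]
        for row in rows
    ]
    y = [row[label] for row in rows]
    return X, y


if __name__ == "__main__":
    # Made-up values, for illustration only.
    air_quality = [
        {"date": "2024-11-01", "pm25": 12.0},
        {"date": "2024-11-02", "pm25": 9.0},
    ]
    weather = [{"date": "2024-11-01", "temperature_2m_max": 5.0}]
    X, y = split_features_label(join_on_date(air_quality, weather))
    print(X, y)
```

An inner join is used here so that days missing either measurement never reach training; whether the project handles gaps the same way is not stated.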
- Retrieves the trained model from Hopsworks.
- Predicts air quality levels for the next 7–10 days.
- Generates a visualization of predictions.
- Displays PM2.5 predictions and historical model performance (hindcast graphs).
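A hindcast compares predictions made earlier with the values that were later observed. The sketch below shows one way such a comparison could be summarized, using mean absolute error. The choice of metric and the helper name `hindcast_mae` are assumptions; the source does not say how the dashboard scores past performance.

```python
def hindcast_mae(predicted, observed):
    """Mean absolute error over the dates present in both mappings.

    Both arguments map a date string to a PM2.5 value (hypothetical layout).
    """
    common_dates = sorted(set(predicted) & set(observed))
    if not common_dates:
        raise ValueError("no overlapping dates between predictions and observations")
    total_error = sum(abs(predicted[d] - observed[d]) for d in common_dates)
    return total_error / len(common_dates)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    predicted = {"2024-11-01": 10.0, "2024-11-02": 14.0, "2024-11-03": 8.0}
    observed = {"2024-11-01": 12.0, "2024-11-02": 13.0}
    print(hindcast_mae(predicted, observed))
```

Dates that have a prediction but no observation yet, such as future forecast days, are simply left out of the score.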