O'Reilly book - Building Machine Learning Systems with a feature store: batch, real-time, and LLMs

felixfritzen/mlfs-book

mlfs-book

O'Reilly book - Building Machine Learning Systems with a feature store: batch, real-time, and LLMs

ML System Examples

Dashboards for Example ML Systems

Course Comparison

| Course | MLOps | LLMs | Feature/Training/Inference | Working AI Systems | Focus |
|--------|-------|------|----------------------------|--------------------|-------|
| Building AI Systems (O'Reilly) | Yes | Fine-Tuning & RAG | Yes | High | Project-based, Software Engineering, Fundamentals |
| Made With ML | No | Yes | No | No | Software Engineering, Model Training |
| 7 Steps MLOps | Yes | Separate Course | Yes | Low | Learning Tools and Project |

Predict Air Quality

This project builds an air quality forecasting service for the air quality sensor available at https://api.waqi.info/feed/A58912. We gather historical air quality (PM2.5) readings over an extended time period, along with weather variables for the same location and the same time span. Both historical datasets are loaded into feature groups in Hopsworks, a feature store platform.

The pipeline then runs daily to collect the latest PM2.5 and weather values for that location. Each new data point is added to the two feature groups, and the data from previous days is retained.
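The daily update can be pictured as an upsert keyed by date. The following is a minimal sketch that uses a plain dictionary as a stand-in for a feature group; it does not use the Hopsworks API, and the function name, field names, dates, and values are hypothetical.

```python
def upsert_daily(feature_rows, new_row):
    """Return a copy of the store with new_row added under its date.

    feature_rows: dict mapping an ISO date string to a row dict
                  (an in-memory stand-in for a feature group).
    new_row:      dict that must contain a "date" key.
    Existing days are kept; a row with the same date is overwritten.
    """
    updated = dict(feature_rows)
    updated[new_row["date"]] = new_row
    return updated


# Illustrative values only, not real sensor readings.
store = {
    "2024-11-01": {"date": "2024-11-01", "pm25": 10.0},
    "2024-11-02": {"date": "2024-11-02", "pm25": 20.0},
}
store = upsert_daily(store, {"date": "2024-11-03", "pm25": 30.0})
# Re-running the pipeline for the same day replaces that day's row
# instead of duplicating it.
store = upsert_daily(store, {"date": "2024-11-03", "pm25": 31.0})
```

Keying rows by date makes the daily run idempotent, so a retried job does not create duplicate rows for the same day.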

We train a gradient-boosted decision tree model, specifically XGBoost's XGBRegressor, to predict future PM2.5 values. The PM2.5 value is the target, and we perform supervised learning using the historical PM2.5 values as labels. The features we use to predict PM2.5 are several weather variables (wind speed, wind direction, rain, and temperature). We extend the model to also include the mean PM2.5 value of the previous 3 days. The feature importance analysis shows that this mean holds the most predictive power. Adding the mean as a new feature lowers the mean squared error (MSE) from 5.337 to 5.311.
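As a rough illustration of the 3-day mean feature, the sketch below computes, for each day, the mean PM2.5 of the three preceding days. It is not the repository's implementation: the function name, the `pm25_prev3_mean` column name, and the sample values are hypothetical, and it assumes one reading per day, sorted oldest first, with no gaps.

```python
from statistics import mean


def add_prev3_mean(pm25_by_day):
    """Attach the mean PM2.5 of the three preceding days to each day.

    pm25_by_day: list of (date_str, pm25) tuples, oldest first.
    The current day's value is excluded from its own window so the
    feature never leaks the label it is meant to help predict.
    Days with fewer than three predecessors get None.
    """
    rows = []
    for i, (day, pm25) in enumerate(pm25_by_day):
        if i < 3:
            prev_mean = None
        else:
            prev_mean = mean(value for _, value in pm25_by_day[i - 3:i])
        rows.append({"date": day, "pm25": pm25, "pm25_prev3_mean": prev_mean})
    return rows


# Illustrative values only, not real sensor readings.
history = [
    ("2024-11-01", 10.0),
    ("2024-11-02", 20.0),
    ("2024-11-03", 30.0),
    ("2024-11-04", 40.0),
    ("2024-11-05", 50.0),
]
features = add_prev3_mean(history)
```

Note that when forecasting several days ahead, the previous days' actual PM2.5 values are not yet known, so this feature would have to be built from earlier predictions or otherwise approximated; how the project handles that is not stated here.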

We set our pipeline to run once a day: it collects the daily data and then runs our trained model on this new data. Note that we don't retrain our model each day. We plot the predicted PM2.5 values for the coming 9 days in the plot below.

Air quality Prediction

We also compare our predicted PM2.5 values for previous days with the actual measured values in order to evaluate our model. This is shown below.
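One way to put a number on this comparison is to compute the MSE over the days for which both a prediction and a measurement exist. The sketch below assumes predictions and observations are available as dictionaries keyed by date; the function names and sample values are hypothetical and not taken from the repository.

```python
def mean_squared_error(actual, predicted):
    """Mean of squared differences between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def hindcast_mse(predictions, observations):
    """MSE over the dates present in both dicts (date string -> PM2.5)."""
    shared_days = sorted(set(predictions) & set(observations))
    return mean_squared_error(
        [observations[day] for day in shared_days],
        [predictions[day] for day in shared_days],
    )


# Illustrative values only, not real sensor readings.
predicted = {"2024-11-01": 12.0, "2024-11-02": 15.0, "2024-11-03": 9.0}
observed = {"2024-11-01": 10.0, "2024-11-02": 15.0, "2024-11-04": 8.0}
score = hindcast_mse(predicted, observed)  # uses 2024-11-01 and 2024-11-02 only
```

Restricting the comparison to shared dates avoids penalizing the model for days on which the sensor reported no value.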

Personalized Air Quality with LLMs Architecture
