BioInsight is a data analytics and machine learning platform for analyzing and visualizing bioprocess data and predicting process outcomes. The project works with bioprocess data collected at the 1 mL and 30 L scales to predict key performance indicators such as final optical density (Final OD) and GFPuv production.
- Data Processing & Visualization: Interface for exploring process data, including interactive time series plots
- Feature Selection: Feature importance rankings, correlation matrices, and distribution plots
- Model Results: Comparison of multiple machine learning models (Random Forest, XGBoost, SVR, PLS) with performance metrics
- Interactive Predictions: Make real-time predictions by adjusting feature values
- PLS Component Analysis: Visualize PLS components and explained variance
- Python: Core programming language
- Streamlit: Interactive dashboard framework
- Scikit-learn: Machine learning model development
- XGBoost: Gradient boosting implementation
- Plotly: Interactive data visualization
- Pandas/NumPy: Data manipulation and numerical operations
BioInsight/
├── Wrangled_Combined_Batch_Dataset.xlsx # Main dataset
├── models/ # Trained model files
├── src/
│ ├── dashboard/
│ │ └── app.py # Streamlit dashboard application
│ └── model_development.py # Model training pipeline
└── README.md # Project documentation
- Python 3.8+
- pip package manager
- Clone this repository
git clone https://github.com/adityachitlangia/BioInsight.git
cd BioInsight
- Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install required packages
pip install -r requirements.txt
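After installing, a quick sanity check can confirm the interpreter version and that the packages from the technology stack are importable. This is a hypothetical helper, not taken from the repository; the module import names are assumptions based on the stack list (for example, scikit-learn imports as `sklearn`).

```python
import importlib.util
import sys

# Assumed import names for the stack listed above.
REQUIRED_MODULES = ["streamlit", "sklearn", "xgboost", "plotly", "pandas", "numpy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(min_version=(3, 8), names=REQUIRED_MODULES):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if sys.version_info < min_version:
        found = sys.version.split()[0]
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required, found {found}")
    problems.extend(f"missing module: {name}" for name in missing_modules(names))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```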
To launch the BioInsight dashboard:
cd src/dashboard
streamlit run app.py
By default, Streamlit serves the dashboard at http://localhost:8501; open that address in your web browser.
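Because the dashboard is launched from `src/dashboard`, a relative path such as `Wrangled_Combined_Batch_Dataset.xlsx` would resolve against that directory rather than the repository root. One way to make the lookup independent of the working directory is to anchor it to the script's own location. The sketch below assumes the layout shown in the project structure (the dataset two levels above `app.py`); `find_dataset` is a hypothetical helper, not the app's actual code.

```python
from pathlib import Path

DATASET_NAME = "Wrangled_Combined_Batch_Dataset.xlsx"


def find_dataset(script_path, name=DATASET_NAME):
    """Walk up from the script's directory and return the first file called `name`.

    Assumes the dataset sits in the script's directory or one of its parents,
    e.g. the repository root for src/dashboard/app.py.
    """
    start = Path(script_path).resolve().parent
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{name} not found in {start} or any parent directory")


# Inside app.py this would typically be called as: find_dataset(__file__)
```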
To train or retrain the machine learning models:
python src/model_development.py
- View and analyze raw process data
- Visualize missing values
- Explore time series data by batch
- Examine feature importance rankings
- Analyze feature correlations
- Explore feature distributions with statistical insights
- Compare model performance (RMSE, R² score)
- Visualize actual vs predicted values
- Analyze PLS components and explained variance
- Make interactive predictions
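The data exploration steps above (missing values, per-batch time series, feature correlations) can be sketched with pandas. This is a minimal illustration under assumed column names (`Batch`, `Time_h`, `Temperature`, `Feed`, `OD` are hypothetical, as is the toy data); the real dataset's columns and the dashboard's implementation may differ.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    return (pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})
            .sort_values("missing", ascending=False))


def batch_series(df, batch_col, time_col, value_col):
    """Split one variable into per-batch time series: {batch: Series indexed by time}."""
    return {batch: grp.sort_values(time_col).set_index(time_col)[value_col]
            for batch, grp in df.groupby(batch_col)}


def target_correlations(df, target, method="pearson"):
    """Correlation of each numeric column with the target, ranked by magnitude."""
    corr = df.select_dtypes("number").corr(method=method)[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy data with two batches and one missing temperature reading.
df = pd.DataFrame({
    "Batch": ["B1", "B1", "B1", "B2", "B2", "B2"],
    "Time_h": [0, 2, 4, 0, 2, 4],
    "Temperature": [30.0, 30.5, np.nan, 37.0, 36.8, 37.1],
    "Feed": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
    "OD": [0.1, 0.5, 1.2, 0.2, 0.7, 1.6],
})
print(missing_summary(df))
print(batch_series(df, "Batch", "Time_h", "OD")["B2"])
print(target_correlations(df, "OD"))
```

Pearson correlation is the pandas default; passing `method="spearman"` ranks features by monotonic rather than linear association.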
- Enhanced visualization of PLS components with improved error handling
- Enhanced UI/UX:
  - Improved dashboard styling with custom CSS
  - Added card-like structures for better content organization
  - Enhanced visual hierarchy and navigation
  - Responsive design improvements
- Improved Error Handling:
  - Better handling of data file paths
  - Robust PLS component visualization
  - Enhanced time series analysis functionality
  - Improved batch selection and feature filtering
- Aditya Chitlangia