Overview
This repository focuses on data wrangling techniques applied to a Diabetes Prediction Machine Learning project. It is organized into two main parts:
- CSV Data Processing: Comprehensive handling of data in CSV format, including exploratory data analysis (EDA), data cleaning, transformation, and analysis.
- JSON Data Handling: A Python script for data extraction, cleaning, and transformation from JSON format, utilizing Object-Oriented Programming (OOP) principles.
The CSV Data Processing part handles the CSV data used for diabetes prediction, covering:
a. Exploratory Data Analysis (EDA):
- Analyzed dataset characteristics.
- Generated summary statistics and visualizations.
- Identified patterns, trends, and outliers.
b. Data Cleaning:
- Managed missing values and outliers.
- Corrected data inconsistencies and removed duplicates.
- Filtered out irrelevant features.
c. Data Transformation:
- Applied normalization and scaling techniques.
- Encoded categorical variables.
- Engineered new features for improved model performance.
d. Data Analysis:
- Conducted statistical analyses and hypothesis testing.
- Evaluated the influence of features on diabetes prediction.
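As an illustration of the cleaning and transformation steps listed above, here is a minimal sketch that removes duplicates, imputes missing values with the median, applies min-max scaling, and encodes a categorical column. It uses only the Python standard library so that it runs without extra dependencies. The sample data, the column names (`glucose`, `bmi`, `gender`, `outcome`), and all function names are assumptions made for this example and are not taken from the repository.

```python
import csv
import io
import statistics

# Hypothetical sample data; the real dataset's columns may differ.
SAMPLE_CSV = """glucose,bmi,gender,outcome
148,33.6,F,1
85,26.6,M,0
,23.3,F,1
85,26.6,M,0
183,,F,1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, column):
    """Convert a column to float and fill blank cells with the median."""
    present = [float(r[column]) for r in rows if r[column] != ""]
    median = statistics.median(present)
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else median
    return rows


def min_max_scale(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    for r in rows:
        r[column] = (r[column] - lo) / span
    return rows


def encode_category(rows, column, mapping):
    """Replace category labels with integer codes from a given mapping."""
    for r in rows:
        r[column] = mapping[r[column]]
    return rows


def clean_and_transform(text):
    """Run the full sketch: dedupe, impute, scale, encode."""
    rows = drop_duplicates(load_rows(text))
    for column in ("glucose", "bmi"):
        rows = min_max_scale(impute_median(rows, column), column)
    # The F/M mapping is an assumption made for this example.
    return encode_category(rows, "gender", {"F": 0, "M": 1})


if __name__ == "__main__":
    for row in clean_and_transform(SAMPLE_CSV):
        print(row)
```

Median imputation is used here because it is less sensitive to outliers than the mean. Which imputation or scaling method suits the actual dataset is a judgment call that depends on the EDA findings.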
The JSON Data Handling part provides a Python script for working with JSON data, featuring:
a. Data Extraction:
- Reads and parses JSON data files.
b. Data Cleaning:
- Cleans JSON data using custom methods.
c. Data Transformation:
- Converts data into a format suitable for further analysis or model training.
d. OOP Concepts:
- Uses classes and objects to manage data processing tasks.
e. Logging and Exception Handling:
- Implements logging for execution tracking and debugging.
- Includes exception handling to address potential errors.
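The JSON workflow described above could be organized roughly as follows. This is a hedged sketch, not the repository's actual script: the class name `JsonRecordProcessor`, its methods, the logger name, and the sample field names (`age`, `glucose`) are all hypothetical. It shows one class handling extraction, cleaning, and transformation, with logging and exception handling around the parsing step.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("json_wrangler")  # hypothetical logger name


class JsonRecordProcessor:
    """Hypothetical class: extracts, cleans, and tabulates JSON records."""

    def __init__(self, required_fields):
        self.required_fields = list(required_fields)

    def extract(self, text):
        """Parse JSON text; return a list of records, or [] on failure."""
        try:
            data = json.loads(text)
        except json.JSONDecodeError as exc:
            logger.error("Invalid JSON: %s", exc)
            return []
        if not isinstance(data, list):
            logger.warning("Expected a JSON array of records")
            return []
        logger.info("Extracted %d records", len(data))
        return data

    def clean(self, records):
        """Drop records that are not objects or lack a required field."""
        cleaned = []
        for rec in records:
            if not isinstance(rec, dict):
                logger.warning("Dropping non-object record: %r", rec)
                continue
            missing = [f for f in self.required_fields if rec.get(f) is None]
            if missing:
                logger.warning("Dropping record, missing %s", missing)
                continue
            cleaned.append(rec)
        return cleaned

    def transform(self, records):
        """Convert records into rows ordered by the required fields."""
        return [[rec[f] for f in self.required_fields] for rec in records]

    def run(self, text):
        """Run extract, clean, and transform in sequence."""
        return self.transform(self.clean(self.extract(text)))


# Hypothetical input: one complete record, one null value, one missing key.
raw = '[{"age": 50, "glucose": 148}, {"age": 31, "glucose": null}, {"age": 29}]'
processor = JsonRecordProcessor(["age", "glucose"])
table = processor.run(raw)
print(table)
```

Returning an empty list on a parse error keeps the pipeline running while the log records what went wrong. A script that should stop on bad input could re-raise the exception instead.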
Thank you!