Personality Prediction Using ML
This project aims to predict the personality type of individuals based on various psychological traits using machine learning techniques. The dataset includes several attributes related to personality traits, and the goal is to classify the personality type.
The dataset consists of the following columns:
- Gender: Gender of the individual (Female/Male)
- Age: Age of the individual
- openness: Openness to experience score
- neuroticism: Neuroticism score
- conscientiousness: Conscientiousness score
- agreeableness: Agreeableness score
- extraversion: Extraversion score
- Personality: Personality type (target variable)
The train dataset has 709 rows, and the test dataset also has 709 rows, making a total of 1418 rows after concatenation.
- Concatenation: Combined the train and test datasets using pd.concat([train, test], axis=0).
- Basic Information: Explored the dataset to find the number of null values, shape, and descriptive statistics using df.info(), df.shape, and df.describe().
- Value Counts: Analyzed the distribution of the Personality column using df['Personality'].value_counts().
- Data Visualization: Visualized the data using countplots, barplots, and histograms to study the relationships between columns.
- Gender Encoding: Converted the categorical Gender column to numerical using df["Gender"] = df["Gender"].map({"Female": 0, "Male": 1}).
- Correlation Analysis: Found the correlation of the Personality column with the other features, sorted by value, using df.corr()["Personality"].sort_values(), and plotted a heatmap to visualize the correlations.
- Train-Test Split: Split the data into training and testing sets.
- Data Scaling: Scaled the data using MinMaxScaler from sklearn.
- Linear Regression: Trained a model using Linear Regression.
- Gaussian Naive Bayes (GaussianNB): Trained a model using GaussianNB.
- Random Forest Classifier: Trained a model using Random Forest Classifier.
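The steps listed above can be sketched end to end as follows. This is a minimal illustration, not the project's actual code. The synthetic frames from make_frame stand in for the real train and test files, and all function names are hypothetical. The class labels, the integer encoding of Personality, the 80/20 split, ignore_index=True, and the random seeds are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler

TRAITS = ["openness", "neuroticism", "conscientiousness",
          "agreeableness", "extraversion"]


def make_frame(n_rows, seed):
    """Build a synthetic stand-in for one dataset file (hypothetical data)."""
    rng = np.random.default_rng(seed)
    data = {
        "Gender": rng.choice(["Female", "Male"], size=n_rows),
        "Age": rng.integers(17, 29, size=n_rows),
    }
    for trait in TRAITS:
        data[trait] = rng.integers(1, 9, size=n_rows)
    # Assumption: five personality classes with made-up labels.
    data["Personality"] = rng.choice(
        ["type_a", "type_b", "type_c", "type_d", "type_e"], size=n_rows)
    return pd.DataFrame(data)


def prepare(train, test):
    """Concatenate, encode, split, and scale, mirroring the listed steps."""
    df = pd.concat([train, test], axis=0, ignore_index=True)
    df["Gender"] = df["Gender"].map({"Female": 0, "Male": 1})
    # Assumption: the target is integer-encoded so that corr() and
    # LinearRegression can consume it.
    df["Personality"] = df["Personality"].astype("category").cat.codes

    X = df.drop(columns="Personality")
    y = df["Personality"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Fit the scaler on the training portion only, then reuse it on the
    # test portion, so test statistics do not leak into training.
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return df, X_train, X_test, y_train, y_test


def train_models(X_train, y_train):
    """Fit the three models named in the list."""
    models = {
        "LinearRegression": LinearRegression(),
        "GaussianNB": GaussianNB(),
        "RandomForest": RandomForestClassifier(random_state=42),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models


df, X_train, X_test, y_train, y_test = prepare(
    make_frame(709, seed=0), make_frame(709, seed=1))
print(df.shape)
print(df.corr()["Personality"].sort_values())
models = train_models(X_train, y_train)
```

Fitting MinMaxScaler after the split is a design choice of this sketch. The list above does not say whether the original code scaled before or after splitting.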
The models were evaluated using the accuracy score.
- Linear Regression: Achieved an accuracy score of 0.0395.
- Gaussian Naive Bayes: Achieved an accuracy score of 0.3725.
- Random Forest Classifier: Achieved an accuracy score of 0.3529.
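One way such accuracy scores could be computed is sketched below. It is an illustration under stated assumptions rather than the project's evaluation code. The evaluate helper is hypothetical, the data is random noise generated only for the example, and rounding the regression output to the nearest class code is an assumption about how a regressor's predictions might be turned into labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB


def evaluate(model, X_test, y_test, n_classes):
    """Return accuracy; continuous predictions are mapped to class codes first."""
    preds = model.predict(X_test)
    if not np.issubdtype(preds.dtype, np.integer):
        # Regression output: round to the nearest class code and clip into
        # the valid label range (assumption for this sketch).
        preds = np.clip(np.rint(preds), 0, n_classes - 1).astype(int)
    return accuracy_score(y_test, preds)


# Synthetic random data (hypothetical): features carry no signal here.
rng = np.random.default_rng(0)
n_classes = 5
X = rng.random((1418, 7))
y = rng.integers(0, n_classes, size=1418)
X_train, X_test = X[:1134], X[1134:]
y_train, y_test = y[:1134], y[1134:]

scores = {}
for name, model in [
    ("Linear Regression", LinearRegression()),
    ("Gaussian Naive Bayes", GaussianNB()),
    ("Random Forest Classifier", RandomForestClassifier(random_state=42)),
]:
    model.fit(X_train, y_train)
    scores[name] = evaluate(model, X_test, y_test, n_classes)
    print(f"{name}: {scores[name]:.4f}")
```

Calling .score() on a scikit-learn LinearRegression returns the coefficient of determination (R²), not accuracy, which is why the sketch routes every model through accuracy_score.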
The low accuracy scores may be attributed to the following reasons:
- Imbalanced Dataset: The dataset may have an imbalanced distribution of personality types, leading to biased model performance.
- Feature Relevance: The selected features may not be strong predictors of personality, resulting in poor model performance.
- Model Choice: Linear Regression predicts a continuous value rather than a class label, so it is generally not suitable for classification tasks. Although GaussianNB and Random Forest are more appropriate, they may still struggle with the given feature set and data characteristics.
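The first two reasons can be checked directly, as in the sketch below. It runs on synthetic data with a deliberately skewed target, so the printed numbers say nothing about the real dataset. The class proportions, the stratified split, and class_weight="balanced" are assumptions chosen for illustration, not steps taken in this project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["Gender", "Age", "openness", "neuroticism",
            "conscientiousness", "agreeableness", "extraversion"]

# Synthetic stand-in data (hypothetical) with an imbalanced target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1418, len(FEATURES))), columns=FEATURES)
df["Personality"] = rng.choice(5, size=1418, p=[0.40, 0.25, 0.15, 0.12, 0.08])

# Imbalance check: share of rows per personality class.
class_share = df["Personality"].value_counts(normalize=True)
print(class_share)

# A stratified split keeps those shares similar in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["Personality"],
    test_size=0.2, stratify=df["Personality"], random_state=42)

# Feature relevance: impurity-based importances from a random forest.
# class_weight="balanced" reweights classes inversely to their frequency.
forest = RandomForestClassifier(class_weight="balanced", random_state=42)
forest.fit(X_train, y_train)
importances = pd.Series(
    forest.feature_importances_, index=FEATURES).sort_values(ascending=False)
print(importances)
```

If the class shares turn out to be heavily skewed, plain accuracy can be misleading, and near-uniform importances would suggest that no single feature separates the classes well.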
To clone the repository, use the following command:
git clone https://github.com/yourusername/personality-prediction.git
cd personality-prediction