Census Data Analysis of USA

Exploratory Data Analysis (EDA) on US Census Data. Investigating income distribution, demographics, and correlations using Python, Pandas, and statistical tests.

Overview

This project is an exploratory analysis of US census data. Its main objective is to identify trends and patterns in the distribution of income and how they relate to demographic attributes. The dataset includes age, employment, education level, marital status, place of birth, hours worked, gender, race, state, and income.

Dataset

Census_Data.csv: Raw census data, with every attribute stored as a numeric code.
Attribute_Values.csv: Mapping from the numeric attribute codes to meaningful category labels.
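
The two files are meant to be combined: the codes in Census_Data.csv are replaced with the labels from Attribute_Values.csv. The sketch below shows one possible way to do that with Pandas. The layout of the mapping file is not documented here, so the column names attribute, code, and label are assumptions, as are the sample rows and the helper name decode_attributes.

```python
import pandas as pd


def decode_attributes(census, mapping):
    """Replace numeric codes with labels for every attribute listed in the mapping.

    Assumes `mapping` has the columns 'attribute', 'code' and 'label'
    (hypothetical names; adjust them to the real Attribute_Values.csv).
    """
    decoded = census.copy()
    for attribute, group in mapping.groupby("attribute"):
        if attribute in decoded.columns:
            lookup = dict(zip(group["code"], group["label"]))
            # Codes missing from the lookup become NaN.
            decoded[attribute] = decoded[attribute].map(lookup)
    return decoded


# Small in-memory stand-ins for the two CSV files (invented sample values).
census = pd.DataFrame({"gender": [1, 2, 1], "income": [30000, 52000, 41000]})
mapping = pd.DataFrame({
    "attribute": ["gender", "gender"],
    "code": [1, 2],
    "label": ["Male", "Female"],
})

decoded = decode_attributes(census, mapping)
print(decoded)
```

With the real files, the two frames would come from pd.read_csv instead of being built by hand. Columns that do not appear in the mapping, such as income here, are left untouched.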

Key Analysis Steps

The project includes the following analyses:

Income Distribution Analysis
- Histogram of income values
- Log-transformed income distribution
- Zipf plot and cumulative frequency analysis
Demographic Correlations with Work Category
- Chi-square tests of the relationships between gender, race, place of birth, and employment category.
- Heatmaps of the contingency tables.
Influence of Demographic Data on Income
- Comparisons of average income across gender, race, and place of birth.
- T-tests to identify significant differences.
Correlations of income with continuous variables
- Scatter plots for education, age and hours worked in relation to income.
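
The income distribution steps listed above (log transform, Zipf plot, cumulative frequency) can be sketched with NumPy as follows. This is a minimal illustration, not the notebook's code: the function names and the sample values are hypothetical, and dropping non-positive incomes before the log transform is an assumption.

```python
import numpy as np


def log_income(income):
    """Return log10 of the strictly positive incomes (zeros are dropped)."""
    income = np.asarray(income, dtype=float)
    return np.log10(income[income > 0])


def zipf_points(income):
    """Return (ranks, values) with values sorted in descending order."""
    values = np.sort(np.asarray(income, dtype=float))[::-1]
    ranks = np.arange(1, len(values) + 1)
    return ranks, values


def cumulative_frequency(income):
    """Return sorted values and the fraction of records at or below each."""
    values = np.sort(np.asarray(income, dtype=float))
    fraction = np.arange(1, len(values) + 1) / len(values)
    return values, fraction


sample = [1000, 100, 0, 10000, 100]  # invented incomes for illustration

logged = log_income(sample)
ranks, ranked_values = zipf_points(sample)
sorted_values, fraction = cumulative_frequency(sample)
print(logged, ranks, ranked_values, fraction)
```

Plotting ranks against ranked_values on log-log axes (for example with Matplotlib's plt.loglog) gives the Zipf plot, and plotting sorted_values against fraction gives the cumulative frequency curve.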

Additional statistical tests based on new hypotheses formed from exploratory analysis.
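
The chi-square and t-test steps listed above can be sketched as shown below. SciPy is an assumption here, since it is not named under Technologies Used, and so are the column names, the sample values, and the choice of Welch's t-test (equal_var=False) rather than the equal-variance form.

```python
import pandas as pd
from scipy import stats

# Invented sample data; the real analysis would use the decoded census frame.
df = pd.DataFrame({
    "gender": ["Male", "Female"] * 6,
    "work_category": ["Private", "Private", "Government",
                      "Private", "Self-employed", "Government"] * 2,
    "income": [52000, 41000, 47000, 39000, 61000, 45000,
               55000, 38000, 49000, 42000, 58000, 44000],
})

# Chi-square test of independence between gender and work category.
contingency = pd.crosstab(df["gender"], df["work_category"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(contingency)

# Average income per group, then a two-sample t-test on the two groups.
mean_by_gender = df.groupby("gender")["income"].mean()
male = df.loc[df["gender"] == "Male", "income"]
female = df.loc[df["gender"] == "Female", "income"]
t_stat, p_t = stats.ttest_ind(male, female, equal_var=False)

print(contingency)
print(f"chi2={chi2:.3f}, dof={dof}, p={p_chi2:.3f}")
print(f"t={t_stat:.3f}, p={p_t:.3f}")
```

The contingency table can be passed directly to Seaborn, for example sns.heatmap(contingency, annot=True), to produce the heatmaps mentioned above. With a sample this small the p-values are only illustrative.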

Technologies Used

Python
Pandas for data processing
Seaborn & Matplotlib for data visualization

How to Run the Notebook

To execute the analysis, follow these steps:

Clone the repository:
- git clone https://github.com/PolyzosFotios/us-census-data-analysis.git
- cd us-census-data-analysis
Install the required dependencies:
- pip install -r requirements.txt
Start Jupyter Notebook:
- jupyter notebook
Open and run notebook.ipynb.

License

This project is available under the MIT License.
