sakibb019/Birthday-Analysis


Birthday-Analysis

Let’s take a look at the freely available data on births in the United States, provided by the Centers for Disease Control (CDC). This data can be found at births.csv

import pandas as pd

births = pd.read_csv("births.csv")
print(births.head())

# Missing days are filled with 0 so the column can be stored as integers
births['day'] = births['day'].fillna(0).astype(int)

births['decade'] = 10 * (births['year'] // 10)
# Print the pivot table so the totals per decade and gender are actually displayed
print(births.pivot_table('births', index='decade', columns='gender', aggfunc='sum'))
print(births.head())
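
As a rough illustration of the decade bucketing and pivot used above, here is a minimal sketch on a tiny made-up table. The values and the names toy and by_decade are hypothetical and are not taken from births.csv.

```python
import pandas as pd

# Hypothetical toy data, not taken from births.csv
toy = pd.DataFrame({
    "year":   [1969, 1969, 1971, 1971, 1985, 1985],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "births": [10, 12, 20, 21, 30, 33],
})

# Floor division drops the last digit of the year: 1969 -> 196 -> 1960
toy["decade"] = 10 * (toy["year"] // 10)

# One row per decade, one column per gender, summing the births in each cell
by_decade = toy.pivot_table("births", index="decade", columns="gender", aggfunc="sum")
print(by_decade)
```

Each row of by_decade is a decade and each column a gender, which is the same shape as the table built from the real data.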

We immediately see that male births outnumber female births in every decade. To see this trend a bit more clearly, we can use the built-in plotting tools in Pandas to visualize the total number of births by decade:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set()  # use seaborn's default plot styling
birth_decade = births.pivot_table('births', index='decade', columns='gender', aggfunc='sum')
birth_decade.plot()
plt.ylabel("Total births per decade")
plt.show()

Further data exploration:

There are a few interesting features we can pull out of this dataset using the Pandas tools. We must start by cleaning the data a bit, removing outliers caused by mistyped dates or missing values. One easy way to remove these all at once is to cut outliers; we’ll do this via a robust sigma-clipping operation:

import numpy as np

quartiles = np.percentile(births['births'], [25, 50, 75])
mean = quartiles[1]  # the median, used as a robust estimate of the center
sigma = 0.74 * (quartiles[2] - quartiles[0])  # robust estimate of the spread

This final line is a robust estimate of the sample standard deviation, where the 0.74 comes from the interquartile range of a Gaussian distribution. With this we can use the query() method to filter out rows with births more than five sigma from the median, then build a datetime index and add a day-of-week column:
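
To see where the 0.74 factor comes from, the following sketch recomputes it with Python's standard library. The variable names are illustrative only; the calculation relies on the fact that a Gaussian's quartiles are symmetric around its mean.

```python
from statistics import NormalDist

# 75th percentile of a standard normal distribution, in units of sigma
q75 = NormalDist().inv_cdf(0.75)

# By symmetry, the interquartile range of a Gaussian spans 2 * q75 sigmas
iqr_in_sigmas = 2 * q75

# Inverting that relation gives sigma as a multiple of the IQR
factor = 1 / iqr_in_sigmas
print(round(factor, 2))
```

The printed factor is approximately 0.74, so multiplying the interquartile range by it gives an estimate of sigma that is far less sensitive to outliers than the ordinary standard deviation.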

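# Keep only rows within 5 sigma of the median; copy so new columns can be added safely
births = births.query('(births > @mean - 5 * @sigma) & (births < @mean + 5 * @sigma)').copy()

# Build a datetime index from the year, month, and day columns
births.index = pd.to_datetime(10000 * births.year + 100 * births.month + births.day, format='%Y%m%d')
births['day of week'] = births.index.dayofweek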
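
The clipping step can be tried in isolation on a small made-up series. This is only a sketch: the counts below are hypothetical, and the names toy, center, and clipped are not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts with one obviously bad entry (5); not real CDC values
toy = pd.DataFrame({"births": [4800, 4900, 5000, 5100, 5200, 5]})

q25, q50, q75 = np.percentile(toy["births"], [25, 50, 75])
center = q50                 # median as the robust center
sigma = 0.74 * (q75 - q25)   # robust spread estimated from the IQR

# '@name' inside query() refers to a Python variable rather than a column
clipped = toy.query("(births > @center - 5 * @sigma) & (births < @center + 5 * @sigma)")
print(len(toy), "->", len(clipped))
```

Only the row with the implausible count is dropped; the five ordinary values all fall within five sigma of the median.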

Using this we can plot births by weekday for several decades:

births_day = births.pivot_table('births', index='day of week', columns='decade', aggfunc='mean')
births_day.index = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
births_day.plot()
plt.ylabel("Average Births by Day")
plt.show()

Apparently births are slightly less common on weekends than on weekdays! Note that the 1990s and 2000s are missing because the CDC data contains only the month of birth starting in 1989.
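
The weekday labels used for the plot rely on pandas numbering the days of the week from Monday = 0 to Sunday = 6. A small sketch with three hand-picked dates (chosen for illustration only, not drawn from the dataset) shows both the integer-to-date conversion and that numbering:

```python
import pandas as pd

# Hypothetical rows: a Sunday, the following Monday, and a Thursday
toy = pd.DataFrame({
    "year":  [1969, 1969, 1970],
    "month": [7, 7, 1],
    "day":   [20, 21, 1],
})

# Pack year, month, and day into one integer such as 19690720, then parse it
dates = pd.to_datetime(10000 * toy.year + 100 * toy.month + toy.day, format="%Y%m%d")

# dayofweek counts from Monday = 0 up to Sunday = 6
weekdays = dates.dt.dayofweek.tolist()
print(weekdays)
```

Because Monday maps to 0, a label list that starts with 'Mon' lines up with the sorted day-of-week index.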

Another interesting view is to plot the mean number of births by the day of the year. Let’s first group the data by month and day separately:

from datetime import datetime

births_month = births.pivot_table('births', [births.index.month, births.index.day])
print(births_month.head())

# Attach a dummy leap year (2012) so that February 29 is a valid date
births_month.index = [datetime(2012, month, day) for (month, day) in births_month.index]
print(births_month.head())
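
The dummy year 2012 in the code above is presumably chosen because it is a leap year, so every (month, day) pair in the data, including February 29, maps to a real date. The sketch below checks this with the standard library; is_valid_date is a hypothetical helper introduced only for this illustration.

```python
from datetime import datetime

def is_valid_date(year, month, day):
    """Return True if the calendar date exists (hypothetical helper)."""
    try:
        datetime(year, month, day)
        return True
    except ValueError:
        return False

print(is_valid_date(2012, 2, 29))  # 2012 is a leap year
print(is_valid_date(2013, 2, 29))  # 2013 is not
```

With a non-leap dummy year, building the index would raise a ValueError as soon as it reached February 29.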

Focusing on the month and day only, we now have a time series reflecting the average number of births by date of the year. From this, we can use the plot method to plot the data. It reveals some interesting trends:

fig, ax = plt.subplots(figsize=(12, 4))
births_month.plot(ax=ax)
plt.show()
