A set of simple utilities for cleaning up pandas data frames.
pip install git+https://github.com/sketchytechky/kleaner.git
import pandas as pd
from kleaner.kleaner import Kleaner
df = pd.read_csv('kaggle.csv')
kdf = Kleaner(df)
# report the data-quality health of kaggle.csv
kdf.healthiness()
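This README does not describe what `healthiness()` returns. As a rough illustration of per-column statistics along the lines of the completeness and consistency measures listed below, here is a plain-pandas sketch. `column_health` is a hypothetical helper, not part of Kleaner's API, and treating "diversity" as the share of distinct values is an assumption.

```python
import pandas as pd


def column_health(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not Kleaner's API): simple per-column quality stats."""
    n_rows = max(len(df), 1)  # avoid dividing by zero on an empty frame
    report = pd.DataFrame({
        # Completeness: percentage of missing (NaN/None) cells per column
        "null_pct": df.isna().mean() * 100,
        # Consistency (assumed definition): distinct non-null values as a % of rows
        "distinct_pct": df.nunique(dropna=True) / n_rows * 100,
    })
    return report.round(2)


df = pd.DataFrame({
    "city": ["NYC", "nyc", None, "NYC"],  # "NYC" vs "nyc": two spellings of one value
    "price": [1.0, 2.0, 3.0, None],
})
print(column_health(df))
```

A high `distinct_pct` on a column expected to hold a few categories, as with the mixed-case `city` values here, is one hint that the same value has more than one representation.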
- Completeness: refers to missing key information
  - % of null values in a column
- Consistency: refers to data having a single representation
  - % diversity of values in a column
  - Time-series anomaly detection would be a useful addition to the data-quality (DQ) stats
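The time-series idea above is listed as a wish, not a feature. One simple technique that could fit is a rolling z-score over a DQ metric tracked over time, for example the daily null percentage of a column. The sketch below is a swapped-in illustration under that assumption. `flag_anomalies`, its `window` and `threshold` defaults, and the sample data are all hypothetical and not part of Kleaner.

```python
import pandas as pd


def flag_anomalies(stat: pd.Series, window: int = 7, threshold: float = 3.0) -> pd.Series:
    """Hypothetical helper (not part of Kleaner): rolling z-score anomaly flags.

    `stat` is assumed to be a time-indexed series of one DQ metric,
    e.g. the daily % of null values in a column.
    """
    # Baseline from the previous `window` points only, so a spike can't mask itself
    history = stat.shift(1).rolling(window, min_periods=window)
    z = (stat - history.mean()) / history.std()
    # NaN z-scores (warm-up period) compare False and are not flagged;
    # with a perfectly flat history any change gives an infinite z and is flagged
    return z.abs() > threshold


days = pd.date_range("2024-01-01", periods=10, freq="D")
null_pct = pd.Series([2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 15.0], index=days)
print(null_pct[flag_anomalies(null_pct)])
```

Using only past points for the baseline is a deliberate choice: including the current value would inflate the standard deviation and partly hide the very spike being tested.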