The main purpose of this project is to get practical experience with Python programming, Hadoop, Hive, and Spark.
Data source: the VisualCrossing weather API.
```shell
# Export the installed dependencies to requirements.txt
pip freeze > requirements.txt
# Install the dependencies listed in requirements.txt
pip install -r requirements.txt
```
This project plans to:
- fetch data from the weather API provided by VisualCrossing
- save it in MySQL, or as CSV/JSON files in HDFS
- load the data sources into Hive
- load the data with PySpark
- data preparation: build a cleaning pipeline
- make predictions
- create visualizations with Tableau
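The fetch-and-save steps could start from something like the sketch below. It assumes the Visual Crossing Timeline API URL layout (`.../timeline/{location}/{start}/{end}?unitGroup=...&key=...&contentType=json`) and a JSON response with a top-level `days` list whose entries carry fields such as `datetime`, `tempmax`, `tempmin`, `temp`, `humidity` and `precip`; check both against the official API documentation. `build_timeline_url`, `fetch_timeline`, `days_to_csv` and the chosen columns are hypothetical, and the API key is a placeholder.

```python
import csv
import io
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL of the Visual Crossing Timeline API.
BASE_URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"

# Hypothetical subset of daily fields to keep, in CSV column order.
COLUMNS = ["datetime", "tempmax", "tempmin", "temp", "humidity", "precip"]


def build_timeline_url(location, start, end, api_key, unit_group="metric"):
    """Build a Timeline API request URL (hypothetical helper)."""
    path = f"{BASE_URL}/{quote(location)}/{start}/{end}"
    query = urlencode({"unitGroup": unit_group, "key": api_key, "contentType": "json"})
    return f"{path}?{query}"


def fetch_timeline(url):
    """Download and decode the JSON response (needs network access and a valid key)."""
    with urlopen(url) as resp:
        return json.load(resp)


def days_to_csv(payload, columns=COLUMNS):
    """Flatten the assumed 'days' list into CSV text, one row per day.

    Extra fields are ignored; missing fields are written as empty strings.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for day in payload.get("days", []):
        writer.writerow(day)
    return buf.getvalue()


if __name__ == "__main__":
    # Only builds the URL; call fetch_timeline(url) with a real key to download.
    print(build_timeline_url("London,UK", "2023-01-01", "2023-01-07", "YOUR_API_KEY"))
```

Writing the CSV text to a local file and copying it into HDFS (for example with `hdfs dfs -put`) keeps the fetch step independent of Hadoop.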
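For the Hive step, one option is an external table over the CSV files stored in HDFS. The sketch below generates such a HiveQL statement from a column-to-type mapping; `hive_csv_ddl`, the table name, the schema and the HDFS location are all hypothetical. The column order must match the CSV column order, and the `skip.header.line.count` table property tells Hive to ignore the header line. PySpark could then read the table with `spark.table(...)` from a session created with `enableHiveSupport()`.

```python
# Hypothetical schema: column name -> Hive type, in CSV column order.
WEATHER_SCHEMA = {
    "datetime": "DATE",
    "tempmax": "DOUBLE",
    "tempmin": "DOUBLE",
    "temp": "DOUBLE",
    "humidity": "DOUBLE",
    "precip": "DOUBLE",
}


def hive_csv_ddl(table, schema, hdfs_location):
    """Return HiveQL for an external table over header-prefixed CSV files."""
    # Backticks guard against column names that clash with Hive keywords.
    columns = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in schema.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {columns}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


if __name__ == "__main__":
    # Hypothetical HDFS directory holding the CSV files.
    print(hive_csv_ddl("weather_daily", WEATHER_SCHEMA, "/user/hive/weather"))
```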
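The contents of the cleaning pipeline are not specified. As a hedged illustration, the plain-Python sketch below shows typical steps: casting string fields to proper types, dropping rows without a valid date or a required measurement, keeping one row per date, and sorting by date. In the project these steps would more likely be PySpark DataFrame operations such as `dropna` and `dropDuplicates`; the field names (`datetime`, `temp`, and so on) and `clean_rows` are assumptions.

```python
from datetime import date

# Assumed numeric fields in each raw row.
NUMERIC_FIELDS = ("tempmax", "tempmin", "temp", "humidity", "precip")


def parse_float(value):
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_rows(rows, required=("temp",)):
    """Cast types, drop incomplete rows, keep the first row per date, sort by date."""
    cleaned = {}
    for raw in rows:
        try:
            day = date.fromisoformat(raw["datetime"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a valid ISO date
        row = {"datetime": day}
        for field in NUMERIC_FIELDS:
            row[field] = parse_float(raw.get(field))
        if any(row[f] is None for f in required):
            continue  # skip rows missing a required measurement
        cleaned.setdefault(day, row)  # later duplicates of a date are ignored
    return [cleaned[d] for d in sorted(cleaned)]
```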
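The prediction model is also not specified. A simple baseline, shown here only as an illustration, is an ordinary least-squares linear trend fitted to a daily series (for example `temp`) and extrapolated forward; any later model, such as `pyspark.ml.regression.LinearRegression` with richer features, could be compared against it. `fit_linear_trend` and `forecast_next` are hypothetical helpers.

```python
def fit_linear_trend(values):
    """Ordinary least-squares fit of values[i] ~ intercept + slope * i."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2  # mean of the indices 0..n-1
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next(values, steps=1):
    """Extrapolate the fitted trend for the next `steps` indices."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]
```

For the series `[2.0, 4.0, 6.0]` the fit is intercept 2 and slope 2, so the next two forecasts are 8 and 10.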