8000 GitHub - Hamagistral/Azure-AW: πŸ”§ Azure Data Engineering Project (On-premise db to the cloud)
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Hamagistral/Azure-AW

Repository files navigation

Banner

    πŸ”§ Analyzing Sales of AdventureWorks πŸ”Œ

On-prem DB to Azure Cloud Pipeline with Data Factory, Lake Storage, Spark, Databricks, Synapse, PowerBI


πŸ“ Table of Contents

  1. Project Overview
  2. Key Insights
  3. Project Architecture
    3.1. Data Ingestion
    3.2. Data Transformation
    3.3. Data Loading
    3.4. Data Reporting
  4. Credits
  5. Contact

πŸ”¬ Project Overview

This an end-to-end data engineering project on the Azure cloud. Where I did data ingestion from a on-premise SQL Server to Azure Data Lake using Data Factory to transformation using Databricks and Spark, loading to Synapse, and reporting using PowerBI. Also, I used Azure Active Directory (AAD) and Azure Key Vault for the data monitoring and governance purpose.

πŸ’Ύ Dataset

AdventureWorks is a database provided by Microsoft for free on online platforms. It is a product sample database originally published by Microsoft to demonstrate the supposed design of a SQL server database using SQL server 2008. Here are some key points to know about AdventureWorks:

  • AdventureWorks database supports a manufacturing MNC named Adventure Works Cycles.
  • It is a sample Online Transaction Processing (or OLTP) database, which is a type of data processing where multiple transactions occur concurrently. These are shipped by Microsoft with all of their SQL server products.

For this project I used the Lightweight (LT) data: a lightweight and pared down version of the OLTP sample. Download here

🎯 Project Goals

  • Establish a connection between on-premise SQL server and Azure cloud.
  • Ingest tables into the Azure Data Lake.
  • Apply data cleaning and transformation using Azure Databricks.
  • Utilize Azure Synapse Analytics for loading clean data.
  • Create interactive data visualizations and reports with Microsoft Power BI.
  • Implement Azure Active Directory (AAD) and Azure Key Vault for monitoring and governance.

πŸ•΅οΈ Key Insights

  • πŸ’Έ Total Revenue by Product Category

    • Touring Bikes is the top 1 category generating revenue with 32% followed by Road Bikes with 26% and Mountain Bikes with 24%.
  • 🌍 Sales by Country

    • NΒ°1: The United Kingdom (UK) have the most total sales with 278 and $572,000 of total revenue.
    • NΒ°2: The United States of America (USA) is second with total sales with 264 and $383,810 of total revenue.
  • 🚻 Revenue by Gender

    • 81% of the revenue is generated by Male customers
    • 19% of the revenue is generated by Female customers

This can be explained by males have more interest in doing outdoor activites with the different categories of Bikes than females.

πŸ“ Project Architecture

You can find the detailed information on the diagram below:

AzurePipeline-Hamagistral

πŸ“€ Data Ingestion

  • Connected the on-premise SQL Server with Azure using Microsoft Integration Runtime.

image

  • Setup the Resource group with needed services (Key Vault, Storage Account, Data Factory, Databricks, Synapse Analytics)

ressource-group

  • Migrated the tables from on-premise SQL Server to Azure Data Lake Storage Gen2.

image df-pipeline

βš™οΈ Data Transformation

  • Mounted Azure Blob Storage to Databricks to retrieve raw data from the Data Lake.
  • Used Spark Cluster in Azure Databricks to clean and refine the raw data.
  • Saved the cleaned data in a Delta format; optimized for further analysis.

image

πŸ“₯ Data Loading

  • Used Azure Synapse Analytics to load the refined data efficiently.
  • Created SQL database and connected it to the data lake.

synapse-pipeline db-synapse

πŸ“Š Data Reporting

  • Connected Microsoft Power BI to Azure Synapse, and used the Views of the DB to create interactive and insightful data visualizations.

PowerBI-dashboard

πŸ› οΈ Technologies Used

  • Data Source: SQL Server
  • Orchestration: Azure Data Factory
  • Ingestion: Azure Data Lake Gen2
  • Storage: Azure Synapse Analytics
  • Authentication and Secrets Management: Azure Active Directory and Azure Key Vault
  • Data Visualization: PowerBI

πŸ“‹ Credits

πŸ“¨ Contact Me

LinkedIn β€’ Website β€’ Gmail

About

πŸ”§ Azure Data Engineering Project (On-premise db to the cloud)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
0