Efficient data management is essential for modern organizations, and the Fabric Metadata-Driven Framework (FMD) offers a state-of-the-art solution to streamline data operations. This framework leverages the advanced capabilities of the Fabric SQL Database to establish a robust, scalable, and metadata-driven architecture.
The Fabric Metadata-Driven Framework is built on dynamic data pipelines and parameterized notebooks, all configured through the Fabric SQL Database. It works out of the box with no initial modifications, and organizations can extend and scale it as their data needs evolve.
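To make the metadata-driven idea concrete, here is a minimal sketch of how metadata rows might be turned into pipeline parameters. The `SourceEntity` fields and the `build_pipeline_parameters` helper are hypothetical names chosen for illustration; they are not the framework's actual data model, and in the framework the rows would be read from the Fabric SQL Database rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntity:
    """One metadata row describing an object to load (hypothetical shape)."""
    source_type: str    # e.g. "SQL Server"
    source_name: str    # e.g. "dbo.customer"
    target_layer: str   # e.g. "landing", "bronze", "silver"
    is_active: bool = True


def build_pipeline_parameters(entities):
    """Turn each active metadata row into one parameter dict per pipeline run."""
    return [
        {
            "source_type": e.source_type,
            "source_name": e.source_name,
            "target_layer": e.target_layer,
        }
        for e in entities
        if e.is_active
    ]


# Hard-coded, hypothetical rows standing in for a metadata query result.
metadata = [
    SourceEntity("SQL Server", "dbo.customer", "landing"),
    SourceEntity("SFTP", "exports/orders.csv", "landing", is_active=False),
]

for params in build_pipeline_parameters(metadata):
    print(params)
```

The point of the pattern is that adding a new source means adding a metadata row, not writing a new pipeline.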
- Enhanced Data Governance: Fabric Metadata-Driven Framework ensures comprehensive data governance by maintaining detailed metadata, enabling better data quality, consistency, and compliance.
- Scalability and Flexibility: The framework is designed to scale seamlessly with your organization's growth, adapting to evolving data needs without compromising performance.
- Streamlined Data Integration: The framework simplifies the integration of diverse data sources, providing a unified view of your data landscape.
- Cost Efficiency: By optimizing data processes and reducing redundancy, Fabric Metadata-Driven Framework helps organizations achieve significant cost savings.
Import Taskflow
To use the taskflow shown above in your workspace, navigate to the right and click Import from computer. Import FMD_FABRIC_TASKFLOW.json from the Taskflow folder, then assign the correct artifact to each task in the taskflow (see the screenshot in the same folder).
The Framework deploys the following default Workspace Architecture to ensure a clear separation of data, code, and orchestration for enhanced security. This structure is designed to restrict access based on roles, ensuring that individuals who require access to data do not necessarily have access to code or orchestration components.
- Data Workspaces
  Dedicated workspaces for managing and storing data:
  - Data Landing Zone
  - Bronze Layer
  - Silver Layer
- Code Workspaces
  Workspaces for managing code and development artifacts:
  - Data Pipelines
  - Notebooks
  - Spark Environments
- Orchestration and Logging Workspaces
  Workspaces for orchestration and monitoring:
  - Fabric SQL Database
  - Semantic Model for Auditing and Logging (Work in Progress)
If a Gold Layer is added, it is advisable to create a separate workspace for reports. This ensures that users who need access to reports do not require access to the data workspace. This recommendation is based on practical experience from recent customer implementations.
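The separation described above can be pictured as a simple role-to-workspace mapping. The sketch below is illustrative only: the role names and the `can_access` helper are assumptions, and real access control is configured through Fabric workspace roles, not through code like this.

```python
# Hypothetical roles mapped to the workspace groups they may open.
WORKSPACE_ACCESS = {
    "data_consumer": {"data"},
    "data_engineer": {"data", "code"},
    "platform_admin": {"data", "code", "orchestration"},
    "report_reader": {"reports"},
}


def can_access(role: str, workspace: str) -> bool:
    """Return True if the role is allowed into the given workspace group."""
    return workspace in WORKSPACE_ACCESS.get(role, set())


print(can_access("data_consumer", "data"))   # data access is granted
print(can_access("data_consumer", "code"))   # code access is not implied
print(can_access("report_reader", "data"))   # reports do not require data access
```

Keeping reports in their own workspace follows the same logic: a report reader needs an entry for the reports workspace only.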
The Framework implements a structured approach to data organization using the Medallion Architecture. This architecture is supported by the deployment of Lakehouses for the Data Landing Zone, Bronze Layer, and Silver Layer. All data pipelines and notebooks are orchestrated and executed based on this architecture.
Data Landing Zone
- Handles both structured and unstructured data.
- Supports incremental data loads.
- Stores raw data "as-is" in a datetime-based folder structure.
- No schema is enforced at this stage.

Bronze Layer
- Deduplicates data.
- Adds data types for better structure.
- Data may still be inconsistent.
- Primarily serves as a copy of the source data.

Silver Layer
- Schema is applied.
- Maintains historical data.
- Enforces data quality rules and performs data cleansing.
- Stores validated data.
- Does not include business-specific models or data.
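Some of the steps listed above can be sketched in plain Python: building a datetime-based folder path for raw data, deduplicating rows, and adding data types. The folder layout, function names, and sample rows are assumptions made for illustration; the framework's notebooks run on Spark and may implement these steps differently.

```python
from datetime import datetime, timezone


def landing_path(source: str, entity: str, loaded_at: datetime) -> str:
    """Build a datetime-based folder path for a raw load (layout is assumed)."""
    return f"Files/{source}/{entity}/{loaded_at:%Y/%m/%d/%H%M%S}"


def deduplicate(rows, key):
    """Keep the last row seen for each key value."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())


def apply_types(row, schema):
    """Cast raw string values using schema (column name -> type callable)."""
    return {col: schema.get(col, str)(value) for col, value in row.items()}


# Hypothetical raw rows as they might arrive from a source system.
raw = [
    {"id": "1", "name": "Ada"},
    {"id": "1", "name": "Ada L."},
    {"id": "2", "name": "Bob"},
]

path = landing_path(
    "sqlserver", "customer", datetime(2024, 5, 1, 13, 30, 0, tzinfo=timezone.utc)
)
bronze = [apply_types(r, {"id": int}) for r in deduplicate(raw, "id")]

print(path)
print(bronze)
```

A datetime-based path keeps every load separate, which is what makes it safe to store the raw data unchanged and reprocess it later.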
To begin using the FMD Framework, refer to the deployment guide:
- Data Model Details: Learn more about the framework's data model and its components. See FMD Framework DataModel.
- Data Pipelines Overview: Explore the data pipelines used within the framework.
- Easily Configure and Load Data into the Framework: A concept for easily extracting metadata from your SQL Server into the framework.
- Pipelines Logging: Logging and auditing information.
The FMD Framework supports a wide range of data sources, enabling seamless integration and data ingestion. Below is the list of supported sources:
- SQL Server: Connect and ingest data from on-premises or cloud-hosted SQL Server databases.
- Azure Data Lake Gen2: Leverage Azure Data Lake Gen2 for scalable and secure data storage and processing.
- SFTP: Securely transfer and ingest files using the SFTP protocol.
- FTP: Ingest data from legacy systems using the FTP protocol.
- Azure Data Factory: Utilize Azure Data Factory for orchestrating and automating data workflows.
- OneLake Tables: Integrate with OneLake Tables for unified data access and management.
- OneLake Files: Access and process files stored in OneLake for streamlined data operations.
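When source definitions live in metadata, it can help to validate the source type before a pipeline runs. The following is a minimal sketch under that assumption; the `SourceType` enum and `parse_source_type` helper are hypothetical and only mirror the list above, not any identifiers used by the framework.

```python
from enum import Enum


class SourceType(Enum):
    """Source types from the list above (enum itself is illustrative)."""
    SQL_SERVER = "SQL Server"
    ADLS_GEN2 = "Azure Data Lake Gen2"
    SFTP = "SFTP"
    FTP = "FTP"
    AZURE_DATA_FACTORY = "Azure Data Factory"
    ONELAKE_TABLES = "OneLake Tables"
    ONELAKE_FILES = "OneLake Files"


def parse_source_type(label: str) -> SourceType:
    """Match a label to a supported source type, ignoring case and padding."""
    normalized = label.strip().casefold()
    for source_type in SourceType:
        if source_type.value.casefold() == normalized:
            return source_type
    raise ValueError(f"Unsupported source type: {label!r}")


print(parse_source_type(" sftp "))
print(parse_source_type("onelake tables"))
```

Failing early on an unknown label is cheaper than discovering the problem halfway through a load.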
- Fabric SQL Database Limitations: Creating the Fabric SQL Database may fail, often because the tenant has exceeded its allowed number of Fabric Databases. To verify this, try creating the Fabric Database manually. This immediately shows whether the failure is related to database limits.
- Trial Capacity Restrictions: Trial capacities are limited to a maximum of three databases. Make sure your deployment does not exceed this limit.
- Error Handling During Deployment: During deployment, the notebook may display the following error. If this occurs, re-run the notebook to resolve the issue.
  Error creating Fabric Database, make sure you do not execute this on a trial license. You can check this by creating the database manually
- Upload the File: Upload the customer.csv file to the file section of LH_DATA_LANDINGZONE in the Development environment.
- Create a Table: Create a table from the uploaded file and name it in_customer.
- Run the Process: Once the table is created, execute the complete process to verify that everything has been deployed and configured correctly.
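If you do not have the customer.csv file at hand, a small stand-in can be generated locally for the upload step. This is only a sketch: the column names and rows below are hypothetical and are not the contents of the file shipped with the framework.

```python
import csv
from pathlib import Path

# Hypothetical sample data; the real customer.csv may use other columns.
SAMPLE_ROWS = [
    {"customer_id": "1", "name": "Contoso", "country": "NL"},
    {"customer_id": "2", "name": "Fabrikam", "country": "US"},
]


def write_sample_csv(path: Path) -> Path:
    """Write the sample rows to path as a CSV file with a header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(SAMPLE_ROWS[0]))
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path


if __name__ == "__main__":
    print(write_sample_csv(Path("customer.csv")))
```

The resulting file can then be uploaded to LH_DATA_LANDINGZONE and used to create the in_customer table as described in the steps above.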
Contributions are welcome! If you have suggestions or improvements, feel free to open an issue or submit a pull request.
This project is licensed under the GNU GENERAL PUBLIC LICENSE.