This repository contains tests using Great Expectations 1.5 to validate data quality in CSV files.
test_order_id_not_null.py: Validates that the Order_ID column in the orders dataset is never null.
.
├── data/
│   └── orders.csv # Orders dataset
├── great_expectations/ # Auto-generated when tests are run
└── test_order_id_not_null.py # Test script
The project requires Python 3.12 and the following dependencies (specified in pyproject.toml):
- great-expectations >= 1.5
- pandas >= 2.1.1
- numpy == 1.26.4
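Before running the tests, it can help to confirm which versions are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of this repository, and it only reports versions rather than enforcing the constraints above.

```python
from __future__ import annotations

import sys
from importlib import metadata


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: dict[str, str | None] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the interpreter version and the three dependencies listed above.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
for package, version in installed_versions(
    ["great-expectations", "pandas", "numpy"]
).items():
    print(f"{package}: {version or 'not installed'}")
```

Any package reported as "not installed", or with a version outside the ranges listed above, should be fixed before running the test.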
To run the Order ID validation test:
python test_order_id_not_null.py
The test will:
- Create a Great Expectations context
- Load the orders.csv data
- Create an expectation suite with the rule that Order_ID should never be null
- Validate the data against this expectation
- Output the results
After running the test, the results will be displayed in the console. Additionally, Great Expectations will generate documentation in the great_expectations/uncommitted/data_docs/local_site/ directory that can be viewed in a web browser.
To add more data quality tests:
- Duplicate the existing test file
- Modify the expectations to include your new validation rules
- Run the new test file