The terraglue project was created to help people improve their learning journey with the AWS Glue service. It does so by providing a pocket environment with all the components needed to start developing jobs, including S3 buckets, sample data in the Data Catalog, IAM roles and policies, a preconfigured Athena workgroup and, finally, an end-to-end Glue job example that reads, transforms and catalogs new data.
- Have you ever wanted to learn Glue but got stuck on a complex environment setup?
- Have you ever wanted to test an idea for an ETL in a pocket and disposable environment?
- Have you ever wanted to go to the next level in developing Glue jobs?
🌖 Try terraglue!
Note: The terraglue project now has official documentation on Read the Docs! Check it out for usability, technical details, practical examples and more!
- 🚀 Have a pocket and disposable AWS environment with all infrastructure needed to start developing Glue jobs
- 🤖 No need to worry about bucket creation, IAM role and policy definitions or even uploading datasets to your AWS account
- 📊 Run queries on different public datasets, already written and cataloged, so users can improve their analytics skills
- 🛠️ Terraform as the IaC tool for providing a consistent infrastructure
- 🔦 Turn it on and off whenever you want
AWS Glue
- AWS - Glue Official Page
- AWS - Jobs Parameters Used by AWS Glue
- AWS - GlueContext Class
- AWS - DynamicFrame Class
- Stack Overflow - Job Failing by Job Bookmark Issue - Empty DataFrame
- AWS - Calling AWS Glue APIs in Python
- AWS - Using Python Libraries with AWS Glue
- Spark Temporary Tables in Glue Jobs
- Medium - Understanding All AWS Glue Import Statements and Why We Need Them
- AWS - Develop and test AWS Glue jobs Locally using Docker
- AWS - Creating OpenID Connect (OIDC) identity providers
Terraform
- Terraform - Hashicorp Terraform
- Terraform - Conditional Expressions
- Stack Overflow - combine "count" and "for_each" on Terraform
Apache Spark
- SparkByExamples - Pyspark Date Functions
- Spark - Configuration Properties
- Stack Overflow - repartition() vs coalesce()
GitHub
- Conventional Commits
- Semantic Release
- GitHub - Angular Commit Message Format
- GitHub - commitlint
- shields.io
- Codecoverage - docs
- GitHub Actions Marketplace
- Continuous Integration with GitHub Actions
- GitHub - About security hardening with OpenID Connect
- GitHub - Securing deployments to AWS from GitHub Actions with OpenID Connect
- GitHub - Workflow syntax for GitHub Actions
- Eduardo Mendes - Live de Python #170 - GitHub Actions
Docker
- GitHub Docker Run Action
- Using Docker Run inside of GitHub Actions
- Stack Overflow - Unable to find region when running docker locally
Tests
- Eduardo Mendes - Live de Python #167 - Pytest: Uma Introdução
- Eduardo Mendes - Live de Python #168 - Pytest Fixtures
- Databricks - Data + AI Summit 2022 - Learn to Efficiently Test ETL Pipelines
- Real Python - Getting Started with Testing in Python
- Inspired Python - Five Advanced Pytest Fixture Patterns
- getmoto/moto - mock inputs
- Codecov - Do test files belong in code coverage calculations?
- Jenkins Issue: Endpoint does not contain a valid host name
Others