This DAG demonstrates how to use the following:
- `RedshiftSQLOperator`
- `RedshiftPauseClusterOperator`
- `RedshiftResumeClusterOperator`
- `RedshiftClusterSensor`
- `RedshiftToS3Operator`
- `S3ToRedshiftOperator`
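The actual implementation lives in `redshift_example_dag.py` in this repo; purely for orientation, here is a minimal, hypothetical sketch of how these pieces might be wired together. The cluster identifier, bucket, schema, and table names are placeholders, and the import paths assume a recent `apache-airflow-providers-amazon` release (older releases expose some of these classes under `airflow.providers.amazon.aws.operators.redshift` instead):

```python
# Hypothetical sketch only -- see redshift_example_dag.py for the real DAG.
# Cluster identifier, bucket, schema, and table names below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_cluster import (
    RedshiftPauseClusterOperator,
    RedshiftResumeClusterOperator,
)
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator
from airflow.providers.amazon.aws.sensors.redshift_cluster import RedshiftClusterSensor
from airflow.providers.amazon.aws.transfers.redshift_to_s3 import RedshiftToS3Operator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

CLUSTER_ID = "my-redshift-cluster"  # placeholder
S3_BUCKET = "my-s3-bucket"          # placeholder

with DAG(
    dag_id="redshift_example_sketch",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Wake the cluster up, then wait until it reports "available".
    resume = RedshiftResumeClusterOperator(
        task_id="resume_cluster",
        cluster_identifier=CLUSTER_ID,
        aws_conn_id="aws_default",
    )
    wait_available = RedshiftClusterSensor(
        task_id="wait_for_cluster",
        cluster_identifier=CLUSTER_ID,
        target_status="available",
        aws_conn_id="aws_default",
    )

    # Run SQL through the redshift_default connection.
    create_tables = RedshiftSQLOperator(
        task_id="create_tables",
        redshift_conn_id="redshift_default",
        sql=[
            "CREATE TABLE IF NOT EXISTS public.my_table (id INT, name VARCHAR);",
            "CREATE TABLE IF NOT EXISTS public.my_table_copy (id INT, name VARCHAR);",
        ],
    )

    # Copy data out to S3, then back into a second table.
    unload = RedshiftToS3Operator(
        task_id="unload_to_s3",
        schema="public",
        table="my_table",
        s3_bucket=S3_BUCKET,
        s3_key="unload/my_table_",
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
        unload_options=["CSV", "ALLOWOVERWRITE"],
    )
    load = S3ToRedshiftOperator(
        task_id="load_from_s3",
        schema="public",
        table="my_table_copy",
        s3_bucket=S3_BUCKET,
        s3_key="unload/my_table_",
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
        copy_options=["CSV"],
    )

    # Pause the cluster again when the work is done.
    pause = RedshiftPauseClusterOperator(
        task_id="pause_cluster",
        cluster_identifier=CLUSTER_ID,
        aws_conn_id="aws_default",
    )

    resume >> wait_available >> create_tables >> unload >> load >> pause
```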
To follow along you will need:
- Astro CLI or Astrocloud CLI
- Accessible Redshift cluster
- Account with read/write access to an S3 bucket
- Airflow instance (if you plan on deploying)
If you are using the `astro` CLI instead of the `astrocloud` CLI, you can simply replace `astrocloud` in the commands below with `astro`.
```sh
git clone git@github.com:astronomer/cs-tutorial-redshift.git
cd cs-tutorial-redshift
astrocloud dev start
```
- Go to your sandbox at http://localhost:8080/home
- Navigate to connections (i.e. Admin >> Connections)
- Add a new connection with the following parameters:
  - Connection Id: `redshift_default`
  - Connection Type: `Amazon Redshift`
  - Host: `<Your-Redshift-Endpoint>`
  - Schema: `<Your-Redshift-Database>`
  - Login: `<Your-Redshift-Login>`
  - Password: `<Your-Redshift-Password>`
  - Port: `<Your-Redshift-Port>`
- Add another connection with the following parameters:
  - Connection Id: `aws_default`
  - Connection Type: `Amazon Web Services`
  - Extra: `{"aws_access_key_id": "", "aws_secret_access_key": "", "region_name": ""}`
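Optionally, you can sanity-check the `redshift_default` connection before running the DAG. This is just a sketch, assuming a recent `apache-airflow-providers-amazon` release (older releases import `RedshiftSQLHook` from `airflow.providers.amazon.aws.hooks.redshift`); run it from a Python shell inside the scheduler container:

```python
# Hypothetical smoke test for the redshift_default connection.
from airflow.providers.amazon.aws.hooks.redshift_sql import RedshiftSQLHook

hook = RedshiftSQLHook(redshift_conn_id="redshift_default")
# get_first is inherited from the DB-API hook and returns the first result row.
print(hook.get_first("SELECT 1;"))  # expect (1,)
```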
In order to use all of the components from this POC, the account associated with your `aws_default` connection will need the following permissions:
- Access to perform read/write actions for a pre-configured S3 bucket
- Access to interact with the Redshift cluster, specifically:
  - `redshift:DescribeClusters`
  - `redshift:PauseCluster`
  - `redshift:ResumeCluster`
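As a quick way to confirm that the `aws_default` credentials can at least call `redshift:DescribeClusters`, something like the following sketch works (the hook import path again assumes a recent Amazon provider, and `my-redshift-cluster` is a placeholder):

```python
# Hypothetical check that aws_default can describe the cluster.
from airflow.providers.amazon.aws.hooks.redshift_cluster import RedshiftHook

hook = RedshiftHook(aws_conn_id="aws_default")
# cluster_status wraps describe_clusters and returns e.g. "available" or "paused".
print(hook.cluster_status("my-redshift-cluster"))
```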
In `redshift_example_dag.py`, you'll need to replace variables like `cluster_identifier`, `from_redshift_table`, `s3_bucket`, `schema`, and `table` with the corresponding values that actually exist in your Redshift cluster/S3 storage (the sketch near the top of this README shows roughly where values like these appear).
After following these steps, you should be able to run the tasks in the `redshift_example_dag`. Enjoy!