The Data Ingestion Agent (DIA) enables the fast and secure delivery of data into the Ad Astra cloud without the need for Virtual Private Network (VPN) connections.
- Ad Astra Cloud credentials
- Review the docker CLI documentation; familiarize yourself with the common commands noted in the docker section
- Docker version 18.02 or greater (Community Edition or any Enterprise Edition)
- Docker download for Windows
- Docker download for Mac
- Docker download for Linux (Ubuntu)
- Docker download for Linux (Debian)
- Docker download for Linux (CentOS)
- Docker for Red Hat Enterprise Linux (RHEL)
- Red Hat Enterprise Linux has its own software distribution system. Please see their documentation for installing Docker or other compatible container systems.
To get started, you can use the built-in interactive wizard to build a properly formatted run command for the DIA. You should be prepared to collect the following information before using this wizard:
- Your Astra Cloud user credentials
- Your Student Information System database credentials for an administrative user, if you choose to ingest data into the Astra cloud
- For all possible Oracle connection string options, see the Oracle section
- Familiarize yourself with the different modes that the agent can run in before using the wizard.
- See the Running the Agent section and corresponding descriptions to get a high level overview of the available agent modes
Some settings provide helpful defaults which you may wish to use for your first run. Hitting enter will use the default value (in parentheses).
NOTE: Unless you are comfortable with specifying the advanced run settings, we recommend responding "no" to the prompt "would you like to configure advanced run settings?" in the wizard.
WARNING: If you're installing Docker in a nested VM scenario using Hyper-V, see this guide.
Execute one of the following to install the latest version of the agent and start the wizard:
(Windows) Command Prompt
docker pull adastradev/data-ingestion-agent:latest && ^
docker run -it adastradev/data-ingestion-agent:latest wizard
(Windows) Powershell
docker pull adastradev/data-ingestion-agent:latest; `
docker run -it adastradev/data-ingestion-agent:latest wizard
(Linux/Mac) bash
docker pull adastradev/data-ingestion-agent:latest && \
docker run -it adastradev/data-ingestion-agent:latest wizard
Hardware:
- Server grade systems are recommended for hosting Docker and the DIA. Using consumer grade devices such as personal laptops and desktops as the main host for the DIA is strongly discouraged.
Memory:
- 8GB (Recommended)
Memory:
- 4GB (Recommended)
When specifying the ORACLE_ENDPOINT value (a connection string) to your Oracle instance, you may use one of the following formats:
# Easy Connect Connection String
CONNECTION_STRING="hostname:port/service_name"
# TNS Style Connection String using SID
CONNECTION_STRING="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=hostname)(PORT=port))(CONNECT_DATA=(SERVER=DEDICATED)(SID=sid)))"
# TNS Style Connection String using SERVICE_NAME
CONNECTION_STRING="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=hostname)(PORT=port))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=service_name)))"
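The formats above can also be assembled from their parts. The following is a minimal sketch, assuming hypothetical helper functions (`easy_connect` and `tns_connect` are not part of the DIA) and placeholder host, port, and service values:

```shell
# Hypothetical helpers (not part of the DIA) that assemble the formats above.

# Easy Connect: hostname:port/service_name
easy_connect() {
  local host="$1" port="$2" service="$3"
  printf '%s:%s/%s\n' "$host" "$port" "$service"
}

# TNS style; the fourth argument selects SID or SERVICE_NAME (default).
tns_connect() {
  local host="$1" port="$2" name="$3" kind="${4:-SERVICE_NAME}"
  case "$kind" in
    SID|SERVICE_NAME) ;;
    *) echo "kind must be SID or SERVICE_NAME" >&2; return 1 ;;
  esac
  printf '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%s))(CONNECT_DATA=(SERVER=DEDICATED)(%s=%s)))\n' \
    "$host" "$port" "$kind" "$name"
}

# Example values are placeholders, not real endpoints.
CONNECTION_STRING="$(easy_connect db.example.edu 1521 ORCL)"
echo "$CONNECTION_STRING"
```

Building the string in one place keeps the nested parentheses of the TNS form balanced and makes it easier to switch between SID and SERVICE_NAME.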
When connecting to an Oracle database the specified database user must be given read/execute access to the following:
- DBMS_METADATA.GET_DDL (function)
- ALL_TABLES (view)
- ALL_CONS_COLUMNS (view)
- ALL_CONSTRAINTS (view)
- All tables referenced by this agent (see Query Preview Mode section below)
The agent is highly dependent on the integration type you specify as part of your commands. Integration types are a simple identifier for the system from which you intend to ingest data. To see a full list of possible integration types, use the wizard as noted in the Quick Start section at the top of this guide.
The agent supports different 'modes' in which it can run. Each mode performs a specific action (see below) and then exits after the action is complete.
This mode will immediately ingest data into the Ad Astra cloud environment. Upon completion of an ingest of data, the container will cease to run.
To run the agent, execute one of the following commands.
(Windows) Command Prompt
REM See Host System Requirements above for agent resource requirements
SET PROCESS_MAX_MEMORY_SIZE_MB=4096
REM Define a variable to hold your connection string
SET CONNECTION_STRING=your_connection_string
docker pull adastradev/data-ingestion-agent:latest && ^
docker run -t ^
-m %PROCESS_MAX_MEMORY_SIZE_MB%M ^
-e PROCESS_MAX_MEMORY_SIZE_MB=%PROCESS_MAX_MEMORY_SIZE_MB% ^
-e ASTRA_CLOUD_USERNAME=<ASTRA_USERNAME> ^
-e ASTRA_CLOUD_PASSWORD=<ASTRA_PASSWORD> ^
-e ORACLE_ENDPOINT=%CONNECTION_STRING% ^
-e ORACLE_USER=<ORACLE_USER> ^
-e ORACLE_PASSWORD=<ORACLE_PASSWORD> ^
-e INTEGRATION_TYPE=<SIS Type> ^
--network=bridge ^
adastradev/data-ingestion-agent:latest ^
ingest
(Windows) Powershell
# See Host System Requirements above for agent resource requirements
$PROCESS_MAX_MEMORY_SIZE_MB = 4096
# Define a variable to hold your connection string
$CONNECTION_STRING = "your_connection_string"
docker pull adastradev/data-ingestion-agent:latest; `
docker run -t `
-m "${PROCESS_MAX_MEMORY_SIZE_MB}M" `
-e PROCESS_MAX_MEMORY_SIZE_MB=$PROCESS_MAX_MEMORY_SIZE_MB `
-e ASTRA_CLOUD_USERNAME=<ASTRA_USERNAME> `
-e ASTRA_CLOUD_PASSWORD=<ASTRA_PASSWORD> `
-e ORACLE_ENDPOINT=$CONNECTION_STRING `
-e ORACLE_USER=<ORACLE_USER> `
-e ORACLE_PASSWORD=<ORACLE_PASSWORD> `
-e INTEGRATION_TYPE=<SIS Type> `
-e DEFAULT_STAGE=dev `
--network=bridge `
adastradev/data-ingestion-agent:latest `
ingest
(Linux/Mac) bash
# See Host System Requirements above for agent resource requirements
PROCESS_MAX_MEMORY_SIZE_MB=4096
# Define a variable to hold your connection string (no spaces around "=")
CONNECTION_STRING="your_connection_string"
docker pull adastradev/data-ingestion-agent:latest && \
docker run -t \
-m $PROCESS_MAX_MEMORY_SIZE_MB'M' \
-e PROCESS_MAX_MEMORY_SIZE_MB=$PROCESS_MAX_MEMORY_SIZE_MB \
-e ASTRA_CLOUD_USERNAME=<ASTRA_USERNAME> \
-e ASTRA_CLOUD_PASSWORD=<ASTRA_PASSWORD> \
-e ORACLE_ENDPOINT="$CONNECTION_STRING" \
-e ORACLE_USER=<ORACLE_USER> \
-e ORACLE_PASSWORD=<ORACLE_PASSWORD> \
-e INTEGRATION_TYPE=<SIS Type> \
--network=bridge \
adastradev/data-ingestion-agent:latest \
ingest
To see a demo of the agent without connecting it to any data source, omit the ORACLE_* environment variables. In demo mode, the agent can verify connectivity to the Astra Cloud and push a mock dataset into S3.
The docker agent also supports the following optional arguments:
# [dev, prod]
-e DEFAULT_STAGE=prod
-e AWS_REGION=us-east-1
-e CONCURRENT_CONNECTIONS=5
-e INGEST_RESTORATION_RESOURCES=<false or FALSE>
# [error, warn, info, verbose, debug, silly]
-e LOG_LEVEL=info
-e RUN_MATILLION=<true or false> # defaults to false
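Because DEFAULT_STAGE and LOG_LEVEL accept only the values listed in the comments above, it may help to check them before starting the container. The following is a minimal sketch, assuming hypothetical function names and assuming that `prod` and `info` are the values used when a variable is unset (the guide shows them as examples, not confirmed defaults):

```shell
# Hypothetical pre-flight check (not part of the DIA): reject values outside
# the documented sets before they reach `docker run`.

# Succeeds when the first argument equals one of the remaining arguments.
is_one_of() {
  local value="$1" allowed
  shift
  for allowed in "$@"; do
    [ "$value" = "$allowed" ] && return 0
  done
  return 1
}

validate_optional_args() {
  # Fallback values are assumptions taken from the examples above.
  local stage="${DEFAULT_STAGE:-prod}" level="${LOG_LEVEL:-info}"
  is_one_of "$stage" dev prod || {
    echo "invalid DEFAULT_STAGE: $stage" >&2; return 1; }
  is_one_of "$level" error warn info verbose debug silly || {
    echo "invalid LOG_LEVEL: $level" >&2; return 1; }
}
```

Calling `validate_optional_args` at the top of a run script stops a typo such as `LOG_LEVEL=degub` before any image is pulled.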
Prior to sending any data, you can run the following Docker command to examine each query for the specified integration type. No data is sent to the destination using this command. Upon completion of a preview command, the container will cease to run.
To run the agent, execute one of the following commands.
(Windows) Command Prompt
REM See Host System Requirements above for agent resource requirements
SET PROCESS_MAX_MEMORY_SIZE_MB=4096
docker pull adastradev/data-ingestion-agent:latest && ^
docker run -i ^
-m %PROCESS_MAX_MEMORY_SIZE_MB%M ^
-e PROCESS_MAX_MEMORY_SIZE_MB=%PROCESS_MAX_MEMORY_SIZE_MB% ^
-e ASTRA_CLOUD_USERNAME=<your_username> ^
-e ASTRA_CLOUD_PASSWORD=<your_password> ^
-e INTEGRATION_TYPE=<SIS Type> ^
--network=bridge ^
adastradev/data-ingestion-agent:latest ^
preview
(Windows) Powershell
# See Host System Requirements above for agent resource requirements
$PROCESS_MAX_MEMORY_SIZE_MB = 4096
docker pull adastradev/data-ingestion-agent:latest; `
docker run -i `
-m "${PROCESS_MAX_MEMORY_SIZE_MB}M" `
-e PROCESS_MAX_MEMORY_SIZE_MB=$PROCESS_MAX_MEMORY_SIZE_MB `
-e ASTRA_CLOUD_USERNAME=<your_username> `
-e ASTRA_CLOUD_PASSWORD=<your_password> `
-e INTEGRATION_TYPE=<SIS Type> `
--network=bridge `
adastradev/data-ingestion-agent:latest `
preview
(Linux/Mac) bash
# See Host System Requirements above for agent resource requirements
PROCESS_MAX_MEMORY_SIZE_MB=4096
docker pull adastradev/data-ingestion-agent:latest && \
docker run -i \
-m $PROCESS_MAX_MEMORY_SIZE_MB'M' \
-e PROCESS_MAX_MEMORY_SIZE_MB=$PROCESS_MAX_MEMORY_SIZE_MB \
-e ASTRA_CLOUD_USERNAME=<your_username> \
-e ASTRA_CLOUD_PASSWORD=<your_password> \
-e INTEGRATION_TYPE=<SIS Type> \
--network=bridge \
adastradev/data-ingestion-agent:latest \
preview
The following links display the default set of queries that run for each of the respective SIS integrations:
The DIA is a long-running process that may be performing work when an uninstall occurs. To reduce the negative side effects of stopping the agent abruptly, always stop the container with a grace period, as shown below. Using docker kill outright is discouraged.
If multiple versions of the ingestion agent exist, be sure to specify the optional tag when removing an image.
docker stop --time 10 <container_name_or_id>
docker rm <container_name_or_id>
docker rmi <image>:<tag>
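The three commands above can be chained so that a later step runs only if the earlier one succeeded. The following is a minimal sketch; the function name `uninstall_dia` and its default tag and grace period are assumptions for illustration:

```shell
# Hypothetical wrapper (name and defaults are assumptions).
# Stops the container with a grace period, then removes the container
# and the tagged image. Each step runs only if the previous one succeeded.
uninstall_dia() {
  local container="$1" image="$2" tag="${3:-latest}" grace="${4:-10}"
  docker stop --time "$grace" "$container" &&
  docker rm "$container" &&
  docker rmi "$image:$tag"
}

# Usage:
# uninstall_dia <container_name_or_id> adastradev/data-ingestion-agent latest
```

Passing the tag explicitly avoids removing the wrong image when several versions of the agent are present.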
- docker run - Start a docker container
- docker ps - Observe the status of containers
- docker stats - Monitor resource usage of a running container
- docker stop - Stop a container
- docker rm - Remove a container
- docker rmi - Remove an image
To review all possible Docker CLI commands, see their CLI guide.
Windows
Open notepad or notepad++
Copy and paste:
docker pull adastradev/data-ingestion-agent:latest
docker run ....<your data ingestion run cmd>
Save As > NameYourFile.bat
Open Windows Task Scheduler > 'Create Task' > Name your Task
- General > Check 'Run with highest privileges' > Select additional desired criteria
- Triggers > 'New' > Select desired ingest schedule > 'Ok'
- Actions > 'New' > Action is 'Start A Program' > Browse to the .bat file you just created > select that file > 'Ok'
To test, right-click the task in Task Scheduler and select Run. A Command Prompt window should appear; your docker pull command will run first, followed by your ingest command.
Linux/Mac
Create a shell script to contain your DIA run command. For example, the following commands will create a script in your home directory.
$ echo "docker pull adastradev/data-ingestion-agent:latest && docker run ....<your data ingestion run cmd>" > run_ingestion_agent.sh
$ chmod +x run_ingestion_agent.sh
Open up the cron job configuration file.
crontab -e
Call your script as a job to be executed on a schedule. In this case, the job runs once a day at midnight and appends its output to a log file each time.
0 0 * * * sh "$HOME/run_ingestion_agent.sh" >> "$HOME/agent.log"
In addition to scheduling the DIA, it is helpful to automate checking if Docker is running. Here are some links to Docker's documentation to help address those areas:
Configure Docker to start on reboot
Check whether Docker is running
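A scheduled job can guard against a stopped Docker daemon before attempting a run. The following is a minimal sketch, assuming hypothetical function names and relying on `docker info` exiting with a non-zero status when the daemon cannot be reached:

```shell
# Succeeds when the Docker daemon answers; `docker info` is expected to
# exit non-zero when the daemon is unreachable.
docker_is_running() {
  docker info >/dev/null 2>&1
}

# Hypothetical guard for a scheduled job: run the given command only when
# the daemon is up, otherwise log a message to stderr and fail.
run_if_docker_up() {
  if docker_is_running; then
    "$@"
  else
    echo "$(date) Docker is not running; skipping ingest" >&2
    return 1
  fi
}

# Usage:
# run_if_docker_up docker pull adastradev/data-ingestion-agent:latest
```

With the cron redirection shown earlier, only standard output reaches the log file, so the skip message would need `2>&1` appended to the cron entry to be captured.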
The DIA requires outbound internet access over HTTPS to Amazon Web Services (*.amazonaws.com). In general, the agent should be given outbound internet access by using a bridge network as shown above. If running through an internet proxy, it is recommended to configure the proxy at docker run time by using an environment variable, for example --env HTTPS_PROXY="https://127.0.0.1:3001". For more information, see Configure Docker to use a proxy server.
No inbound access to the agent is required.
See Getting started with HTTPS proxies for more information.
The DIA uses JWT/AWS IAM tokens to authenticate all requests. The DIA does not currently use AWS Key Management. The JWT/IAM tokens are only valid for short periods (possibly an hour), after which the DIA must re-authenticate.
After starting the agent and confirming a healthy status, you can use the container's name or ID to access the container via the command line (bash) as follows:
docker exec -it <container_id_or_name> /bin/bash
The DIA periodically informs Docker of its current health. Using docker inspect, you can get a general idea of the application's state.
docker inspect --format='{{json .State.Health.Status}}' <container_name_or_id>
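The health status can also be polled from a script, for example before opening a shell in the container. The following is a minimal sketch; the function name, the 120-second default timeout, and the 5-second polling interval are assumptions, and the template omits `json` so the status is printed without quotes:

```shell
# Hypothetical polling helper (name, timeout, and interval are assumptions).
# Returns 0 once Docker reports "healthy", 1 on "unhealthy" or timeout.
wait_for_healthy() {
  local container="$1" timeout="${2:-120}" interval="${3:-5}" elapsed=0 status
  while [ "$elapsed" -lt "$timeout" ]; do
    status="$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null)"
    case "$status" in
      healthy) return 0 ;;
      unhealthy) echo "container $container is unhealthy" >&2; return 1 ;;
    esac
    # Any other status (for example "starting") means: keep waiting.
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out waiting for $container (last status: ${status:-unknown})" >&2
  return 1
}

# Usage:
# wait_for_healthy <container_name_or_id> && docker exec -it <container_name_or_id> /bin/bash
```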
To monitor container resource usage run the following:
docker stats <container_name_or_id>
# View console output from container host
docker logs <container_name_or_id>
# Copy/export logs from the container to the host
# Create a destination path to store log files. After running the command the copied log file stored in your destination path should be renamed so files are not overwritten.
docker cp <container_name_or_id>:/var/log/dia/. <destination_path>
See the Docker cp command guide for help copying logs to a local file system.
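As an alternative to renaming copied files by hand, each export can go into its own timestamped directory so earlier copies are never overwritten. The following is a minimal sketch; the function name and the default `./dia-logs` base path are assumptions:

```shell
# Hypothetical helper (name and default base path are assumptions).
# Copies the container's log directory into a new timestamped directory
# and prints the destination path on success.
export_dia_logs() {
  local container="$1" base="${2:-./dia-logs}" dest
  dest="$base/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" &&
  docker cp "$container:/var/log/dia/." "$dest" &&
  echo "$dest"
}

# Usage:
# export_dia_logs <container_name_or_id> /path/to/log/archive
```

Two exports within the same second would share a directory; for scheduled daily exports this granularity is usually sufficient.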
- Encryption:
- Data is encrypted in-transit over HTTPS
- Data at rest is encrypted in a private AWS S3 bucket using AES-256 bit encryption