Description
Because of the startup latency associated with Snowflake, writing each event to the database with its own query can be inefficient. Instead, event data should be written to a new S3 bucket.
A new, secure S3 bucket must be created. Access to this bucket should be restricted to the minimum necessary, and the fideslog service must have write access.
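As a rough illustration of what "minimal access" could look like, a bucket policy might grant only `s3:PutObject` to the IAM role the fideslog service runs under. This is a sketch only; the bucket name, account ID, and role name below are placeholders, and the real policy will depend on how the service authenticates to AWS.

```python
import json

import boto3  # assumed S3 client; any equivalent tooling would work

# Placeholder identifiers; substitute the real bucket and the fideslog service's IAM role.
BUCKET = "fideslog-events"
FIDESLOG_ROLE_ARN = "arn:aws:iam::123456789012:role/fideslog-writer"

# Minimal policy: the fideslog role may write objects, and nothing else is granted here.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FideslogWriteOnly",
            "Effect": "Allow",
            "Principal": {"AWS": FIDESLOG_ROLE_ARN},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```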
Event data should be batched prior to upload, using a combination of time- and size-based triggers (e.g., publish to S3 every five minutes, or whenever the buffer exceeds 1 MB in size).
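One way to implement the combined trigger is a small in-memory buffer that flushes whenever either threshold is crossed. This is a minimal sketch, assuming events arrive as dictionaries; the class name and thresholds are illustrative, and the upload step is passed in as a callable (one possible implementation is sketched after the list of considerations below). A flush driven purely by elapsed time would also need a background timer; here the elapsed-time check only runs when a new event arrives.

```python
import json
import threading
import time
from typing import Callable

FLUSH_INTERVAL_SECONDS = 300        # five minutes, per the example above
FLUSH_SIZE_BYTES = 1 * 1024 * 1024  # 1 MB, per the example above


class EventBuffer:
    """Accumulates serialized events and flushes when either threshold is hit."""

    def __init__(self, upload: Callable[[bytes], None]) -> None:
        self._upload = upload
        self._lock = threading.Lock()
        self._lines: list[bytes] = []
        self._size = 0
        self._last_flush = time.monotonic()

    def add(self, event: dict) -> None:
        # Serialize immediately so buffer size can be tracked in bytes.
        line = json.dumps(event).encode("utf-8") + b"\n"
        with self._lock:
            self._lines.append(line)
            self._size += len(line)
            due = (
                self._size >= FLUSH_SIZE_BYTES
                or time.monotonic() - self._last_flush >= FLUSH_INTERVAL_SECONDS
            )
        if due:
            self.flush()

    def flush(self) -> None:
        with self._lock:
            if not self._lines:
                return
            body = b"".join(self._lines)
            self._lines.clear()
            self._size = 0
            self._last_flush = time.monotonic()
        # Upload happens outside the lock so new events are not blocked.
        self._upload(body)
```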
Some additional things to consider are:
- What is the optimal file type to publish to S3?
- How should the event data be structured for upload?
- Should uploads stream to existing files in the S3 bucket, or create a new one each time?
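As one illustration of how these questions might be answered (not a decision), the sketch below serializes each batch as newline-delimited JSON and writes it as a new, timestamp-keyed object; the bucket name and key prefix are placeholders.

```python
from datetime import datetime, timezone

import boto3  # assumed S3 client

BUCKET = "fideslog-events"  # placeholder name


def upload_batch(body: bytes) -> None:
    """Write one batch as a new, timestamp-keyed NDJSON object in the bucket."""
    key = f"events/{datetime.now(timezone.utc).strftime('%Y/%m/%d/%H%M%S%f')}.ndjson"
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=body)
```

Creating a new object per upload sidesteps the streaming question, since S3 objects are immutable and cannot be appended to, and newline-delimited JSON keeps each event self-describing for later loading into Snowflake.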
The S3 upload path is in addition to writing events directly to Snowflake; no code affecting the Snowflake DB interactions should be removed. Future work will remove the direct Snowflake interaction mechanism.