Stars
Generate authentic-looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format.
🐍 Quick reference guide to common patterns & functions in PySpark.
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.
A set of exercises to prepare for the Certified Kubernetes Application Developer exam by the Cloud Native Computing Foundation
A curated and opinionated list of resources for Chief Technology Officers, with the emphasis on startups
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, and Data Build Tool (DBT), with Azure as our cloud provider.
This repository contains the notebooks and presentations we use for our Databricks Tech Talks
A topic-centric list of HQ open datasets.
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on lambda architecture that aggregates Twitter and US stock market data for user sentiment anal…
Jupyter notebooks for the code samples of the book "Deep Learning with Python"
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Roadmap to becoming a data engineer in 2021
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
The Game Analytics Pipeline is a customer deployable reference architecture to help game developers ingest, store, and analyze telemetry data from games and services.
ETL with Python - Taught at DWH course 2017 (TAU)
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
A collection of learning resources for curious software engineers