GlassFlow Overview

GlassFlow for ClickHouse Streaming ETL is a real-time stream processor that simplifies creating and managing data pipelines from Kafka to ClickHouse. It provides a user-friendly web interface for building and managing these pipelines, with built-in support for deduplication and temporal joins.

Built specifically for data engineers, GlassFlow handles late-arriving events, ensures exactly-once correctness, and scales to high-throughput data. It delivers accurate, low-latency results from streaming data without sacrificing simplicity or performance. Its web interface makes pipelines easy to configure and monitor, and its architecture is designed for reliable data processing.
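To make the deduplication idea concrete, here is a minimal Python sketch of time-windowed deduplication. It is not GlassFlow's implementation. It assumes events are plain dicts carrying a deduplication key and an event-time timestamp in seconds (the function `deduplicate` and the field names `event_id` and `ts` are hypothetical). A key is remembered from its first occurrence until the largest timestamp seen so far passes the end of its window, so late-arriving events cannot move the clock backwards.

```python
import heapq
from typing import Any, Iterable, Iterator

Event = dict[str, Any]


def deduplicate(
    events: Iterable[Event],
    key_field: str,
    ts_field: str,
    window_seconds: float,
) -> Iterator[Event]:
    """Hypothetical sketch: yield each event unless its key is still remembered.

    A key is remembered from its first occurrence until the watermark (the
    largest event timestamp seen so far) reaches first_ts + window_seconds.
    Using the watermark means late-arriving events never move time backwards.
    """
    first_seen: dict[Any, float] = {}  # key -> timestamp of its first occurrence
    expiry_heap: list[tuple[float, int, Any]] = []  # (first_ts, arrival index, key)
    watermark = float("-inf")

    for index, event in enumerate(events):
        key, ts = event[key_field], float(event[ts_field])
        watermark = max(watermark, ts)

        # Forget keys whose deduplication window has fully elapsed.
        while expiry_heap and expiry_heap[0][0] + window_seconds <= watermark:
            _, _, expired_key = heapq.heappop(expiry_heap)
            del first_seen[expired_key]

        if key in first_seen:
            continue  # duplicate within the window: drop it

        first_seen[key] = ts
        heapq.heappush(expiry_heap, (ts, index, key))
        yield event


# Usage example with a 60-second window.
stream = [
    {"event_id": "a", "ts": 0},
    {"event_id": "a", "ts": 30},  # same key 30 s later: dropped
    {"event_id": "b", "ts": 45},
    {"event_id": "a", "ts": 61},  # window for "a" has expired: kept
]
kept = list(deduplicate(stream, "event_id", "ts", window_seconds=60))
print([(e["event_id"], e["ts"]) for e in kept])  # [('a', 0), ('b', 45), ('a', 61)]
```

Keeping expiry times in a heap makes each event cost O(log n) in the number of remembered keys. A production system would also need to persist this state across restarts (GlassFlow advertises windows of up to 7 days), which this in-memory sketch does not attempt.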

Key Features

  • Streaming Deduplication:

    • Real-time deduplication of Kafka streams before ingestion into ClickHouse
    • Configurable time windows up to 7 days for deduplication
    • Simple configuration of deduplication keys and time windows
    • One-click setup for deduplicated data pipelines
    • Prevents duplicate data from reaching ClickHouse
  • Temporal Stream Joins:

    • Join two Kafka streams in real-time
    • Configurable time windows up to 7 days for stream joins
    • Configure join keys and time windows through the UI
    • Simplified join setup process
    • Produce joined streams ready for ClickHouse ingestion
  • Built-in Kafka Connector:

    • Automatic data extraction from Kafka topics
    • Seamless integration with Kafka clusters
    • No manual data pulling required
    • Supports multiple Kafka topics and partitions
    • Native support for JSON data types
  • Optimized ClickHouse Sink:

    • Native ClickHouse connection for maximum performance
    • Configurable batch sizes for efficient data ingestion
    • Adjustable wait times for optimal throughput
    • Built-in retry mechanisms
    • Automatic schema detection and management
    • Full support for JSON data types in ClickHouse
  • User-Friendly Interface: Web-based UI for pipeline configuration and management

  • Local Development: Includes demo setup with local Kafka and ClickHouse instances

  • Docker Support: Easy deployment using Docker and docker-compose

  • Self-Hosted: Open-source solution that can be self-hosted in your infrastructure
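The temporal join feature can be illustrated with a small Python sketch of a windowed inner join over two interleaved streams. This is an illustration under stated assumptions rather than GlassFlow's join logic: the `"left"`/`"right"` stream tags, the `temporal_join` function, and the rule that fields from the right-hand event win on name clashes are all hypothetical. Two events match when they share the join key and their timestamps are at most `window_seconds` apart; each side is buffered until the largest timestamp seen moves past its window, so the match is found whichever side arrives first.

```python
from collections import defaultdict
from typing import Any, Iterable, Iterator, Literal

Event = dict[str, Any]
Side = Literal["left", "right"]


def temporal_join(
    events: Iterable[tuple[Side, Event]],
    key_field: str,
    ts_field: str,
    window_seconds: float,
) -> Iterator[Event]:
    """Hypothetical sketch: inner-join two interleaved streams on key_field.

    Events match when they share a key and their timestamps differ by at most
    window_seconds. On field-name clashes, the right-hand event's value wins.
    """
    buffers: dict[Side, dict[Any, list[Event]]] = {
        "left": defaultdict(list),
        "right": defaultdict(list),
    }
    watermark = float("-inf")

    for side, event in events:
        key, ts = event[key_field], float(event[ts_field])
        watermark = max(watermark, ts)
        other: Side = "right" if side == "left" else "left"

        # Match against buffered events from the other stream (.get avoids
        # creating empty entries in the defaultdict).
        for candidate in buffers[other].get(key, []):
            if abs(float(candidate[ts_field]) - ts) <= window_seconds:
                left, right = (event, candidate) if side == "left" else (candidate, event)
                yield {**left, **right}

        buffers[side][key].append(event)

        # Evict buffered events that could only match an event older than the
        # watermark (an out-of-order arrival); this sketch may miss such pairs.
        for buf in buffers.values():
            for k in list(buf):
                buf[k] = [e for e in buf[k] if float(e[ts_field]) + window_seconds >= watermark]
                if not buf[k]:
                    del buf[k]


# Usage example with a 5-second window.
stream = [
    ("left", {"user_id": 1, "ts": 10, "order": "A"}),
    ("right", {"user_id": 1, "ts": 12, "plan": "pro"}),   # within 5 s of order A
    ("right", {"user_id": 2, "ts": 13, "plan": "free"}),  # no matching left event
    ("left", {"user_id": 1, "ts": 30, "order": "B"}),     # 18 s after ts=12: no match
]
joined = list(temporal_join(stream, "user_id", "ts", window_seconds=5))
print(joined)  # [{'user_id': 1, 'ts': 12, 'order': 'A', 'plan': 'pro'}]
```

The buffers only hold events that are still inside the window, which is why a long window (the UI allows up to 7 days) translates directly into more state to keep.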
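The interplay between the ClickHouse sink's batch size, wait time, and retries can be sketched as a small buffering class. This is a hypothetical illustration, not GlassFlow's sink: `BatchingSink`, `insert_fn` (a stand-in for whatever client call actually writes rows to ClickHouse), `tick`, and the backoff schedule are all assumptions. A batch is flushed as soon as it is full, or once its oldest row has waited `max_wait_seconds`, whichever comes first.

```python
import time
from typing import Any, Callable, List, Optional, Sequence

Row = dict[str, Any]


class BatchingSink:
    """Hypothetical sketch of a batching sink with size and time flush triggers.

    insert_fn stands in for the call that writes one batch to ClickHouse;
    it is an assumption, not a GlassFlow or ClickHouse client API.
    """

    def __init__(
        self,
        insert_fn: Callable[[Sequence[Row]], None],
        max_batch_size: int = 1000,
        max_wait_seconds: float = 1.0,
        max_retries: int = 3,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.insert_fn = insert_fn
        self.max_batch_size = max_batch_size
        self.max_wait_seconds = max_wait_seconds
        self.max_retries = max_retries
        self.clock = clock
        self.sleep = sleep
        self._buffer: List[Row] = []
        self._first_row_at: Optional[float] = None  # when the oldest buffered row arrived

    def write(self, row: Row) -> None:
        if not self._buffer:
            self._first_row_at = self.clock()
        self._buffer.append(row)
        if len(self._buffer) >= self.max_batch_size:
            self.flush()  # size trigger

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once it has waited long enough."""
        if self._first_row_at is not None and self.clock() - self._first_row_at >= self.max_wait_seconds:
            self.flush()  # time trigger

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer, self._first_row_at = self._buffer, [], None
        for attempt in range(self.max_retries + 1):
            try:
                self.insert_fn(batch)
                return
            except Exception:
                if attempt == self.max_retries:
                    # Give up for now, but keep the rows so a later flush can retry them.
                    self._buffer, self._first_row_at = batch, self.clock()
                    raise
                self.sleep(0.1 * 2 ** attempt)  # exponential backoff: 0.1 s, 0.2 s, 0.4 s, ...


# Usage example with a fake clock so it runs instantly.
now = [0.0]
batches: List[List[Row]] = []
sink = BatchingSink(batches.append, max_batch_size=3, max_wait_seconds=2.0, clock=lambda: now[0])
for i in range(4):
    sink.write({"id": i})  # rows 0-2 fill a batch and are flushed immediately
now[0] = 2.5
sink.tick()  # row 3 has waited 2.5 s >= 2.0 s, so it is flushed too
print([len(b) for b in batches])  # [3, 1]
```

Larger batches amortize per-insert overhead in ClickHouse, while the wait limit caps how long a row can sit in the buffer when traffic is light. The clock and sleep functions are injectable so the behavior can be tested without real delays.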

Getting Started

To get started with GlassFlow, visit our main repository at glassflow/clickhouse-etl. The repository contains:

  • Complete documentation
  • Quick start guide
  • Example configurations
  • Docker setup instructions
  • API documentation

Clone the repository to get started:

git clone https://github.com/glassflow/clickhouse-etl.git

Pinned

  1. clickhouse-etl Public

    Real-time deduplication and temporal joins for streaming data

    TypeScript · 174 stars · 4 forks

  2. clickhouse-etl-py-sdk Public

    Python SDK to create GlassFlow ClickHouse ETL pipelines

    Python · 5 stars

  3. glassgen Public

    A flexible synthetic data generator with configurable schemas, multiple sinks, and controlled event duplication.

    Python · 5 stars

Repositories

Showing 10 of 12 repositories
  • clickhouse-etl Public

    Real-time deduplication and temporal joins for streaming data

    TypeScript · 174 stars · Apache-2.0 · 4 forks · 1 open issue · 3 open pull requests · Updated May 16, 2025
  • clickhouse-etl-py-sdk Public

    Python SDK to create GlassFlow ClickHouse ETL pipelines

    Python · 5 stars · MIT · 0 forks · 0 open issues · 0 open pull requests · Updated May 16, 2025
  • glassgen Public

    A flexible synthetic data generator with configurable schemas, multiple sinks, and controlled event duplication.

    Python · 5 stars · MIT · 0 forks · 0 open issues · 0 open pull requests · Updated May 16, 2025
  • .github Public

    0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated May 13, 2025
  • airbyte Public Forked from airbytehq/airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Python · 0 stars · 4,578 forks · 0 open issues · 0 open pull requests · Updated Mar 25, 2025
  • pipelines-push-action Public

    This GitHub Action lets you automate GlassFlow pipeline deployments as code

    Python · 1 star · 0 forks · 0 open issues · 0 open pull requests · Updated Feb 24, 2025
  • glassflow-python-sdk Public

    GlassFlow Python SDK to publish data to and consume data from your pipelines at Glassflow.dev

    Python · 9 stars · MIT · 1 fork · 1 open issue · 2 open pull requests · Updated Feb 20, 2025
  • glassflow-examples Public

    Tutorials and templates for running GlassFlow pipelines

    Jupyter Notebook · 30 stars · 4 forks · 0 open issues · 0 open pull requests · Updated Feb 12, 2025
  • pipelines-repo-template Public template

    Template showing how to structure GlassFlow pipelines in GitHub

    Python · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jan 30, 2025
  • example-real-time-ai-alerts Public

    Real-Time Alerts with AI, NATS and Streamlit

    Python · 9 stars · MIT · 1 fork · 0 open issues · 0 open pull requests · Updated Jan 2, 2025
