This project demonstrates an end-to-end data pipeline built with Apache Kafka, Java, and OpenSearch. The application reads a continuous stream of change events from Wikimedia, publishes them to Kafka in real time, and then indexes them into OpenSearch for storage and further analysis.
- The application reads change events (new posts or updates to existing ones) from Wikimedia in real time.
- The producer reads these change events from Wikimedia and publishes them to a Kafka topic named `wikimedia.newChanges`, as sketched below.
- The topic is configured with 3 partitions, which reduces latency by approximately 30%.
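A minimal sketch of such a producer, using only the JDK's `HttpClient` to read the Wikimedia SSE stream (the class name and the choice of HTTP client are illustrative, not necessarily what this repository uses; a dedicated SSE client library would also work; a local broker on `localhost:9092` is assumed):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import java.util.stream.Stream;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class WikimediaProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://stream.wikimedia.org/v2/stream/recentchange")).build();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            HttpResponse<Stream<String>> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofLines());
            // SSE payload lines start with "data: "; forward the JSON body to Kafka
            response.body()
                    .filter(line -> line.startsWith("data: "))
                    .map(line -> line.substring("data: ".length()))
                    .forEach(json -> producer.send(
                            new ProducerRecord<>("wikimedia.newChanges", json)));
        }
    }
}
```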
- Topic name: `wikimedia.newChanges`
- Partitions: 3
- This configuration allows consumers to process records in parallel, which reduces latency.
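For reference, one way to create a topic with this layout programmatically is Kafka's `AdminClient` (a sketch only; the same result can be achieved with the `kafka-topics.sh` CLI; a single local broker on `localhost:9092` is assumed):

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions; replication factor 1 is enough for a single-broker local setup
            admin.createTopics(List.of(new NewTopic("wikimedia.newChanges", 3, (short) 1)))
                 .all().get();
        }
    }
}
```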
- The application runs 2 consumers in a single consumer group reading from the 3 partitions for increased throughput; two consumers proved enough to keep the consumer lag negligible. A minimal consumer sketch follows.
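A sketch of such a consumer (the class name and `group.id` are illustrative, not necessarily the ones used in this repository; a local broker on `localhost:9092` is assumed). All instances started with the same `group.id` split the topic's 3 partitions between them:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class WikimediaConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Instances sharing this group.id divide the 3 partitions among themselves
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "wikimedia-opensearch");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("wikimedia.newChanges"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // In the real pipeline, each poll's records would be handed to the OpenSearch sink
                    System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
                }
            }
        }
    }
}
```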
- OpenSearch is used as the data store for every record read from the Kafka topic.
- Data is sent to OpenSearch in batches rather than one document at a time, which increases throughput by approximately 15%; see the bulk-indexing sketch below.
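A minimal sketch of the batching idea, assuming the OpenSearch REST high-level client (`opensearch-rest-high-level-client`); the index name `wikimedia`, the class name, and the local endpoint are illustrative (for Bonsai, use the cluster URL from your dashboard instead):

```java
import java.io.IOException;
import java.util.List;

import org.apache.http.HttpHost;
import org.opensearch.action.bulk.BulkRequest;
import org.opensearch.action.index.IndexRequest;
import org.opensearch.client.RequestOptions;
import org.opensearch.client.RestClient;
import org.opensearch.client.RestHighLevelClient;
import org.opensearch.common.xcontent.XContentType;

public class OpenSearchBulkSink {
    private final RestHighLevelClient client;

    public OpenSearchBulkSink(RestHighLevelClient client) {
        this.client = client;
    }

    /** Indexes one poll's worth of records with a single bulk call instead of one request each. */
    public void indexBatch(List<String> jsonRecords) throws IOException {
        BulkRequest bulk = new BulkRequest();
        for (String json : jsonRecords) {
            bulk.add(new IndexRequest("wikimedia").source(json, XContentType.JSON));
        }
        if (bulk.numberOfActions() > 0) {
            client.bulk(bulk, RequestOptions.DEFAULT);
        }
    }

    public static void main(String[] args) throws IOException {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            new OpenSearchBulkSink(client).indexBatch(
                    List.of("{\"title\":\"example change\"}"));
        }
    }
}
```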
- Java
- Kafka
- Zookeeper
- OpenSearch
- Event Handling
- Download Kafka and Zookeeper: https://www.conduktor.io/kafka/how-to-install-apache-kafka-on-windows/
- Create an account on Bonsai to access a hosted OpenSearch cluster
- Run the following command in a terminal to start Zookeeper:

```sh
zookeeper-server-start.sh ~/kafka_2.13-3.8.0/config/zookeeper.properties
```
- Run the following command in a terminal to start Kafka:

```sh
kafka-server-start.sh ~/kafka_2.13-3.8.0/config/server.properties
```
- Clone the repository and open it in a Java IDE (preferably IntelliJ)
- Run the Consumer classes followed by the Producer class