Real-Time E-Commerce Analytics with Apache Flink, Elasticsearch, and Postgres
Introduction
In today's fast-paced digital economy, real-time analytics is crucial for gaining business insights and making data-driven decisions. This post covers a real-time sales analytics pipeline built using Apache Flink, Kafka, Elasticsearch, and Postgres. By leveraging stream processing, this project enables real-time tracking of financial transactions, category-wise sales, and daily/monthly revenue.
Project Overview
This project is an Apache Flink application that processes financial transaction data streamed via Kafka. The processed data is then stored in both Postgres (for structured storage) and Elasticsearch (for full-text search and analysis). The entire system is orchestrated with Docker Compose, so the whole stack can be brought up locally with a single command.
Architecture
The system architecture follows this flow:
Sales Transaction Generator (Python - main.py):
- Simulates real-time financial transactions and pushes data into Kafka topics.
Kafka Consumer (Flink Application - DataStreamJob.java):
- Reads transaction data from Kafka.
- Performs aggregations and transformations.
- Writes the results into Postgres and Elasticsearch.
Postgres Database:
- Stores structured data for analytical queries.
- Maintains the tables transactions, sales_per_category, sales_per_day, and sales_per_month.
Elasticsearch:
- Stores indexed transaction data for full-text search and visualization.
- Enables fast retrieval and filtering of financial records.
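To make the Flink job's aggregations concrete, here is a minimal Python sketch of the same logic applied to a static batch of records. The real job in DataStreamJob.java computes this continuously over the Kafka stream; the field names (category, total_amount, date) are illustrative assumptions, not the project's actual schema.

```python
from collections import defaultdict

# Illustrative transaction records; the real schema lives in the Flink job
# and the Postgres tables (field names here are assumptions).
transactions = [
    {"category": "electronics", "total_amount": 120.0, "date": "2024-05-01"},
    {"category": "books", "total_amount": 35.5, "date": "2024-05-01"},
    {"category": "electronics", "total_amount": 80.0, "date": "2024-05-02"},
]

def sales_per_category(records):
    """Sum revenue per product category (mirrors the sales_per_category table)."""
    totals = defaultdict(float)
    for r in records:
        totals[r["category"]] += r["total_amount"]
    return dict(totals)

def sales_per_day(records):
    """Sum revenue per calendar day (mirrors the sales_per_day table)."""
    totals = defaultdict(float)
    for r in records:
        totals[r["date"]] += r["total_amount"]
    return dict(totals)
```

The monthly rollup (sales_per_month) is the same idea keyed on the year-month prefix of the date.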
Setting Up the Project
Prerequisites
Docker & Docker Compose
Apache Flink (1.20.0)
Java 21
Maven
Python (for transaction generator)
Installation Steps
Clone the repository:
git clone <repository_url>
cd flink-ecommerce-analytics
Start the infrastructure using Docker Compose:
docker-compose up
Generate sales transactions:
python main.py
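For a feel of what main.py produces, here is a hedged sketch of a transaction generator. The field names, the topic name financial_transactions, and the kafka-python dependency are all assumptions for illustration; check main.py in the repository for the actual schema and producer setup.

```python
import json
import random
import uuid
from datetime import datetime, timezone

def generate_transaction():
    """Build one random sales transaction (illustrative field names)."""
    price = round(random.uniform(5, 500), 2)
    quantity = random.randint(1, 5)
    return {
        "transaction_id": str(uuid.uuid4()),
        "product_category": random.choice(["electronics", "books", "clothing"]),
        "price": price,
        "quantity": quantity,
        "total_amount": round(price * quantity, 2),
        "transaction_date": datetime.now(timezone.utc).isoformat(),
    }

def stream_to_kafka(bootstrap="localhost:9092", topic="financial_transactions"):
    """Push generated transactions into Kafka.

    Requires the kafka-python package and the broker from docker-compose;
    the topic name is an assumption. Not invoked at import time.
    """
    from kafka import KafkaProducer
    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send(topic, generate_transaction())
    producer.flush()
```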
Compile and build the Flink application:
mvn clean compile package
Run the Flink job:
/opt/homebrew/Cellar/apache-flink/1.20.0/bin/flink run -c flinkecommerce.DataStreamJob target/flinkecommerce-1.0-SNAPSHOT.jar
Verify data in Postgres:
SELECT * FROM transactions;
Search indexed records in Elasticsearch:
curl -X GET "http://localhost:9200/transactions/_search" -H 'Content-Type: application/json' -d'{"query": {"match_all": {}}}'
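The same search can be issued from Python using only the standard library; a small sketch, assuming the Elasticsearch container from docker-compose is reachable on localhost:9200 and the index name transactions matches the curl command above:

```python
import json
import urllib.request

def match_all_query(size=10):
    """Build an Elasticsearch match_all query body."""
    return {"query": {"match_all": {}}, "size": size}

def search_transactions(host="http://localhost:9200", index="transactions"):
    """POST the query to the _search endpoint and return the raw hits.

    Requires a running Elasticsearch instance (not called at import time).
    """
    req = urllib.request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(match_all_query()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts both GET-with-body and POST
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["hits"]["hits"]
```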
Running Flink in VS Code (macOS)
For development, install the required VS Code extensions:
Maven for Java
Extension Pack for Java
Create a Maven Project
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.20.0 \
-DgroupId=com.example \
-DartifactId=flink-project \
-Dversion=1.0-SNAPSHOT \
-DinteractiveMode=false
Start Flink Cluster
From the Flink installation directory (e.g. /opt/homebrew/Cellar/apache-flink/1.20.0 on Homebrew), run:
bin/jobmanager.sh start
bin/taskmanager.sh start
Conclusion
This real-time analytics pipeline demonstrates how Apache Flink can be used to process streaming transaction data at scale. By integrating Postgres and Elasticsearch, businesses can gain powerful insights into their sales trends, helping them make data-driven decisions efficiently.
Stay tuned for more updates, and feel free to reach out with questions or suggestions! 🚀