How We Replaced Node.js with Apache Flink for Real-Time Deduplication and Cut Costs by 7x

Lightning Talk

ShareChat is one of the largest social media platforms in India, with over 180 million monthly active users.

We processed a high-throughput real-time stream (>200K RPS) using Node.js + Redis-based deduplication with a 24-hour window.

In this talk, I'll walk you through how we transitioned to an Apache Flink-based solution, the challenges we faced, and the strategies that led to a 7x cost reduction.
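To make the 24-hour deduplication window mentioned above concrete, here is a minimal in-memory sketch of the idea. It is only an illustration, not the ShareChat implementation: the class name `WindowedDeduplicator`, its parameters, and the injectable `clock` are hypothetical, and it assumes the clock never goes backwards. A Redis-backed version would more likely rely on an atomic set-if-absent with an expiry (for example, `SET` with the `NX` and `EX` options) instead of a local dictionary.

```python
import time
from collections import OrderedDict

WINDOW_SECONDS = 24 * 60 * 60  # 24-hour deduplication window


class WindowedDeduplicator:
    """Hypothetical sketch: remembers each event ID for a fixed time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, clock=time.time):
        self._window = window_seconds
        self._clock = clock  # injectable so the window can be tested quickly
        # event_id -> first-seen timestamp, kept in insertion (oldest-first) order
        self._first_seen = OrderedDict()

    def _evict_expired(self, now):
        # Drop entries from the front until the oldest one is still inside the window.
        while self._first_seen:
            _, seen_at = next(iter(self._first_seen.items()))
            if now - seen_at < self._window:
                break
            self._first_seen.popitem(last=False)

    def is_duplicate(self, event_id):
        """Return True if event_id was already seen within the window."""
        now = self._clock()
        self._evict_expired(now)
        if event_id in self._first_seen:
            return True
        self._first_seen[event_id] = now
        return False
```

A caller would keep one `WindowedDeduplicator` per partition of the stream and forward an event only when `is_duplicate(event_id)` returns `False`. The memory held by such a structure grows with the number of distinct IDs seen per window, which is the scaling concern the state-management topic below refers to.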

Topics Covered:

1. State Management at Scale:

- Our early attempts to structure Flink state efficiently to handle massive-scale deduplication.

- Lessons learned in making the job manageable and performant despite the huge state size.

2. Autoscaling Challenges:

- How we leveraged the Flink Kubernetes Operator to enable autoscaling.

- Why autoscaling initially increased duplication, and how we solved it.

3. When Async API Matters in Apache Flink:

- Understanding the role of Async I/O in Flink.

- How it impacts performance and resource efficiency in real-time streaming.

4. How We Achieved 7x Cost Savings


Andrei Manakov

ShareChat