Current Bengaluru 2025
Session Archive
Check out our session archive to catch up on anything you missed, or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


All you need to know about Kafka v4.0 as a Kafka or Flink user
Apache Kafka v4.0 introduces significant changes that will affect how you deploy, configure, and operate your Kafka clusters. Beyond the famous ZooKeeper removal, what else does Kafka v4.0 bring? As a Kafka user, which changes will impact your existing cluster? As a Flink user, are you safe from the Kafka v4.0 upgrade when using the Flink Kafka connector? In this session, I'll go through all the important changes in Apache Kafka v4.0, such as the Log4j 2 upgrade (KIP-653), the next-generation rebalance protocol (KIP-848), the new async consumer, Eligible Leader Replicas (KIP-996), some default configuration changes (KIP-1030), and the many component and API deprecations and removals. Beyond introducing these changes, most importantly, I'll explain how they impact your existing clusters, for both Kafka and Flink: does KIP-996 change the existing unclean leader election mechanism? With KIP-848, will the Flink Kafka connector need to adopt the new async consumer for the new rebalance protocol? With the removal of old client protocol API versions (KIP-896), will your existing Kafka clients or Flink Kafka connector become incompatible? After this session, you will have a better understanding of the changes in Apache Kafka v4.0, and you'll know what actions you have to take for your existing clusters, whether they run Kafka or Flink. Finally, you'll be able to upgrade to Kafka v4.0 without any surprises.
Luke Chen
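For readers curious what the KIP-848 opt-in looks like in client code, here is a minimal, hypothetical sketch; the broker address, group id, and topic are placeholders, and it assumes a 4.0-era Java client where the next-generation protocol is selected via the group.protocol setting, as described in KIP-848.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class Kip848ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "demo-group");                 // placeholder group id
        // KIP-848: opt into the next-generation consumer rebalance protocol
        // ("classic" keeps the old protocol).
        props.put("group.protocol", "consumer");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));        // placeholder topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```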


An Orchestrator for global data migrations: A Kafka-powered service mesh approach
At Atlassian, we have numerous interconnected services whose shards are deployed in data centers across the globe. These services together participate in a complex stream-based ‘data migration’ workflow. To enable this, within the Platform org we built a scalable, resilient, and globally consistent Orchestrator. This Orchestrator leverages Kafka's state stores, Kafka Streams, and Kafka Connect. It provides a “service mesh” equivalent for Kafka-integrated services, enabling seamless coordination and communication between the different “steps” of the workflow. The architecture allows different shards and nodes of services to enter and exit the service mesh, specify tenants, allocate usage-based quotas to callers, and so on. Central to our solution are five things: 1️⃣ A state-store-based context management layer that is globally consistent. 2️⃣ An SDK that services can readily integrate, which seamlessly abstracts Kafka away from application developers. 3️⃣ A dynamic registry of services that not only catalogs the available services but also maintains an up-to-date map of service deployments across data centers, their health, and their usage. 4️⃣ The Orchestrator's intelligent routing algorithms, which let an application developer seamlessly run a workflow that automatically resolves the most appropriate ‘data shard’ for each ‘step’ of the workflow, based on the application's requirements and the callee service's constraints. 5️⃣ A Kafka Connect based “message relay” service that handles cross-data-center message movement and optionally provides exactly-once guarantees. Join us to explore the inner workings of this Orchestrator and how it leverages Kafka's (possibly less popular) capabilities to address modern distributed stream-based data migration applications. We'll discuss real-world use cases from Jira, Confluence, and other popular Atlassian products, and share insights that can help you push the boundaries of what's possible with Kafka.
Manu Manjunath, Ravi Gupta
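As an illustration of the “state store based context management” idea (not Atlassian's actual implementation), here is a minimal, hypothetical Kafka Streams sketch that materializes per-migration workflow context into a named, replicated state store; the topic, store, and application names are invented placeholders.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class MigrationContextStoreSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical topic: each workflow step publishes its latest status keyed by
        // migration id. Materializing the changelog as a table yields a named,
        // fault-tolerant state store that holds the current "context" per migration.
        builder.table(
                "migration-step-events",                                   // placeholder topic
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(
                        "migration-context-store"));                        // placeholder store name

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "migration-orchestrator-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```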


Modernising Kafka at Uber Scale: Upgrading from Legacy to New Hardware SKUs
Within Uber, we have numerous Kafka clusters comprising thousands of nodes tailored to different use cases. These clusters collectively handle a few trillion messages daily, ingesting multiple petabytes of data. These messages are distributed across thousands of topics. Several of these clusters are exceptionally large, exceeding 150 nodes in some cases. Kafka serves a crucial role in enabling inter-service communication, transporting database changelogs, facilitating data lake ingestion, and more. Notably, Kafka houses business-critical data like billing and payment information. Kafka is a tier-0 technology at Uber, guaranteeing 99.99% data durability, and its availability is tied to the health of the underlying nodes. However, these nodes are ageing, leading to increasing disk failures and the need for replacements, with potential risks of offline partitions and data loss. To ensure uninterrupted operations, there is a need to migrate topics and partitions to newer, high-performance SKUs. The migration introduces several challenges: 1. Preserving rack-aware distribution to maintain zone failure resiliency during the migration. 2. Managing significant differences in disk capacity between the old SKU (legacy) nodes and the new SKU (H20A) nodes. 3. Adhering to disk usage thresholds on the new SKU nodes to avoid performance degradation. 4. Balancing nodes within racks to ensure continuous resiliency and fault tolerance. 5. Handling variability in Kafka cluster configurations, especially for low-latency clusters, where introducing new replicas could increase latency. Join us to learn how we overcame these challenges using strategies like tiered storage and cluster rebalance to successfully migrate Kafka infrastructure at Uber.
Nikin Raagav, Abhijeet Kumar
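One building block for a migration like this is Kafka's partition reassignment API, which moves replicas onto the new SKU brokers. Below is a minimal, hypothetical sketch using the Java Admin client; the broker ids, topic name, and address are invented, and a real migration would also apply replication throttles and respect disk-usage thresholds.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class SkuMigrationSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Hypothetical example: move partition 0 of "billing-events" onto three
            // new-SKU brokers (ids 101, 102, 103), chosen one per rack so the replica
            // set stays rack-aware while data drains off the legacy nodes.
            TopicPartition partition = new TopicPartition("billing-events", 0);
            NewPartitionReassignment target =
                    new NewPartitionReassignment(List.of(101, 102, 103));

            admin.alterPartitionReassignments(Map.of(partition, Optional.of(target)))
                 .all()
                 .get();
        }
    }
}
```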


Flink SQL WINDOW function recipes in my Flink kitchen!
Are you struggling to come to terms with Flink SQL WINDOW functions for processing your stream? Are you new to Flink SQL? If the answer to both questions is ‘Yes’, then join my session to get introduced to WINDOW functions on a data stream, with live examples. No presentations, please! Just live coding of Flink SQL WINDOW operations on real-world streaming data, so you can get your hands dirty! We will start by understanding the syntax of a WINDOW function in general and then dive deeper into Flink table-valued functions (TVFs) with Flink 1.20. Then we'll see how the TUMBLE and HOP WINDOW functions operate using live SQL examples. Next, we will build an end-to-end demo with data streams generated by Kafka and apply Flink SQL WINDOW operations on the stream to transform and aggregate data. You will come out of my session with enhanced knowledge of data stream WINDOW functions in Flink SQL and will be able to adapt the examples to your own use case.
Diptiman Raichaudhuri
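As a taste of the kind of windowing the session demonstrates, here is a minimal sketch of a TUMBLE windowing TVF written with the Flink Table API in Java; the Kafka topic, schema, and broker address are placeholders rather than the session's actual demo data.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TumbleWindowSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical source: order events arriving on a Kafka topic.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  order_id STRING," +
            "  amount   DOUBLE," +
            "  order_ts TIMESTAMP(3)," +
            "  WATERMARK FOR order_ts AS order_ts - INTERVAL '5' SECOND" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'latest-offset'," +
            "  'format' = 'json'" +
            ")");

        // TUMBLE windowing TVF: non-overlapping 10-minute windows, total amount per window.
        tEnv.executeSql(
            "SELECT window_start, window_end, SUM(amount) AS total_amount " +
            "FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(order_ts), INTERVAL '10' MINUTES)) " +
            "GROUP BY window_start, window_end")
            .print();
    }
}
```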


Demystify MirrorMaker2 for DR at India scale (~1 trillion messages a day!) - The PhonePe story
Operational resilience and Disaster Recovery (DR) through Kafka are indispensable for businesses growing at a rapid pace in the high-velocity, high-risk environment of digital payments that PhonePe operates in. DR has been core to our platform-first approach from day one and has helped us drive widespread digital adoption in India. It ensures data integrity and availability, thereby safeguarding user trust, business operations, and regulatory compliance, and preventing financial losses. PhonePe has revolutionised India's financial and digital landscape to build a cashless economy with financial traceability, and processes ~9 billion transactions/month, almost 4 times the volume of other global digital payment giants. This has enabled financial inclusion by giving millions of Indians, across both urban and rural India, easy access to digital payments, including a vast network of 30 million merchants. Join this session to learn how we demystified MirrorMaker2 for achieving DR in various ways through Kafka. We will talk about the selection criteria for applications, the different types of outages, and the cost-benefit analysis for these applications. With MM2, we will explain how the application-side dual-writes challenge was resolved, along with the intricacies of setting up shallow mirroring, implementing switchovers with offset translation, and building automatic failure detectors. The session will also cover monitoring and alerting, and the L7 proxy setup for transparent failovers. The audience will also take away key learnings on cluster architecture, rack awareness for brokers, producers, and consumers, and implementation considerations from our experience of setting up a platform that is compliant by design and able to scale to handle a high volume and velocity of data.
Guruprasad Sridharan, Nitish Goyal
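One piece behind “switchovers with offset translation” is MirrorMaker2's checkpoint mechanism, which client code can query through RemoteClusterUtils. Here is a minimal sketch, assuming a hypothetical source-cluster alias "primary" plus placeholder addresses and group ids (not PhonePe's actual setup).

```java
import java.time.Duration;
import java.util.Map;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class FailoverOffsetTranslationSketch {
    public static void main(String[] args) throws Exception {
        // Client config pointing at the DR (target) cluster; the address is a placeholder.
        Map<String, Object> drClusterConfig =
                Map.of("bootstrap.servers", "dr-kafka:9092");

        // MM2 periodically emits offset-sync checkpoints; translateOffsets uses them to
        // map a consumer group's committed offsets on the primary cluster ("primary" is
        // the hypothetical source-cluster alias) into equivalent offsets on the DR cluster.
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(
                        drClusterConfig,
                        "primary",            // source cluster alias as configured in MM2
                        "payments-consumer",  // placeholder consumer group id
                        Duration.ofSeconds(30));

        // A failover script could now seek consumers (or commit these offsets) on DR.
        translated.forEach((tp, offset) ->
                System.out.printf("%s -> offset %d%n", tp, offset.offset()));
    }
}
```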


Building agent systems with Apache Kafka and Apache Flink
AI-powered agent systems are becoming essential for automation, personalization, and real-time decision-making. But how do we ensure that these agents can process information continuously, maintain context, and provide intelligent responses at scale? This talk explores how Apache Kafka and Apache Flink can be used to build dynamic real-time agent systems. We'll start with the basics of agent-based systems - how they work, how they communicate, and how they retrieve and generate relevant knowledge using Retrieval-Augmented Generation. Then we'll look into real-time streaming architectures, showing how Kafka handles message passing between agents and how Flink processes events to track context and enable intelligent responses. By the end of this session, you'll have a clear roadmap for designing AI-driven agent systems that are context-aware, efficient, and able to work with a continuous stream of data. Whether you're working on chatbots, monitoring systems, or intelligent automation, this talk will provide practical insights into bridging streaming data with generative AI to power the next generation of autonomous agents. Perfect for beginners and experts alike, this session has something for every skill level.
Olena Kutsenko
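To ground the idea of Kafka handling message passing between agents, here is a minimal, hypothetical sketch of one agent consuming tasks from a topic and publishing results to another; the topic names, group id, and agent logic are invented placeholders, with the LLM/RAG call stubbed out.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class AgentMessagePassingSketch {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");            // placeholder
        consumerProps.put("group.id", "research-agent");                     // placeholder agent id
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");            // placeholder
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(List.of("agent.tasks"));                       // placeholder topic
            while (true) {
                for (ConsumerRecord<String, String> task : consumer.poll(Duration.ofSeconds(1))) {
                    // In a real agent this is where an LLM / RAG pipeline would run;
                    // here we just echo a canned answer to keep the sketch self-contained.
                    String answer = "handled: " + task.value();
                    producer.send(new ProducerRecord<>("agent.results", task.key(), answer));
                }
            }
        }
    }
}
```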