Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed, or rewatch your favorites - and hear all the industry-changing insights from the best minds in data streaming.


Apache Kafka: meet Apache Druid DART
Druid and Kafka have been best buddies for 10 years, courting and sparking their way around data analytics parties to excess. At the end of 2024, the Apache Druid community released a new query API, DART, giving them access to even more parties and fun times - this time, ones where executing complex queries quickly matters more than concurrency. Join to see Druid's DART engine get the slideware treatment and to watch a Kafka + DART-powered Druid + Grafana analytics pipeline in action, complete with step-by-step instructions for building your own.
Peter Marshall, Dave Klein
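
For a taste of what the demo involves: Dart queries are submitted to Druid over its SQL HTTP API. Here's a minimal sketch in Python - the endpoint path, host/port, and the quickstart "wikipedia" datasource are assumptions for a default local Druid deployment, not details taken from the session.

```python
# Minimal sketch: submit a SQL query to Druid's Dart engine over HTTP.
# The URL below assumes a default local Druid router; adjust for your setup.
import requests

DART_URL = "http://localhost:8888/druid/v2/sql/dart"  # assumed Dart SQL endpoint

query = {
    "query": """
        SELECT channel, COUNT(*) AS edits
        FROM wikipedia              -- classic Druid quickstart datasource
        GROUP BY channel
        ORDER BY edits DESC
        LIMIT 10
    """,
    "resultFormat": "objectLines",  # one JSON object per result row
}

resp = requests.post(DART_URL, json=query, timeout=60)
resp.raise_for_status()
print(resp.text)
```

The appeal of Dart here is that the same SQL shape works for complex multi-stage queries that would strain the classic interactive engine.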


How We Replaced Node.js with Apache Flink for Real-Time Deduplication and Cut Costs by 7x
ShareChat is one of the largest social media platforms in India, with over 180 million monthly active users. We were processing a high-throughput real-time stream (>200K RPS) with a Node.js + Redis-based deduplication pipeline using a 24-hour window. In this talk, I'll walk you through how we transitioned to an Apache Flink-based solution, the challenges we faced, and the strategies that led to a 7x cost reduction. Topics covered:
1. State Management at Scale: our early attempts to structure Flink state efficiently for massive-scale deduplication, and lessons learned in keeping the job manageable and performant despite the huge state size.
2. Autoscaling Challenges: how we leveraged the Flink Kubernetes Operator to enable autoscaling, why autoscaling initially increased duplication, and how we solved it.
3. When the Async API Matters in Apache Flink: understanding the role of Async I/O in Flink and how it impacts performance and resource efficiency in real-time streaming.
4. How We Achieved 7x Cost Savings.
Andrei Manakov
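
The session covers ShareChat's production job; as a rough illustration of the core pattern - keyed state with a 24-hour TTL acting as the dedup window - here's a minimal PyFlink sketch. The names and toy input are hypothetical, not ShareChat's actual code.

```python
# Minimal PyFlink sketch of keyed deduplication with a 24-hour state TTL.
# Key/event names are hypothetical; this shows the idea, not ShareChat's job.
from pyflink.common import Time, Types
from pyflink.datastream import StreamExecutionEnvironment, KeyedProcessFunction
from pyflink.datastream.state import StateTtlConfig, ValueStateDescriptor


class Dedup(KeyedProcessFunction):
    def open(self, runtime_context):
        desc = ValueStateDescriptor("seen", Types.BOOLEAN())
        # Expire entries 24h after they are written, bounding total state size.
        ttl = (StateTtlConfig.new_builder(Time.hours(24))
               .set_update_type(StateTtlConfig.UpdateType.OnCreateAndWrite)
               .set_state_visibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
               .build())
        desc.enable_time_to_live(ttl)
        self.seen = runtime_context.get_state(desc)

    def process_element(self, value, ctx):
        if self.seen.value() is None:  # first occurrence of this key in 24h
            self.seen.update(True)
            yield value                # emit it; later duplicates are dropped


env = StreamExecutionEnvironment.get_execution_environment()
events = env.from_collection(
    [("a", 1), ("a", 1), ("b", 2)],
    type_info=Types.TUPLE([Types.STRING(), Types.INT()]))
events.key_by(lambda e: e[0]).process(Dedup()).print()
env.execute("dedup-sketch")
```

Keeping only a boolean per key (rather than the full event) is one of the simplest ways to keep huge dedup state manageable.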


Massive Kafka Streams Topology Revamp in Production: No Chaos, No Headaches! My Key Takeaways 🦾!
You've been rocking Kafka Streams in production for a while, but guess what? Times have changed! Your Kafka skills have leveled up, and/or your business is pushing for a fresh twist... 🚀 Now you need to revamp your entire Kafka Streams topology without breaking everything! 😱 But how do you pull this off without disrupting consumers, while ensuring the latest data updates land correctly in your internal topics, and without the headache of renaming your microservice or tweaking input/output topics? 🫨 Join me as we dive into the "remapping" functionality from Kstreamplify, Michelin's open-source library that adds extra capabilities to Kafka Streams. Through a simple, hands-on example, I'll show you how to make these changes smoothly. Grab a seat 🪑 - let's make topology changes a breeze! 🌪️✨
Marie-Laure Momplot


PyFlink Table API on Streaming Data - Fearless Python Data Engineering
Data engineers around the world have embraced Python as the language of choice for designing and developing data engineering pipelines. For streaming data, Python DSLs serve the purpose of writing complex business logic in a fluent, readable, and efficient way. Apache Flink Table API Python transforms enable data streaming engineers to write sophisticated stream transforms, such as tumbling and hopping windows and group-by-key aggregations, with pure Pythonic fluent DSLs. Join this session to learn the beauty and ease of writing Python Table API transformations on streaming data with Kafka as the source. The session will also include a live demo of writing Python Table API aggregations on streaming data with Kafka. You'll come out of this session armed with the knowledge to write complex streaming data transformations in your language of choice, Python, and an understanding of how to construct streaming data pipelines using the Apache Flink Table API.
Diptiman Raichaudhuri
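
As a preview of the kind of transform the session demonstrates, here's a minimal PyFlink Table API sketch of a one-minute tumbling-window count over a Kafka topic. The topic, field names, and connector settings are illustrative, and the Kafka SQL connector jar must be available to Flink.

```python
# Minimal PyFlink Table API sketch: 1-minute tumbling-window counts per user
# over a Kafka topic. Topic/fields are illustrative.
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col, lit
from pyflink.table.window import Tumble

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Kafka source with an event-time column and watermark.
t_env.execute_sql("""
    CREATE TABLE clicks (
        user_id STRING,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clicks',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

result = (
    t_env.from_path("clicks")
         .window(Tumble.over(lit(1).minutes).on(col("ts")).alias("w"))
         .group_by(col("w"), col("user_id"))
         .select(col("user_id"),
                 col("w").start.alias("window_start"),
                 col("user_id").count.alias("clicks"))
)
result.execute().print()
```

The fluent expression style (col, lit, window aliases) is exactly the "pythonic DSL" the abstract refers to: the same logic in SQL or Java maps one-to-one onto these chained calls.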


Kafka Tiered Storage in Production?
A third of the cost of a typical Kafka cluster is storage. Beyond costing money, fluctuating usage means storage space needs to be monitored, and it has been a source of on-call pain for us. Tiered storage for Kafka is a newly released feature that promises to dramatically reduce storage costs by offloading most data to cheap storage (e.g., S3) rather than expensive local or network-attached disks (e.g., EBS). It's marked as production-ready, but it's not widely adopted yet. Stripe is currently in the process of migrating to tiered storage across our fleet of more than 50 Kafka clusters. We've already encountered problems like JVM crashes and metadata calls that occasionally time out only for tiered storage topics, and we're still early in the migration process (though we'll be done one way or the other by the time this conference takes place!). In this talk you'll learn about the problems we encountered that either made us abandon the use of tiered storage or that we had to solve to run it successfully in production.
Donny Nadolny
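
For context on what enabling the feature looks like, here's a minimal sketch that creates a topic with tiered storage enabled, using the confluent-kafka Python AdminClient. It assumes the brokers already run with remote.log.storage.system.enable=true and a configured remote storage plugin; the topic name and retention values are illustrative, not Stripe's settings.

```python
# Minimal sketch: create a tiered-storage topic via the confluent-kafka
# AdminClient. Brokers must already have remote log storage configured.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topic = NewTopic(
    "payments",                            # illustrative topic name
    num_partitions=6,
    replication_factor=3,
    config={
        "remote.storage.enable": "true",   # tier this topic's closed segments
        "local.retention.ms": "3600000",   # keep ~1 hour on local disk
        "retention.ms": "604800000",       # 7 days total, mostly in object storage
    },
)

futures = admin.create_topics([topic])
for name, fut in futures.items():
    fut.result()  # raises on failure
    print(f"created {name}")
```

The cost win comes from the gap between local.retention.ms and retention.ms: only the short local window lives on expensive disks, while the rest sits in object storage.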


Wrangling Iceberg Tables with PyIceberg CLI: A Hands-On Journey
Apache Iceberg has become a go-to solution for managing massive datasets, and with PyIceberg, it’s now easier than ever to work with Iceberg tables using a pure Pythonic approach, without depending on distributed query engines. In this talk, I’ll introduce PyIceberg and dive straight into a live demo showcasing its CLI commands. Starting with setting up a local Iceberg catalog, I’ll guide you through the basic developer workflow of working with a catalog and tables. Following that, I’ll create a table in the catalog with Python, insert data, and demonstrate various CLI operations on tables and namespaces, such as listing, describing, dropping, and managing properties. By the end of the session, you’ll have a solid understanding of PyIceberg’s capabilities and how it simplifies managing Iceberg tables in Python-centric workflows. If you love Python and Apache Iceberg, this talk is for you!
Dunith Danushka
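
As a flavor of the workflow the demo walks through, here's a minimal PyIceberg sketch that loads a local SQLite-backed catalog, creates a table, and appends rows. The catalog settings, namespace, and table names are illustrative.

```python
# Minimal PyIceberg sketch: local SQL catalog, create a table, append rows.
# The ./warehouse directory must exist before running this.
import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "local",
    **{
        "type": "sql",
        "uri": "sqlite:///warehouse/catalog.db",  # catalog metadata store
        "warehouse": "file://warehouse",          # where table data lands
    },
)

catalog.create_namespace("demo")

data = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
table = catalog.create_table("demo.events", schema=data.schema)
table.append(data)

print(table.scan().to_arrow())  # read the rows back
```

Once the same catalog is configured in .pyiceberg.yaml, the CLI commands covered in the session (e.g., pyiceberg list demo, pyiceberg describe demo.events) can inspect and manage these tables without any query engine.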