Current London 2025

Session Archive

Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.

Unified Schema Registry: Ensure schema consistency across data systems

Schema Registry is the backbone of safe schema evolution and efficient data transfer in the Kafka ecosystem. Our data infrastructure spans online APIs (gRPC/Rest.li), databases (MySQL/Oracle/Espresso/TiDB), and powerful streaming and ingestion frameworks built on Kafka. From real-time ETL to OLAP systems like Pinot, from tracking and metrics pipelines feeding into Hadoop, to offline jobs pushing insights into derived stores like Venice, Kafka is the key cog in our data flywheel. As data moves through LinkedIn’s ecosystem, it traverses multiple schema languages and serialization formats, evolving—sometimes seamlessly, sometimes with transformation. But what happens when an upstream schema change inadvertently breaks downstream systems? Traditional siloed validation falls short, leading to data corruption, operational disruptions, and painful debugging.

Join us as we dive into LinkedIn’s Universal Schema Registry (USR), a solution for seamless, large-scale schema validation:

1. End-to-End Schema Compatibility – USR validates schemas holistically across RPC, Kafka, Espresso, Venice, and more, safeguarding the entire data lineage.
2. CI-Integrated Early Validation – A shift-left approach catches issues at schema authoring time, saving thousands of developer hours and avoiding costly data migrations.
3. Multi-Format Schema Mapping – Supports complex transformations across Proto, Avro, and more, enabling automated migrations like LinkedIn’s move from Rest.li to gRPC for online RPCs.

We’ll share LinkedIn’s journey of building and scaling USR to handle millions of validations every week, the challenges faced, and the best practices that keep our data ecosystem resilient.
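
USR itself is internal to LinkedIn, but the kind of check it automates can be illustrated with Apache Avro's built-in compatibility API. The sketch below is a minimal, hypothetical example (the event schema and field names are invented): it asks whether a proposed new schema can still read data written with the old one, the sort of validation a CI hook could run at schema authoring time.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

public class CompatibilityCheckSketch {
    public static void main(String[] args) {
        // Hypothetical current schema, i.e. what existing data was written with.
        Schema oldSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageViewEvent\",\"fields\":["
          + "{\"name\":\"memberId\",\"type\":\"long\"},"
          + "{\"name\":\"pageKey\",\"type\":\"string\"}]}");

        // Hypothetical proposed schema: a new required field without a default value.
        Schema newSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageViewEvent\",\"fields\":["
          + "{\"name\":\"memberId\",\"type\":\"long\"},"
          + "{\"name\":\"pageKey\",\"type\":\"string\"},"
          + "{\"name\":\"sessionId\",\"type\":\"string\"}]}");

        // Can a consumer using the new schema still read data written with the old one?
        // Here it cannot, because sessionId has no default.
        SchemaPairCompatibility result =
            SchemaCompatibility.checkReaderWriterCompatibility(newSchema, oldSchema);

        if (result.getType() != SchemaCompatibilityType.COMPATIBLE) {
            // A CI hook could fail the build at this point, before the change ever ships.
            System.out.println("Incompatible change: " + result.getDescription());
        }
    }
}
```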

Presenters

Souman Mandal, Sarthak Jain

Breakout Session
May 21

Democratising Stream Processing: How Netflix Empowers Teams with Data Mesh and Streaming SQL

As data volume and velocity continue to grow at unprecedented rates, organisations are increasingly turning to stream processing to unlock real-time insights and fuel data-driven decision-making. This presentation will explore how Netflix has evolved its Data Mesh platform, a distributed data architecture, to embrace the power of Streaming SQL. The session will showcase the journey from the initial implementation of Data Mesh as a data movement platform to its transformation into a comprehensive stream processing powerhouse with the integration of the Data Mesh SQL Processor. Attendees will learn about the challenges Netflix faced with its earlier system, which relied on pre-built processors and the low-level Flink DataStream API, leading to limitations in expressiveness and a steep learning curve for developers. The presentation will then unveil the innovative solution: the Data Mesh SQL Processor, a platform-managed Flink job that harnesses the familiarity and versatility of SQL to simplify stream processing. Through practical examples, attendees will discover how the SQL Processor interacts with Flink's Table API, converting data streams into dynamic tables for seamless SQL processing. Moreover, the session will spotlight the platform's user-friendly SQL-centric features, including an interactive query mode for live data sampling, real-time query validation, and automated schema inference.
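
The Data Mesh SQL Processor itself is Netflix-internal, but the mechanism the abstract describes (registering a data stream as a dynamic table and querying it with SQL) is standard Flink Table API. Below is a minimal, self-contained sketch under assumed field names and sample data; a real processor would read from Kafka and run a user-supplied query rather than the hard-coded one shown here.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class SqlProcessorSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Hypothetical source stream of (title, playbackSeconds) events; in a real
        // pipeline this would come from a Kafka source, not fromElements().
        DataStream<Tuple2<String, Long>> plays = env.fromElements(
            Tuple2.of("Stranger Things", 1200L),
            Tuple2.of("Stranger Things", 800L),
            Tuple2.of("The Crown", 600L));

        // Register the stream as a dynamic table so it can be queried with SQL.
        tableEnv.createTemporaryView("plays", plays);

        // A user-supplied SQL transformation, as a platform-managed processor might run it.
        Table totals = tableEnv.sqlQuery(
            "SELECT f0 AS title, SUM(f1) AS total_seconds FROM plays GROUP BY f0");

        // Convert the continuously updating result back into a stream and print it.
        tableEnv.toChangelogStream(totals).print();
        env.execute("sql-processor-sketch");
    }
}
```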

Presenters

Sujay Jain

Breakout Session
May 21

Migrating Kafka from ZooKeeper to KRaft: adventures in operations

From Kafka 4.0 onwards, clusters no longer support ZooKeeper. This change has been in the works for a while, and Kafka now has its own consensus protocol named KRaft. If you want to create a new Kafka cluster, using KRaft is pretty straightforward, but what if you already have existing Kafka clusters? Since Kafka 3.6, you can migrate ZooKeeper-based clusters to KRaft. The process has a few intricacies, even more so if you want to automate it rather than doing it manually for each cluster. In this session we will take you along for the ride of migrating a ZooKeeper-based Kafka cluster to a KRaft one. We’ll share the pitfalls we discovered while implementing an automated process in the open-source Kafka operator Strimzi to handle this operation. It was certainly an adventure, so come along to see how you can navigate this migration as smoothly as possible, avoiding a lot of error-prone manual steps on your production clusters.
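
The migration itself is driven by broker and controller configuration and, in the speakers' case, by Strimzi; none of that is reproduced here. As one illustrative, hypothetical sanity check, the sketch below uses the Kafka Admin API (Admin#describeMetadataQuorum, available since Kafka 3.3) to confirm that a KRaft quorum is serving cluster metadata; the bootstrap address is made up.

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

public class QuorumCheckSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical bootstrap address.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");

        try (Admin admin = Admin.create(props)) {
            // On a KRaft (or successfully migrated) cluster this returns the quorum;
            // on a pure ZooKeeper cluster the request fails, since there is no quorum to describe.
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();
            System.out.println("KRaft controller leader: " + quorum.leaderId());
            quorum.voters().forEach(v ->
                System.out.println("Voter " + v.replicaId() + " at log end offset " + v.logEndOffset()));
        }
    }
}
```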

Presenters

Kate Stanley, Paolo Patierno

Breakout Session
May 21

Async Processing: Get any Kafka Streams app into the throughput Olympics

Since the very dawn of Kafka Streams it has been haunted by the possibility of high latencies in a topology. Custom logic with heavy processing, RPCs, and remote state stores were more or less considered out of the question, and those who bravely tried anyway ran into endless problems. At the same time, even simple applications were getting into trouble as companies grew and so did their workloads, yet apps could only scale up to the partition count. Are slow and/or heavy applications just not a good fit for Kafka Streams? The answer of course is no! With Kafka Streams anything is possible. In this talk we’ll introduce Responsive’s Async Processor, a lightweight wrapper that turns your slowest applications into medal winners. With just a few lines of code you can easily convert any Kafka Streams app to async. By injecting an async thread pool to hand off the actual processing work, records can be processed in parallel, even within a partition — all without sacrificing any of the usual correctness guarantees like same-key ordering or exactly-once semantics. This feature is production-ready and available now through the Responsive SDK, but we’ll end by discussing our vision for native async processing in open source Kafka Streams. We’ll also go over some neat features we added to Kafka Streams, like the ProcessorWrapper, which is used by the async framework but has the potential for so much more. Join us to hear all the gory details of async processing and how it reimagines the limits of Kafka Streams itself. Bring your slowest application and find out if it can make the team!
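
Responsive's Async Processor code is not shown in the abstract, so the sketch below only illustrates the core idea of offloading slow per-record work (here a made-up RPC) to a thread pool inside a Kafka Streams Processor. It deliberately stops short of the hard part: as the comments note, forwarding from pool threads like this does not by itself preserve same-key ordering or exactly-once semantics, and that is exactly the machinery a real async framework has to add.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

// Hypothetical processor that offloads a slow call (e.g. an RPC) to a thread pool.
public class AsyncSketchProcessor implements Processor<String, String, String, String> {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private ProcessorContext<String, String> context;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
    }

    @Override
    public void process(Record<String, String> record) {
        // Hand the heavy work to the pool so the stream thread is not blocked.
        pool.submit(() -> {
            String enriched = slowRpcLookup(record.value()); // hypothetical remote call
            // NOTE: forwarding from a pool thread like this is NOT safe in plain
            // Kafka Streams; a real async framework must buffer results and forward
            // them from the stream thread to keep ordering and exactly-once intact.
            context.forward(record.withValue(enriched));
        });
    }

    @Override
    public void close() {
        pool.shutdown();
    }

    private String slowRpcLookup(String value) {
        return value + "-enriched";
    }
}
```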

Presenters

A. Sophie Blee-Goldman

Breakout Session
May 21

Flink & Metered Billing - a reprocessing story

Join us to explore how a metered billing system was built on Apache Flink to achieve accurate numbers. We’ll cover session window aggregation, ordering, idempotency, late enrichment, metered billing, reconciliation, watermark alignment, and data accuracy. Learn how we iteratively modified and expanded our business logic, idempotently reprocessing over 500M events more than ten times to ensure precision for our customers. We’ll dive into the architectural decisions, operational strategies, and challenges faced, providing valuable insights into building robust, real-time data processing systems with Flink.
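
The presenter's actual pipeline is not shown in the abstract; as a point of reference, here is a minimal sketch of the session-window aggregation the talk mentions, in Flink's DataStream API. The event shape, the 30-minute gap, and the watermark settings are assumptions for illustration; a real billing job would read from a durable source and use event-time timestamps from the payload.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class MeteredUsageSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical usage events: (customerId, unitsUsed). A real job would read
        // these from Kafka and take timestamps from the event payload.
        DataStream<Tuple2<String, Long>> usage = env.fromElements(
                Tuple2.of("customer-a", 5L), Tuple2.of("customer-a", 3L),
                Tuple2.of("customer-b", 7L))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofMinutes(1))
                    .withTimestampAssigner((event, ts) -> System.currentTimeMillis()));

        // Sum usage per customer within a session; a session closes after 30 minutes
        // of inactivity, at which point the billable amount for that session is emitted.
        usage.keyBy(event -> event.f0)
            .window(EventTimeSessionWindows.withGap(Time.minutes(30)))
            .sum(1)
            .print();

        env.execute("metered-usage-sketch");
    }
}
```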

Presenters

Pedro Mázala

Breakout Session
May 21

Kafka Connection Chaos: Surviving the Storm

It is 9 AM, and the support team has begun maintenance to renew the Kafka brokers’ certificates. At 9:30 AM half of the cluster has been updated correctly, but the liveness probe metric seems unstable. We check connectivity — everything looks fine. Our monitoring stack shows it can consume from and produce to all brokers. Connections are a bit higher than usual but still within limits. 9:40 AM: some teams start complaining that they can neither consume nor produce. What is happening? Suddenly, we discover the acceptor metric indicating that brokers are blocking 80% of connections. What is an acceptor, and why is it blocking our connections?

The previous paragraph describes an incident where our Kafka platform experienced a connection storm, leading to significant degradation. This event highlighted the crucial need for effective connection management and exposed our gaps in understanding Kafka’s connection handling, especially with new connections. In this talk, we will share our journey and insights with platform teams maintaining Kafka. You’ll learn how Kafka on Linux servers manages connections and the challenges you might encounter. We will dive into the metrics and mechanisms Kafka offers to detect and protect against connection storms. And last but not least, we’ll share tips from our experience to help you avoid the mistakes we made.
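
The talk's specific recommendations are not in the abstract; as one example of the protections Kafka ships with, the sketch below uses the Admin API to set the broker's dynamic connection limits (max.connections, max.connections.per.ip, and the connection creation rate from KIP-612) as cluster-wide defaults. The broker address and the limit values are made-up placeholders, not recommendations.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ConnectionLimitSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092"); // hypothetical address

        try (Admin admin = Admin.create(props)) {
            // An empty resource name targets the cluster-wide default broker config.
            ConfigResource cluster = new ConfigResource(ConfigResource.Type.BROKER, "");

            List<AlterConfigOp> ops = List.of(
                // Cap concurrent connections per broker and per client IP,
                // and throttle how fast new connections may be created.
                new AlterConfigOp(new ConfigEntry("max.connections", "2000"), AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("max.connections.per.ip", "100"), AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("max.connection.creation.rate", "50"), AlterConfigOp.OpType.SET));

            admin.incrementalAlterConfigs(Map.of(cluster, ops)).all().get();
        }
    }
}
```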

Presenters

Javier Hortal, Rafael García Ortega

Breakout Session
May 21