Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


From Zero to Hero: Petabyte-scale Tiered Storage lessons
Whether you’re running mission-critical applications or just shipping logs in real time, Tiered Storage can make your Kafka cluster cheaper, easier to manage, and faster. To understand the benefits, tradeoffs, and development history, join this talk, where we’ll unpack KIP-405 and showcase how the community backed this important feature for Apache Kafka. We’ll roll back through the KIP’s history, starting in 2018, to trace the major milestones, and share how industry leaders like Apple, Datadog, and Slack helped build and test both the Tiered Storage functionality and the first open source AWS S3 plugin. Furthermore, we’ll share details, gotchas, and tradeoffs from users successfully adopting Tiered Storage in production at scale, surpassing 150 GB/s of throughput. If you want to optimize your Apache Kafka cluster for performance, cost, and overall health, this session is for you.
Francesco Tisiot, Filip Yonov
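
If you want to experiment before watching: below is a minimal sketch of enabling Tiered Storage on one topic via the Java Admin API. It assumes Kafka 3.6+ brokers already configured with remote.log.storage.system.enable=true and an S3-backed RemoteStorageManager plugin (such as the open source AWS S3 plugin covered in the talk); the topic name and retention values are hypothetical.

```java
// Sketch: enabling Tiered Storage on a single topic with the Admin API.
// Assumes brokers are already running with tiered storage enabled and a
// remote storage plugin configured; values below are illustrative only.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class TieredTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("logs", 12, (short) 3).configs(Map.of(
                "remote.storage.enable", "true",  // ship closed segments to remote storage
                "local.retention.ms", "3600000",  // keep only 1 hour on local disks
                "retention.ms", "604800000"       // 7 days total, mostly in S3
            ));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```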


Into the Otter-Verse: Using time-travel to transform Kafka Streams development and operations
One potential benefit of using a stream processor like Kafka Streams to build applications on the log is the ability to time travel. What if you could go back in time and query state stores to see when a bug was introduced? Or what if you could freeze the state of a running application and make a copy for pre-deploy testing? This potential has largely gone unrealized because of a missing primitive in Kafka Streams: the ability to create a consistent snapshot that can be read and even cloned into a new application. Until now.

We first explain exactly what snapshots and clones are. In short, a snapshot contains all of an application’s state up to some point in time, and no state after; a clone is a copied application created from that state. Next, we’ll make the case for why snapshots are a game-changing feature for Kafka Streams. Snapshots take your application into a multiverse (or otter-verse) of histories and branches. We’ll show how you can use them to explore your application’s history, interactively debug, test changes against real data, do blue/green deploys, and more.

The remainder of the talk dives into the theory and practice of Kafka Streams snapshots. First, we cover what’s been missing from Kafka Streams to support them: it currently lacks the synchronization mechanisms needed for a consistent topology-wide snapshot, and it maintains state locally, which makes a snapshot difficult to access. Next, we discuss how we fill these gaps with Responsive. Specifically, we give an overview of RS3, our S3-backed store built on SlateDB, and how we use it with our SDK to take consistent snapshots. We’ll close this section with our vision for how snapshots can be contributed back to Kafka Streams.

Finally, we’ll close the talk with a demo showing the power of snapshots in action. Viewers should come away with an understanding of snapshots and clones, how they can be used to solve common problems, and how we’ve built them in Responsive.
Rohan Desai
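
Responsive’s snapshot API is the subject of the talk itself, so the sketch below sticks to stock Kafka Streams: it materializes a named state store and queries it interactively, the kind of store a consistent snapshot would capture and a clone would bootstrap from. Application, topic, and store names are hypothetical.

```java
// Sketch: a Kafka Streams app with a named, queryable state store.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.util.Properties;

public class CountsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "counts-app");        // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("events");          // hypothetical topic
        events.groupByKey()
              .count(Materialized.as("event-counts"));                      // named store a snapshot would capture

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Once the app reaches RUNNING, query the live store. A point-in-time
        // snapshot would let the same query run against historical state.
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType("event-counts",
                        QueryableStoreTypes.keyValueStore()));
        System.out.println("count for key 'a': " + store.get("a"));
    }
}
```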


Kafka Consumer 4.0 - Major version, major improvements, get to know it all
A new major version of the KafkaConsumer is out, bringing fundamental changes and improvements: it’s the first version to fully implement the next generation of the Consumer Group Rebalance Protocol, introduced with KIP-848, now a brand new production-ready feature. Want to hear how these major changes materialize in the KafkaConsumer? What’s in? What’s out? What’s different? Then this talk is for you! We will cover the core of the new rebalance protocol, its implementation in the Java client, and how it significantly improves and simplifies the whole group consumption experience, addressing its main pain points. We will also cover the revamped KafkaConsumer threading model, shipped alongside the new rebalance protocol client implementation. It all sounds promising, but we know upgrades can be scary, right? Whether you’re a Kafka developer, operator, or architect, this talk will equip you with everything you need to confidently adopt KafkaConsumer 4.0 in your client applications: how live upgrades and protocol interoperability work, plus detailed client changes, from configuration changes and API deprecations and additions to improved API behavior and new metrics.
Lianet Magrans, David Jacot
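
For a taste of the client-side change, here is a minimal sketch of a consumer opting into the KIP-848 protocol via the group.protocol configuration. It assumes brokers that support the new protocol; the topic and group names are made up.

```java
// Sketch: a Java consumer using the KIP-848 rebalance protocol.
// With group.protocol=consumer, partition assignment moves server-side,
// so client-side assignor configs like partition.assignment.strategy
// no longer apply. Topic and group names are illustrative.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class NewProtocolConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-app");
        props.put(ConsumerConfig.GROUP_PROTOCOL_CONFIG, "consumer"); // KIP-848; "classic" is the old protocol
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.println(r.key() + " -> " + r.value()));
        }
    }
}
```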


Stream Processing and Cascading Materialized Views: Why, How, and What
Materialized views (MVs) are a core concept in databases. In streaming databases like ksqlDB and RisingWave, MVs are maintained by continuous, incremental stream processing engines. Users can define cascading MVs, that is, MVs on top of other MVs, to express complex stream processing logic. However, managing cascading MVs can introduce substantial technical hurdles for the database system. To illustrate, consider a scenario where an MV within the stack cannot promptly process events from its upstream sources. This not only causes immediate latency spikes for downstream MVs but also creates backpressure, potentially crashing the system. Additionally, if an MV crashes, it can pause processing for the entire MV stack, and recovering the MV and its downstream MVs while preserving data consistency is a formidable task.

In this presentation, I will begin by exploring the critical considerations in maintaining cascading materialized views: consistency, elasticity, and fault tolerance. I will then delve into the advantages and disadvantages of various approaches, along with strategies for efficient logging and checkpointing to minimize system downtime. Finally, I will share insights gained from managing hundreds of cascading materialized views in real-world production environments.
Yingjun Wu
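
The talk is about streaming databases, but the cascading-MV idea can be illustrated with chained Kafka Streams aggregations in Java (not ksqlDB or RisingWave syntax): a second materialized table is maintained incrementally on top of a first, updating whenever its upstream view changes. Names below are hypothetical.

```java
// Sketch: one materialized table ("MV2") defined on top of another ("MV1").
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

import java.util.Properties;

public class CascadingViews {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cascading-views");   // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("clicks");           // hypothetical topic

        // "MV1": clicks per page, updated incrementally as events arrive.
        KTable<String, Long> perPage =
                clicks.groupByKey().count(Materialized.as("clicks-per-page"));

        // "MV2": a view on top of MV1. The adder/subtractor pair keeps the
        // total consistent when an upstream row is updated (old value retracted).
        perPage.groupBy((page, count) -> KeyValue.pair("total", count),
                        Grouped.with(Serdes.String(), Serdes.Long()))
               .reduce(Long::sum,                   // new upstream value added
                       (total, old) -> total - old, // old upstream value removed
                       Materialized.as("total-clicks"));

        new KafkaStreams(builder.build(), props).start();
    }
}
```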


Why is Kafka always late? Is that really a problem?
Kafka is fast, but lag is everywhere. Data falls behind, consumers can’t keep up, and alerts keep firing. The usual reaction? Blame Kafka. The real issue? Kafka does exactly what it’s built to do: decouple producers and consumers. Lag isn’t a bug; it’s a side effect. Tracking offsets won’t save you. The real problem is time lag: the gap between when data is produced and when it’s actually processed. Consumer rebalances, inefficient commits, slow APIs, and bad scaling decisions all make it worse. Little’s Law predicts when lag will spiral, but most teams ignore it. This talk breaks down what’s really happening when Kafka "falls behind", why it happens, and what you can do about it. Batching, commit strategies, parallel consumption, dropping messages: many options are available. Start controlling lag before it controls you.
Stephane Derosiaux
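
A concrete way to see the Little’s Law point: with a sustained consume rate λ and a backlog of L records, the time a record waits is W = L / λ. The tiny sketch below, with invented numbers, converts an offset lag into a time lag.

```java
// Sketch: turning offset lag (records) into time lag (seconds)
// via Little's Law, W = L / lambda. All numbers are invented.
public class LagEstimate {
    public static void main(String[] args) {
        long offsetLag = 1_200_000L;    // L: records the group is behind
        double consumeRate = 40_000.0;  // lambda: records/second the group sustains

        double timeLagSeconds = offsetLag / consumeRate; // W = L / lambda
        System.out.printf("time lag ~ %.0f seconds%n", timeLagSeconds);
        // Offset lag alone doesn't say how far behind you are in time:
        // the same L is 30 seconds at one throughput and 30 minutes at another.
    }
}
```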


The art of structuring real-time data streams into actionable insights
Detecting problems as they happen is essential in today’s fast-moving world. This talk shows how to build a simple, powerful system for real-time anomaly detection. We’ll use Apache Kafka for streaming data, Apache Flink for processing it, and AI to find unusual patterns. Whether it’s spotting fraud, monitoring systems, or tracking IoT devices, this solution is flexible and reliable. First, we’ll explain how Kafka helps collect and manage fast-moving data. Then, we’ll show how Flink processes this data in real time to detect events as they happen. We’ll also explore how to add AI to the pipeline, using pre-trained models to find anomalies with high accuracy. Finally, we’ll look at how Apache Iceberg can store past data for analysis and model improvements. Combining real-time detection with historical data makes the system smarter and more effective over time. This talk includes clear examples and practical steps to help you build your own pipeline. It’s perfect for anyone who wants to learn how to use open-source tools to spot problems in real-time data streams.
Olena Kutsenko
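
As a starting point for a pipeline like the one described, here is a minimal sketch of a Flink job that reads from Kafka and flags outliers, with a fixed threshold standing in for a trained model. It assumes the flink-connector-kafka dependency; the topic name, group id, and threshold are hypothetical.

```java
// Sketch: Kafka -> Flink with a placeholder anomaly check.
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AnomalySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("sensor-readings")             // hypothetical topic
                .setGroupId("anomaly-detector")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> readings =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source");

        // Placeholder "model": a fixed threshold stands in for an AI scoring step.
        readings.filter(v -> Double.parseDouble(v) > 100.0)
                .map(v -> "anomaly: " + v)
                .returns(Types.STRING)                    // helps Flink's lambda type extraction
                .print();

        env.execute("anomaly-detection-sketch");
    }
}
```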