Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed, or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Towards Transactional Buffering of Change Data Capture Events
Data pipelines built on top of change data capture (CDC) are gaining ever more traction and power many different real-time applications these days. The standard way CDC solutions operate is to propagate captured data changes as separate events, which are typically consumed one by one, as is, by downstream systems. In this talk, we take a deep dive into CDC pipelines for transactional systems to understand how the direct consumption of individually published CDC events impacts data consistency at the sink side of data flows. In particular, we'll learn why the lack of transactional boundaries in change event streams may well lead to temporarily inconsistent state—such as partial updates from multi-table transactions—that never existed in the source database. A promising solution to mitigate this issue is aggregating CDC events based on their original transactional context. To demonstrate the practical aspects of this approach, we'll go through a concrete end-to-end example showing:

* how to configure Debezium to enrich captured change events from a relational database with transaction-related metadata
* an experimental Apache Flink stream processing job to buffer CDC events based on transactional boundaries
* a bespoke downstream consumer to atomically apply transactional CDC event buffers into a target system

If you have ever wondered how to tackle the often-neglected problem of temporarily inconsistent state when consuming change event streams originating from relational databases, this session is for you!
Hans-Peter Grahsl
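
For a taste of the first step in that list, here is a minimal sketch (illustrative only, not taken from the talk) of registering a Debezium Postgres connector with transaction metadata enabled, via the Kafka Connect REST API; the host, database credentials, and connector name are placeholder assumptions:

```python
# Register a Debezium Postgres connector with transaction metadata enabled.
# Host names, credentials, and the connector name below are illustrative.
import requests

connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "inventory",
        "topic.prefix": "app",
        # Stamp every change event with a transaction block (id, total_order,
        # data_collection_order) that downstream buffering logic can key on.
        "provide.transaction.metadata": "true",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```

With this flag set, Debezium also emits BEGIN/END markers, including per-transaction event counts, to a dedicated transaction topic (here "app.transaction"), which is exactly the boundary information a buffering Flink job needs.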


Unlocking Next-Gen Stateful Streaming: Harnessing transformWithState in Apache Spark with Kafka
As event-driven architectures powered by Apache Kafka™ continue to redefine real-time data processing, the demand for flexible, scalable, and efficient stateful streaming solutions has never been higher. Enter transformWithState, Apache Spark™’s groundbreaking new operator for Structured Streaming, designed to tackle the complexities of stateful processing head-on. In this session, we’ll dive into how transformWithState empowers developers to build sophisticated, low-latency streaming applications with Kafka as the backbone. From flexible state management and timer-driven logic to seamless schema evolution and integration with Kafka, we’ll explore real-world use cases—like real-time fraud detection and session-based analytics—that showcase its power. Attendees will leave with a clear understanding of how to leverage transformWithState to supercharge their Kafka-powered Spark pipelines, complete with practical examples, performance insights, and best practices for production deployment. Whether you’re optimizing stateful aggregations or chaining complex event-driven workflows, this talk will equip you to push the boundaries of what’s possible with Kafka and Spark.
Holly Smith, Craig Lukasik
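
To make the operator concrete, here is a minimal per-key running count, assuming Spark 4.0's Python flavor of the operator (transformWithStateInPandas); the Kafka broker, topic, and schema are illustrative assumptions, not code from the session:

```python
# Minimal per-key running count with transformWithStateInPandas, the Python
# flavor of transformWithState (assumes Spark 4.0; names are illustrative).
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.streaming import StatefulProcessor, StatefulProcessorHandle
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("tws-demo").getOrCreate()

class RunningCount(StatefulProcessor):
    def init(self, handle: StatefulProcessorHandle) -> None:
        # One checkpointed ValueState cell per grouping key.
        self._count = handle.getValueState(
            "count", StructType([StructField("count", LongType())]))

    def handleInputRows(self, key, rows, timerValues):
        total = self._count.get()[0] if self._count.exists() else 0
        for pdf in rows:  # `rows` is an iterator of pandas DataFrames
            total += len(pdf)
        self._count.update((total,))
        yield pd.DataFrame({"key": [key[0]], "count": [total]})

    def close(self) -> None:
        pass

# Kafka-backed input stream, reduced to the grouping key.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(key AS STRING) AS key"))

counts = events.groupBy("key").transformWithStateInPandas(
    statefulProcessor=RunningCount(),
    outputStructType=StructType([StructField("key", StringType()),
                                 StructField("count", LongType())]),
    outputMode="Update",
    timeMode="None",
)

query = counts.writeStream.outputMode("update").format("console").start()
```

The same handle can also register timers and further state variables (lists, maps), which is what enables the timer-driven, session-style use cases the abstract mentions.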


Flink SQL Revolutions: Breaking Out of the Matrix with PTFs
SQL is more than just "a language"; it is the product of tough negotiations among industry leaders, generations of engineers, and data experts. It has evolved into an ecosystem that has proven to survive a constantly changing IT landscape. Yet, many believe that SQL alone is not enough for the instant world we live in today. This world is driven by stream processing and event-driven applications, which demand complex state machines, special types of joins, rule-based conditional logic, and time-based decision-making. Apache Flink was designed exactly for these use cases. However, it seems that Flink SQL users have been somewhat left behind. This has not only hindered the adoption of Flink SQL but also prevented interested users from leveraging a powerful CDC engine. The upcoming Flink version will change this significantly. Process Table Functions (PTFs) defined in FLIP-440 are the solution. These functions accept entire tables as arguments and are equipped to handle streaming primitives such as time, timers, and state. In this talk, I will showcase the full potential of PTFs and how they could replace microservices consuming data from Kafka. This will start a Flink SQL revolution, all while adhering to recent additions to the SQL ISO standard.
Timo Walther
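
As a flavor of what calling a PTF looks like, here is a sketch from the SQL side via PyFlink; PTFs themselves are currently authored in Java, so the function class `com.example.SessionizeFn` and its `gap` parameter are hypothetical stand-ins, and the call syntax follows the examples in FLIP-440:

```python
# Invoke a (Java-defined) Process Table Function from Flink SQL via PyFlink.
# The PTF class and its parameters are hypothetical; connector options are
# assumptions for a local Kafka setup.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Kafka-backed input table.
t_env.execute_sql("""
    CREATE TABLE clicks (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP_LTZ(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clicks',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Register the Java-defined PTF under a SQL name.
t_env.create_java_temporary_function("SessionizeFn", "com.example.SessionizeFn")

# A PTF takes a whole table as an argument; PARTITION BY hands the function
# keyed state and timers per user_id, much like a keyed process function.
t_env.execute_sql("""
    SELECT * FROM SessionizeFn(
        input => TABLE clicks PARTITION BY user_id,
        gap   => INTERVAL '30' MINUTE
    )
""").print()
```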


Towards an Open Apache Kafka Performance Model
Imagine having a powerful, customizable model that brings the end-to-end flow of records through your Kafka applications and clusters to life. Picture a tool that allows you to swiftly and affordably understand and predict the performance, scalability, and resource demands of your entire system. With this model, you can explore “what if” scenarios, such as changes to workloads, application and cluster hardware, Kafka configurations, and even dependencies on external systems.

This vision is closer than you think. In this talk, we’ll introduce a simple Kafka performance model and demonstrate its application to Kafka tiered storage sizing. Whether you’re using SSD or EBS local storage or S3 remote storage, this model can predict IO, network requirements, the size and number of brokers, and storage space needs.

But this is just the beginning. We’ll unveil the potential of a fully-featured open Kafka performance model. Discover how it could work, what it could do, and the approaches we’re investigating to build and parameterize it. These include benchmarking workloads separately, applying multivariate regression over metrics from our largest managed Kafka clusters, leveraging Kafka client metrics (KIP-714), and utilizing OpenTelemetry traces. For visualization, we’re exploring Sankey Diagrams and integrating OpenTelemetry data into an open-source GUI.

Our goal is to democratize access to an open Kafka performance model, empowering anyone using, developing, or running Apache Kafka clusters and applications. This model will help predict end-to-end application performance, client and cluster resources, and performance SLAs. It will also aid in capacity planning, cluster sizing/re-sizing, and understanding dynamic changes for variable workloads, elastic cluster resizing, cluster failures, maintenance operations, and more. The scope could even expand to include Kafka stream processing, multiple clusters, and heterogeneous integration scenarios with Kafka Connect.
Paul Brebner
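
To show the flavor of such a model, here is a toy back-of-envelope sizing function; it is purely illustrative, not the model from the talk, and every input and constant in it is an assumption:

```python
# Toy Kafka cluster sizing model (illustrative only; all values assumed).
def size_kafka_cluster(produce_mb_s: float,
                       consumer_fanout: int = 2,
                       replication_factor: int = 3,
                       retention_hours: float = 24.0,
                       brokers: int = 6,
                       tiered: bool = False,
                       local_retention_hours: float = 4.0) -> dict:
    # Cluster-wide traffic (MB/s): producers in, followers replicate,
    # consumers fan out.
    ingress = produce_mb_s
    replication = produce_mb_s * (replication_factor - 1)
    egress = produce_mb_s * consumer_fanout
    net_in = ingress + replication    # producer writes + follower fetches in
    net_out = egress + replication    # consumer reads + leader-to-follower out

    # Local disks hold every replica; with tiered storage only a short local
    # retention window is kept, and remote object storage holds one (already
    # durable) copy for the full retention period.
    hours = local_retention_hours if tiered else retention_hours
    local_tb = produce_mb_s * replication_factor * hours * 3600 / 1e6
    remote_tb = produce_mb_s * retention_hours * 3600 / 1e6 if tiered else 0.0

    return {
        "per_broker_net_in_mb_s": net_in / brokers,
        "per_broker_net_out_mb_s": net_out / brokers,
        "per_broker_disk_write_mb_s": (ingress + replication) / brokers,
        "local_storage_tb": local_tb,
        "remote_storage_tb": remote_tb,
    }

# e.g. a 100 MB/s workload on 6 brokers with tiered storage:
print(size_kafka_cluster(100, tiered=True))
```

A full model like the one proposed in the talk would of course be parameterized from real measurements rather than fixed constants, but even this arithmetic makes the "what if" style of questions tangible.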


Real-Time AI, Done Right: The Power of Kafka & Delta Live Tables
AI thrives on real-time, high-quality data—but most streaming pipelines are fragile, costly, and overly complex. Messy schemas, dropped records, and late-arriving data wreak havoc on AI models, leaving engineers trapped in endless firefighting. So how do you bridge the gap between cutting-edge AI and the chaos of real-time event streams? In this session, we’ll show you how to build AI-ready, self-healing data streams with Databricks Delta Live Tables (DLT) and Confluent Kafka. Learn how to automate schema evolution, enforce data quality with expectations, and optimize pipelines with serverless compute. We’ll then explore the next evolution—AI-powered streaming—leveraging AI Functions, Foundational Models, and Agentic Frameworks to unlock real-time AI at scale. Whether you’re an engineer, data scientist, or architect, you’ll leave with actionable strategies to fuel AI models with pristine, real-time data. Don’t let bad pipelines hold back great AI—upgrade your streaming game today!
Simon Whiteley
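
As a flavor of the pattern, here is a compact sketch of a Kafka-to-DLT pipeline with expectations; the broker, topic, schema, and quality rules are illustrative assumptions, and the code assumes it runs inside a DLT pipeline, where `spark` and the `dlt` module are provided by the runtime:

```python
# Kafka -> Delta Live Tables with declarative data-quality expectations.
# Broker, topic, schema, and rules below are illustrative assumptions.
import dlt
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (StructType, StructField, StringType,
                               TimestampType)

schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", StringType()),
    StructField("ts", TimestampType()),
])

@dlt.table(comment="Raw events ingested from Kafka as-is.")
def raw_events():
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", "payments")
            .load()
            .select(from_json(col("value").cast("string"), schema).alias("e"))
            .select("e.*"))

# Expectations enforce quality declaratively: expect_or_drop removes bad rows,
# plain expect records violations in pipeline metrics without dropping them.
@dlt.table(comment="Validated events, ready for downstream AI workloads.")
@dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")
@dlt.expect("has_timestamp", "ts IS NOT NULL")
def clean_events():
    return dlt.read_stream("raw_events")
```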