Current London 2025

Session Archive

Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Melting Icebergs: Enabling Analytical Access to Kafka Data through Iceberg Projections

An organisation's data has traditionally been split between the operational estate, which runs daily business operations, and the analytical estate, used for after-the-fact analysis and reporting. The journey from one side to the other is today a long and torturous one. But does it have to be? In the modern data stack, Apache Kafka is the de facto standard operational platform, and Apache Iceberg has emerged as the champion of table formats to power analytical applications. Can we leverage the best of Iceberg and Kafka to create a powerful solution greater than the sum of its parts? Yes, we can, and we did! This isn't a typical story of connectors, ELT, and separate data stores. We've developed an advanced projection of Kafka data in an Iceberg-compatible format, allowing direct access from warehouses and analytical tools.

In this talk, we'll cover:

* How we presented Kafka data to Iceberg processors without moving or transforming data upfront—no hidden ETL!
* Integrating Kafka's ecosystem into Iceberg, leveraging Schema Registry, consumer groups, and more.
* Meeting Iceberg's performance and cost-reduction expectations while sourcing data directly from Kafka.

Expect a technical deep dive into the protocols, formats, and services we used, all while staying true to our core principles:

* Kafka as the single source of truth—no separate stores.
* Analytical processors shouldn't need Kafka-specific adjustments.
* Operational performance must remain uncompromised.
* Kafka's mature ecosystem features, like ACLs and quotas, should be reused, not reinvented.

Join us for a thrilling account of the highs and lows of merging two data giants, and stay tuned for the surprise twist at the end!
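The projection idea is easier to picture from the consuming side. The sketch below is purely hypothetical: it shows, in Kotlin, what querying such a projection could look like if it were exposed through a standard Iceberg REST catalog. The catalog name, endpoint URI, and table identifier are assumptions made for illustration, not the presenters' actual setup, and the job would need the iceberg-spark-runtime package on its classpath.

```kotlin
import org.apache.spark.sql.SparkSession

fun main() {
    val spark = SparkSession.builder()
        .appName("kafka-as-iceberg")
        .master("local[*]") // local mode just for the sketch
        // Standard Iceberg-on-Spark wiring: a named catalog backed by a REST catalog endpoint.
        .config("spark.sql.catalog.kafka_lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.kafka_lake.type", "rest")
        .config("spark.sql.catalog.kafka_lake.uri", "https://projection.example.com/iceberg")
        .getOrCreate()

    // From the analytical side this reads like any other Iceberg table,
    // even though the records themselves still live only in Kafka behind the projection layer.
    spark.sql("SELECT user_id, COUNT(*) AS orders FROM kafka_lake.streams.orders GROUP BY user_id")
        .show()
}
```

The point of the sketch is the consumer experience: if the projection speaks standard Iceberg protocols, no Kafka-specific adjustments are needed on the analytical side.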

Presenters

Tom Scott, Roman Kolesnev

Breakout Session
May 21

Empowering Developers with a Centralized Kafka Library

This presentation details how our platform enablement team developed a centralized Kafka library, empowering developers to build applications with ease. Faced with inconsistent Kafka processing approaches across teams, we created a common library, inspired by the multi-threaded consumer approach described in this Confluent blog post: https://www.confluent.io/blog/kafka-consumer-multi-threaded-messaging (re-implemented in Kotlin). We'll share our challenges, successes, and the current state of this library, now used in over 20 services.

Initially, varying team approaches to Kafka processing led to inconsistencies and duplicated effort, and our team recognized the need for standardization. The internal library simplifies Kafka development, promotes best practices, and centralizes key functionality. It wraps the kafka-clients library, offering simple interfaces for building Kafka consumers and producers that integrate with our Confluent clusters, schema registry, and Avro serialization. Its core feature is a multi-threaded consumer implementation, enabling efficient consumption from multiple partitions. We'll share the technical hurdles we encountered during development, discussing our design decisions, multi-threading challenges, and lessons learned.

Crucially, the library supports dead-letter queues and message redelivery. It also supports cross-cluster consumers, essential for GDPR compliance, allowing production to multiple Confluent clusters in different regions. We'll cover our versioning strategy and package overlap issues, explaining how we created thin, relocated, and uber JAR versions. Another interesting feature is runtime consumer control: by producing events to an internal topic, we can start and stop consumers in live applications without redeployment.

The library has simplified Kafka development, promoted consistency, reduced the learning curve, and centralized core functionality. This presentation is ideal for Kafka developers seeking to build internal Kafka libraries.
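To ground the multi-threaded consumer idea, here is a minimal Kotlin sketch of the pattern the referenced Confluent post describes: a single poll loop that pauses a partition while a worker thread processes its batch, then commits the offsets and resumes it. The topic name, pool size, and error handling are illustrative assumptions; the presenters' actual library layers dead-letter queues, redelivery, and runtime controls on top, and a production version would also need a rebalance listener.

```kotlin
import java.time.Duration
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.Future
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.consumer.OffsetAndMetadata
import org.apache.kafka.common.TopicPartition

class MultiThreadedConsumerSketch(private val consumer: KafkaConsumer<String, String>) {
    private val workers = Executors.newFixedThreadPool(8)
    private val inFlight = mutableMapOf<TopicPartition, Future<Long>>()

    fun run() {
        consumer.subscribe(listOf("orders"))
        while (true) {
            val records = consumer.poll(Duration.ofMillis(250))

            // Hand each partition's batch to a worker thread and pause that partition,
            // so the poll loop keeps the consumer in the group without re-fetching it.
            for (tp in records.partitions()) {
                val batch = records.records(tp)
                consumer.pause(listOf(tp))
                inFlight[tp] = workers.submit(Callable { processBatch(batch) })
            }

            // Commit offsets and resume partitions whose batches have completed.
            for ((tp, result) in inFlight.filterValues { it.isDone }) {
                consumer.commitSync(mapOf(tp to OffsetAndMetadata(result.get() + 1)))
                consumer.resume(listOf(tp))
                inFlight.remove(tp)
            }
        }
    }

    private fun processBatch(batch: List<ConsumerRecord<String, String>>): Long {
        var lastOffset = -1L
        for (record in batch) {
            // Business logic goes here; a real library would add retries and a dead-letter queue.
            lastOffset = record.offset()
        }
        return lastOffset
    }
}
```

Because all KafkaConsumer calls stay on the poll thread and only the batch processing is handed off, the sketch respects the client's single-threaded-consumer constraint while still parallelizing work across partitions.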

Presenters

Ademir Spahic, Ammar Latifovic

Breakout Session
May 21

Towards Transactional Buffering of Change Data Capture Events

Data pipelines built on top of change data capture (CDC) are gaining ever more traction and power many different real-time applications these days. The standard way CDC solutions operate is to propagate captured data changes as separate events, which are typically consumed one by one and as is by downstream systems. In this talk, we are taking a deep dive to explore CDC pipelines for transactional systems to understand how the direct consumption of individually published CDC events impacts data consistency at the sink side of data flows. In particular, we'll learn why the lack of transactional boundaries in change event streams may well lead to temporarily inconsistent state—such as partial updates from multi-table transactions—that never existed in the source database. A promising solution to mitigate this issue is aggregating CDC events based on their original transactional context.

To demonstrate the practical aspects of this approach, we'll go through a concrete end-to-end example showing:

* how to configure Debezium to enrich captured change events from a relational database with transaction-related metadata
* an experimental Apache Flink stream processing job to buffer CDC events based on transactional boundaries
* a bespoke downstream consumer to atomically apply transactional CDC event buffers into a target system

If you have ever wondered how to tackle the often-neglected problem of temporarily inconsistent state when consuming change event streams originating from relational databases, this session is for you!
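As a concrete anchor for the first bullet, this is roughly what enabling transaction metadata looks like in a Kafka Connect registration for a Debezium Postgres connector; the connector name and connection details are placeholders. With provide.transaction.metadata set to true, each change event carries a transaction block (transaction id and ordering fields), and the connector additionally emits transaction boundary events to a dedicated <topic.prefix>.transaction topic, which is what a downstream buffering job can key on.

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "********",
    "database.dbname": "inventory",
    "topic.prefix": "app",
    "provide.transaction.metadata": "true"
  }
}
```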

Presenters

Hans-Peter Grahsl

Breakout Session
May 21

Unlocking Next-Gen Stateful Streaming: Harnessing transformWithState in Apache Spark with Kafka

As event-driven architectures powered by Apache Kafka™ continue to redefine real-time data processing, the demand for flexible, scalable, and efficient stateful streaming solutions has never been higher. Enter transformWithState, Apache Spark™’s groundbreaking new operator for Structured Streaming, designed to tackle the complexities of stateful processing head-on. In this session, we’ll dive into how transformWithState empowers developers to build sophisticated, low-latency streaming applications with Kafka as the backbone. From flexible state management and timer-driven logic to seamless schema evolution and integration with Kafka, we’ll explore real-world use cases—like real-time fraud detection and session-based analytics—that showcase its power. Attendees will leave with a clear understanding of how to leverage transformWithState to supercharge their Kafka-powered Spark pipelines, complete with practical examples, performance insights, and best practices for production deployment. Whether you’re optimizing stateful aggregations or chaining complex event-driven workflows, this talk will equip you to push the boundaries of what’s possible with Kafka and Spark.

Presenters

Holly Smith, Craig Lukasik

Breakout Session
May 21

Flink SQL Revolutions: Breaking Out of the Matrix with PTFs

SQL is more than just "a language"; it is the product of tough negotiations among industry leaders, generations of engineers, and data experts. It has evolved into an ecosystem that has proven to survive a constantly changing IT landscape. Yet, many believe that SQL alone is not enough for the instant world we live in today. This world is driven by stream processing and event-driven applications, which demand complex state machines, special types of joins, rule-based conditional logic, and time-based decision-making. Apache Flink was designed exactly for these use cases. However, it seems that Flink SQL users have been somewhat left behind. This has not only hindered the adoption of Flink SQL but also prevented interested users from leveraging a powerful CDC engine. The upcoming Flink version will change this significantly. Process Table Functions (PTFs) defined in FLIP-440 are the solution. These functions accept entire tables as arguments and are equipped to handle streaming primitives such as time, timers, and state. In this talk, I will showcase the full potential of PTFs and how they could replace microservices consuming data from Kafka. This will start a Flink SQL revolution, all while adhering to recent additions to the SQL ISO standard.

Presenters

Timo Walther

Breakout Session
May 21

Towards an Open Apache Kafka Performance Model

Imagine having a powerful, customizable model that brings the end-to-end flow of records through your Kafka applications and clusters to life. Picture a tool that allows you to swiftly and affordably understand and predict the performance, scalability, and resource demands of your entire system. With this model, you can explore “what if” scenarios, such as changes to workloads, application and cluster hardware, Kafka configurations, and even dependencies on external systems. This vision is closer than you think.

In this talk, we’ll introduce a simple Kafka performance model and demonstrate its application to Kafka tiered storage sizing. Whether you’re using SSD or EBS local storage or S3 remote storage, this model can predict IO, network requirements, the size and number of brokers, and storage space needs. But this is just the beginning. We’ll unveil the potential of a fully-featured open Kafka performance model. Discover how it could work, what it could do, and the approaches we’re investigating to build and parameterize it. These include benchmarking workloads separately, applying multivariate regression over metrics from our largest managed Kafka clusters, leveraging Kafka client metrics (KIP-714), and utilizing OpenTelemetry traces. For visualization, we’re exploring Sankey Diagrams and integrating OpenTelemetry data into an open-source GUI.

Our goal is to democratize access to an open Kafka performance model, empowering anyone using, developing, or running Apache Kafka clusters and applications. This model will help predict end-to-end application performance, client and cluster resources, and performance SLAs. It will also aid in capacity planning, cluster sizing/re-sizing, and understanding dynamic changes for variable workloads, elastic cluster resizing, cluster failures, maintenance operations, and more. The scope could even expand to include Kafka stream processing, multiple clusters, and heterogeneous integration scenarios with Kafka Connect.
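To make the “what if” idea concrete, here is a deliberately naive back-of-envelope sketch in Kotlin of the kind of arithmetic such a model automates for tiered storage sizing. The workload figures and the simple linear formulas are illustrative assumptions, not the presenters' model, which accounts for many more factors.

```kotlin
fun main() {
    // Illustrative workload assumptions, not measured figures.
    val producerMBps = 100.0        // aggregate producer throughput into the cluster
    val replicationFactor = 3
    val consumerFanOut = 2          // how many times each record is read back
    val localRetentionHours = 6.0   // hot data kept on broker disks with tiered storage
    val totalRetentionDays = 30.0   // full retention served from remote object storage

    // Broker outbound traffic: replication fetches (RF - 1 copies) plus consumer reads.
    val egressMBps = producerMBps * (replicationFactor - 1 + consumerFanOut)

    // Local disk: only the hot retention window is kept, replicated across brokers.
    val localDiskGB = producerMBps * 3600 * localRetentionHours * replicationFactor / 1024

    // Remote storage: a single copy of the full retention window.
    val remoteStorageTB = producerMBps * 3600 * 24 * totalRetentionDays / 1024 / 1024

    println("Cluster egress    ≈ %.0f MB/s".format(egressMBps))
    println("Local broker disk ≈ %.0f GB".format(localDiskGB))
    println("Remote storage    ≈ %.1f TB".format(remoteStorageTB))
}
```

Even this toy version shows how quickly the answers shift when a single input (say, retention or fan-out) changes, which is exactly the class of question a parameterized, open model is meant to answer systematically.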

Presenters

Paul Brebner

Breakout Session
May 21