Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Let's add Kafka support to the Kubernetes Gateway API.
The Kubernetes Gateway API is the preferred way to specify how traffic flows from clients outside a Kubernetes cluster to services running inside it (north/south traffic), as well as how services communicate with one another inside the cluster (east/west traffic). When vendors support the standard, end users reap benefits such as portability and reduced vendor lock-in. The Kubernetes Gateway API, like the rest of Kubernetes, is under the governance of the Cloud Native Computing Foundation (CNCF), which in turn is part of the Linux Foundation. Today, the Gateway API includes standard ways to define HTTP and gRPC traffic into and within a Kubernetes cluster, with experimental work underway for TLS, TCP, and UDP traffic. For HTTP, this means, for example, that for any incoming HTTP request you can define filters, transformations, and routing rules that are applied before the request is passed to its final destination in the cluster. In this talk, I argue that event-driven architectures deserve the same treatment. Organisations want to unlock the data in Kafka, which puts pressure on Kafka admins who need to expose data to additional internal and external clients while maintaining strong governance. However, there isn't a standard way to safely expose Kafka to clients at the scale and speed required by businesses. Existing Kubernetes solutions, such as the Gateway API's TCP support, are helpful but not Kafka protocol-aware. I'll explain a new proposal for a Kafka extension to the Kubernetes Gateway API standard. This proposal makes it easy for Kubernetes and Kafka administrators to manage access to their Kafka clusters in a cloud-native way. Kafka can even be securely exposed to consumers outside of the Kubernetes cluster, which opens up new ways of leveraging the valuable data within. We'll review early implementations that support this initiative.
Jonathan Michaux
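
As a rough sketch of what the "securely exposed outside the cluster" scenario looks like from a client's point of view (not code from the talk), here is a minimal Java consumer pointed at a bootstrap address that a Kafka-aware gateway might expose; the address, credentials, and topic name are hypothetical placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.config.SaslConfigs;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ExternalGatewayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical bootstrap address exposed by a Kafka-aware gateway in front of the cluster.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-gateway.example.com:9092");
        // TLS + SASL so traffic crossing the cluster boundary is authenticated and encrypted.
        props.put("security.protocol", "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-512");
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"external-client\" password=\"change-me\";");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "external-analytics");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

From the client's perspective, nothing Kafka-specific changes: the gateway only determines which bootstrap address is reachable and how the connection is authenticated.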


The Silent Migration: How Kafka Streams Became Our Safety Net
Migrating from a monolithic Postgres system to a distributed architecture is a high-stakes balancing act. Over five years, we transformed our legacy infrastructure, with Kafka Streams emerging as the backbone bridging old and new systems while ensuring uninterrupted compliance, real-time reporting, and ML-driven insights. This talk details how we collaborated across legacy teams, new service developers, external partners, and ML engineers to build a resilient streaming platform. Our layered Kafka Streams topologies served as a universal abstraction layer, addressing key challenges:
- Orchestrating Cross-Team Workflows: Legacy monoliths (using CDC with Debezium), Kafka-based new services, and external systems often produced conflicting schemas. We unified these data streams, enabling downstream innovation without tight coupling to source systems.
- Simplifying Operations: To manage dozens of complex topologies, we developed internal tools for automated topology validation, state store monitoring, simplified replays, and efficient debugging, significantly reducing onboarding time for new engineers.
- Compliance at Streaming Speed: Processing every transaction through Kafka Streams allowed us to implement real-time compliance checks with sub-100ms latency. This stream-first approach cut regulatory implementation time from weeks to days without altering legacy systems.
- Reporting & Machine Learning: Integrating with Databricks, we converted real-time streams into batch-compatible datasets using Spark Structured Streaming and Delta tables for sub-minute processing. Our pipeline also enabled real-time feature engineering, improving ML model performance for recommendations and risk scoring.

The target audience is data engineers, architects, and team leads tackling legacy modernization, cross-team collaboration, and real-time analytics. Attendees will learn strategies to align priorities, accelerate compliance, and unify real-time and batch pipelines for reporting and ML.
Nemanja Milicevic
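
As an illustration of the kind of unification layer the abstract describes (not code from the talk), here is a minimal Kafka Streams sketch that merges a Debezium CDC topic and a native service topic into one canonical topic; the topic names and pass-through mapping functions are invented placeholders.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UnifiedTransactionsTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "unified-transactions");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // CDC events emitted by Debezium from the legacy Postgres monolith (placeholder topic name).
        KStream<String, String> legacy = builder.stream("legacy.public.transactions");
        // Events produced natively by the newer Kafka-based services (placeholder topic name).
        KStream<String, String> modern = builder.stream("payments.transactions.v2");

        // Map both shapes onto one canonical schema so downstream consumers
        // (compliance checks, reporting, ML features) see a single contract.
        legacy.mapValues(UnifiedTransactionsTopology::fromDebeziumEnvelope)
              .merge(modern.mapValues(UnifiedTransactionsTopology::fromServiceEvent))
              .to("transactions.unified");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    // Placeholder translations; a real topology would parse and re-serialize the payloads.
    private static String fromDebeziumEnvelope(String value) { return value; }
    private static String fromServiceEvent(String value) { return value; }
}
```

The design idea is that only this layer knows about the source-specific formats; everything downstream consumes the unified topic.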


Continuous forecasting and anomaly detection with Flink SQL
Confluent's managed solution for Apache Flink is expanding its analytical capabilities with the introduction of the ML_FORECAST and ML_ANOMALY_DETECTION functions. Developers can now harness the power of established models like ARIMA for continuous forecasting and anomaly detection, all within the familiar SQL interface. This advancement eliminates the need for external ML services and enables continuous processing by embedding these analytical capabilities directly in your streaming pipeline. In this 20-minute session, tailored for developers with stream processing experience, we'll explore how to integrate sophisticated time series analysis into Flink SQL applications. We'll start by introducing the newly developed ML_FORECAST function, which brings ARIMA modeling capabilities to streaming data. We'll then demonstrate the ML_ANOMALY_DETECTION function and show how it can be combined with Kafka-sourced data streams for real-time anomaly detection. Finally, we'll build a complete streaming application that combines both functions to forecast metrics and detect anomalies continuously. By the end of the session, attendees will understand how to leverage these powerful new functions to build production-ready continuous forecasting and anomaly detection systems using just Flink SQL.
Siddharth Bedekar


Unlocking the Mysteries of Apache Flink
Apache Flink has grown to be a large, complex piece of software that does one thing extremely well: it supports a wide range of stream processing applications with difficult-to-satisfy demands for scalability, high performance, and fault tolerance, all while managing large amounts of application state. Flink owes its success to its adherence to some well-chosen design principles. But many software developers have never worked with a framework organized this way, and struggle to adapt their application ideas to the constraints imposed by Flink's architecture. After helping thousands of developers get started with Flink, I've seen that once you learn to appreciate why Flink's APIs are organized the way they are, it becomes easier to relax, accept what its designers intended, and organize your applications accordingly. The key to demystifying Apache Flink is to understand how the combination of stream processing plus application state has influenced its design and APIs. A framework that cares only about batch processing would be much simpler than Flink, and the same would be true for a stream processing framework without support for state. In this talk I will explain how Flink's managed state is organized in its state backends, and how this relates to the programming model exposed by its APIs. We'll look at checkpointing: how it works, the correctness guarantees that Flink offers, how state snapshots are organized, and what happens during recovery and rescaling. We'll also look at watermarking, which is a major source of complexity and confusion for new Flink developers. Watermarking epitomizes Flink's need to manage application state in a way that doesn't grow without bound as applications run continuously on unbounded streams. This talk will give you a mental model for understanding Apache Flink. I'll conclude by explaining how these concepts that govern the implementation of Flink's runtime have shaped the design of Flink's SQL API.
David Anderson
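
To make the abstract's two central ideas concrete, here is a minimal sketch (not from the talk) of keyed managed state combined with event-time watermarks, using the Flink 1.x DataStream API; the event type, job name, and sample data are invented for illustration.

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class RunningTotalJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // periodic, consistent snapshots of all managed state

        env.fromElements(
                new Event("user-1", 5L, 1_000L),
                new Event("user-2", 3L, 2_000L),
                new Event("user-1", 7L, 3_000L))
           // Watermarks tell Flink how far event time has progressed, bounding how long
           // it must hold on to state while waiting for out-of-order events.
           .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((event, ts) -> event.timestamp))
           .keyBy(event -> event.userId)
           .process(new RunningTotal())
           .print();

        env.execute("running-total");
    }

    public static class Event {
        public String userId;
        public long amount;
        public long timestamp;
        public Event() {}
        public Event(String userId, long amount, long timestamp) {
            this.userId = userId; this.amount = amount; this.timestamp = timestamp;
        }
    }

    // Keyed, managed state: one running total per key, kept in the state backend
    // and included in every checkpoint.
    public static class RunningTotal extends KeyedProcessFunction<String, Event, Long> {
        private transient ValueState<Long> total;

        @Override
        public void open(Configuration parameters) {
            total = getRuntimeContext().getState(new ValueStateDescriptor<>("total", Types.LONG));
        }

        @Override
        public void processElement(Event event, Context ctx, Collector<Long> out) throws Exception {
            long current = total.value() == null ? 0L : total.value();
            current += event.amount;
            total.update(current);
            out.collect(current);
        }
    }
}
```

The per-key ValueState is what checkpoints snapshot and recovery restores, and the bounded-out-of-orderness watermark is what lets Flink eventually consider event time "complete" rather than buffering forever.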


Streaming with Apache Iceberg
Streaming data is a critical component of modern data architectures. This talk explores how to determine your streaming needs and design a robust solution using Apache Iceberg, a next-generation table format built for flexibility and scalability. We’ll dive into the foundational tools that enable streaming pipelines, including Apache Flink, Apache Kafka, Debezium, Kafka Connect, and Apache Spark, breaking down their roles and use cases in processing, transporting, and transforming streaming data. The talk will also highlight Iceberg-specific considerations, such as managing compaction to optimize query performance and handling delete files for record-level updates and deletes. Whether you’re building real-time analytics, powering machine learning models, or streaming raw data into your data lakehouse, this session will provide actionable insights and best practices for building reliable and efficient streaming workflows with Apache Iceberg.
Will Martin
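
As a hedged example of one pattern the abstract touches on (not code from the talk), here is a sketch of a Spark Structured Streaming job in Java that appends a Kafka topic into an Iceberg table. It assumes the Iceberg Spark runtime is on the classpath and an Iceberg catalog named "lake" is configured; the bootstrap servers, topic, table, and checkpoint path are placeholders.

```java
import java.util.concurrent.TimeoutException;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;
import org.apache.spark.sql.streaming.Trigger;

public class KafkaToIceberg {
    public static void main(String[] args) throws TimeoutException, StreamingQueryException {
        // Assumes spark.sql.catalog.lake.* settings point at an Iceberg catalog.
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-iceberg")
                .getOrCreate();

        // Read the raw event stream from Kafka.
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "events")
                .load()
                .selectExpr("CAST(key AS STRING) AS key",
                            "CAST(value AS STRING) AS value",
                            "timestamp");

        // Append micro-batches to an Iceberg table; frequent small commits are why
        // periodic compaction matters for downstream query performance.
        StreamingQuery query = events.writeStream()
                .format("iceberg")
                .outputMode("append")
                .trigger(Trigger.ProcessingTime("1 minute"))
                .option("checkpointLocation", "/tmp/checkpoints/kafka-to-iceberg")
                .toTable("lake.db.events");

        query.awaitTermination();
    }
}
```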


Kafka productivity tools in the age of AI
Almost overnight, AI has rewritten the modern tech stack. At the top of the stack, Cursor, Copilot, and Claude can now be found in most developer IDEs. At the bottom, foundation models like o1, Llama, and Gemini increasingly power backend business logic. What does that mean for everything else in the middle, like developer tools? And what does it mean, in particular, for developers who need to be productive in managing, operating, and testing Kafka and its applications? Whether you use Flink, Confluent, WarpStream, or something else, you'll leave this talk with an approach to Kafka tooling that balances short-term AI gains with long-term engineering best practices.
Michael Drogalis