Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Let's add Kafka support to the Kubernetes Gateway API.
The Kubernetes Gateway API is the preferred way to specify how traffic flows from clients outside a Kubernetes cluster to services running inside it (north/south traffic), as well as how services communicate with one another inside the cluster (east/west traffic). When vendors support the standard, end users reap benefits such as portability and reduced vendor lock-in. The Kubernetes Gateway API, like the rest of Kubernetes, is under the governance of the Cloud Native Computing Foundation (CNCF), which in turn is part of the Linux Foundation. Today, the Gateway API includes standard ways to define HTTP and gRPC traffic into and within a Kubernetes cluster, with experimental work underway for TLS, TCP, and UDP traffic. For HTTP, this means, for example, that for any incoming HTTP request you can define filters, transformations, and routing rules that are applied before the request is passed to its final destination in the cluster. In this talk, I argue that event-driven architectures deserve the same treatment. Organisations want to unlock the data in Kafka, which puts pressure on Kafka admins who need to expose data to additional internal and external clients while maintaining strong governance. However, there isn't a standard way to safely expose Kafka to clients at the scale and speed required by businesses. Existing Kubernetes solutions, such as the Gateway API's TCP support, are helpful but not Kafka protocol-aware. I'll explain a new proposal for a Kafka extension to the Kubernetes Gateway API standard. This proposal makes it easy for Kubernetes and Kafka administrators to manage access to their Kafka clusters in a cloud-native way. Kafka can even be securely exposed to consumers outside of the Kubernetes cluster, which opens up new ways of leveraging the valuable data within. We'll review early implementations that support this initiative.
Jonathan Michaux
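
As a rough sketch of what the "securely exposed outside the cluster" scenario looks like from a client's point of view (not code from the talk), here is a minimal Java consumer pointed at a bootstrap address that a Kafka-aware gateway might expose; the address, credentials, and topic name are hypothetical placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.config.SaslConfigs;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ExternalGatewayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical bootstrap address exposed by a Kafka-aware gateway in front of the cluster.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-gateway.example.com:9092");
        // TLS + SASL so traffic crossing the cluster boundary is authenticated and encrypted.
        props.put("security.protocol", "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-512");
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"external-client\" password=\"change-me\";");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "external-analytics");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

From the client's perspective, nothing Kafka-specific changes: the gateway only determines which bootstrap address is reachable and how the connection is authenticated.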


The Silent Migration: How Kafka Streams Became Our Safety Net
Migrating from a monolithic Postgres system to a distributed architecture is a high-stakes balancing act. Over five years, we transformed our legacy infrastructure, with Kafka Streams emerging as the backbone bridging old and new systems while ensuring uninterrupted compliance, real-time reporting, and ML-driven insights. This talk details how we collaborated across legacy teams, new service developers, external partners, and ML engineers to build a resilient streaming platform. Our layered Kafka Streams topologies served as a universal abstraction layer, addressing key challenges:
- Orchestrating Cross-Team Workflows: Legacy monoliths (using CDC with Debezium), Kafka-based new services, and external systems often produced conflicting schemas. We unified these data streams, enabling downstream innovation without tight coupling to source systems.
- Simplifying Operations: To manage dozens of complex topologies, we developed internal tools for automated topology validation, state store monitoring, simplified replays, and efficient debugging, significantly reducing onboarding time for new engineers.
- Compliance at Streaming Speed: Processing every transaction through Kafka Streams allowed us to implement real-time compliance checks with sub-100ms latency. This stream-first approach cut regulatory implementation time from weeks to days without altering legacy systems.
- Reporting & Machine Learning: Integrating with Databricks, we converted real-time streams into batch-compatible datasets using Spark Structured Streaming and Delta tables for sub-minute processing. Our pipeline also enabled real-time feature engineering, improving ML model performance for recommendations and risk scoring.

The target audience is data engineers, architects, and team leads tackling legacy modernization, cross-team collaboration, and real-time analytics. Attendees will learn strategies to align priorities, accelerate compliance, and unify real-time and batch pipelines for reporting and ML.
Nemanja Milicevic
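
As an illustration of the kind of unification layer the abstract describes (not code from the talk), here is a minimal Kafka Streams sketch that merges a Debezium CDC topic and a native service topic into one canonical topic; the topic names and pass-through mapping functions are invented placeholders.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UnifiedTransactionsTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "unified-transactions");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // CDC events emitted by Debezium from the legacy Postgres monolith (placeholder topic name).
        KStream<String, String> legacy = builder.stream("legacy.public.transactions");
        // Events produced natively by the newer Kafka-based services (placeholder topic name).
        KStream<String, String> modern = builder.stream("payments.transactions.v2");

        // Map both shapes onto one canonical schema so downstream consumers
        // (compliance checks, reporting, ML features) see a single contract.
        legacy.mapValues(UnifiedTransactionsTopology::fromDebeziumEnvelope)
              .merge(modern.mapValues(UnifiedTransactionsTopology::fromServiceEvent))
              .to("transactions.unified");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    // Placeholder translations; a real topology would parse and re-serialize the payloads.
    private static String fromDebeziumEnvelope(String value) { return value; }
    private static String fromServiceEvent(String value) { return value; }
}
```

The design idea is that only this layer knows about the source-specific formats; everything downstream consumes the unified topic.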


Continuous forecasting and anomaly detection with Flink SQL
Confluent's managed solution for Apache Flink is expanding its analytical capabilities with the introduction of the ML_FORECAST and ML_ANOMALY_DETECTION functions. Developers can now harness the power of established models like ARIMA for continuous forecasting and anomaly detection, all within the familiar SQL interface. This advancement eliminates the need for external ML services and enables continuous processing by embedding these analytical capabilities directly in your streaming pipeline. In this 20-minute session, tailored for developers with stream processing experience, we'll explore how to integrate sophisticated time series analysis into Flink SQL applications. We'll start by introducing the newly developed ML_FORECAST function, which brings ARIMA modeling capabilities to streaming data. We'll then demonstrate the ML_ANOMALY_DETECTION function and show how it can be combined with Kafka-sourced data streams for real-time anomaly detection. Finally, we'll build a complete streaming application that combines both functions to forecast metrics and detect anomalies continuously. By the end of the session, attendees will understand how to leverage these powerful new functions to build production-ready continuous forecasting and anomaly detection systems using just Flink SQL.
Siddharth Bedekar


Unlocking the Mysteries of Apache Flink
Apache Flink has grown to be a large, complex piece of software that does one thing extremely well: it supports a wide range of stream processing applications with difficult-to-satisfy demands for scalability, high performance, and fault tolerance, all while managing large amounts of application state. Flink owes its success to its adherence to some well-chosen design principles. But many software developers have never worked with a framework organized this way, and struggle to adapt their application ideas to the constraints imposed by Flink's architecture. After helping thousands of developers get started with Flink, I've seen that once you learn to appreciate why Flink's APIs are organized the way they are, it becomes easier to relax, accept what its designers intended, and organize your applications accordingly. The key to demystifying Apache Flink is to understand how the combination of stream processing plus application state has influenced its design and APIs. A framework that cares only about batch processing would be much simpler than Flink, and the same would be true for a stream processing framework without support for state. In this talk I will explain how Flink's managed state is organized in its state backends, and how this relates to the programming model exposed by its APIs. We'll look at checkpointing: how it works, the correctness guarantees that Flink offers, how state snapshots are organized, and what happens during recovery and rescaling. We'll also look at watermarking, which is a major source of complexity and confusion for new Flink developers. Watermarking epitomizes Flink's need to manage application state in a way that doesn't grow without bound as applications run continuously on unbounded streams. This talk will give you a mental model for understanding Apache Flink. I'll conclude by explaining how these concepts that govern the implementation of Flink's runtime have shaped the design of Flink's SQL API.
David Anderson
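
To make the abstract's two central ideas concrete, here is a minimal sketch (not from the talk) of keyed managed state combined with event-time watermarks, using the Flink 1.x DataStream API; the event type, job name, and sample data are invented for illustration.

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class RunningTotalJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // periodic, consistent snapshots of all managed state

        env.fromElements(
                new Event("user-1", 5L, 1_000L),
                new Event("user-2", 3L, 2_000L),
                new Event("user-1", 7L, 3_000L))
           // Watermarks tell Flink how far event time has progressed, bounding how long
           // it must hold on to state while waiting for out-of-order events.
           .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((event, ts) -> event.timestamp))
           .keyBy(event -> event.userId)
           .process(new RunningTotal())
           .print();

        env.execute("running-total");
    }

    public static class Event {
        public String userId;
        public long amount;
        public long timestamp;
        public Event() {}
        public Event(String userId, long amount, long timestamp) {
            this.userId = userId; this.amount = amount; this.timestamp = timestamp;
        }
    }

    // Keyed, managed state: one running total per key, kept in the state backend
    // and included in every checkpoint.
    public static class RunningTotal extends KeyedProcessFunction<String, Event, Long> {
        private transient ValueState<Long> total;

        @Override
        public void open(Configuration parameters) {
            total = getRuntimeContext().getState(new ValueStateDescriptor<>("total", Types.LONG));
        }

        @Override
        public void processElement(Event event, Context ctx, Collector<Long> out) throws Exception {
            long current = total.value() == null ? 0L : total.value();
            current += event.amount;
            total.update(current);
            out.collect(current);
        }
    }
}
```

The per-key ValueState is what checkpoints snapshot and recovery restores, and the bounded-out-of-orderness watermark is what lets Flink eventually consider event time "complete" rather than buffering forever.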


Streaming with Apache Iceberg
Streaming data is a critical component of modern data architectures. This talk explores how to determine your streaming needs and design a robust solution using Apache Iceberg, a next-generation table format built for flexibility and scalability. We’ll dive into the foundational tools that enable streaming pipelines, including Apache Flink, Apache Kafka, Debezium, Kafka Connect, and Apache Spark, breaking down their roles and use cases in processing, transporting, and transforming streaming data. The talk will also highlight Iceberg-specific considerations, such as managing compaction to optimize query performance and handling delete files for record-level updates and deletes. Whether you’re building real-time analytics, powering machine learning models, or streaming raw data into your data lakehouse, this session will provide actionable insights and best practices for building reliable and efficient streaming workflows with Apache Iceberg.
Will Martin
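
As a hedged example of one pattern the abstract touches on (not code from the talk), here is a sketch of a Spark Structured Streaming job in Java that appends a Kafka topic into an Iceberg table. It assumes the Iceberg Spark runtime is on the classpath and an Iceberg catalog named "lake" is configured; the bootstrap servers, topic, table, and checkpoint path are placeholders.

```java
import java.util.concurrent.TimeoutException;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;
import org.apache.spark.sql.streaming.Trigger;

public class KafkaToIceberg {
    public static void main(String[] args) throws TimeoutException, StreamingQueryException {
        // Assumes spark.sql.catalog.lake.* settings point at an Iceberg catalog.
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-iceberg")
                .getOrCreate();

        // Read the raw event stream from Kafka.
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "events")
                .load()
                .selectExpr("CAST(key AS STRING) AS key",
                            "CAST(value AS STRING) AS value",
                            "timestamp");

        // Append micro-batches to an Iceberg table; frequent small commits are why
        // periodic compaction matters for downstream query performance.
        StreamingQuery query = events.writeStream()
                .format("iceberg")
                .outputMode("append")
                .trigger(Trigger.ProcessingTime("1 minute"))
                .option("checkpointLocation", "/tmp/checkpoints/kafka-to-iceberg")
                .toTable("lake.db.events");

        query.awaitTermination();
    }
}
```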


Kafka productivity tools in the age of AI
Almost overnight, AI has rewritten the modern tech stack. At the top of the stack, Cursor, Copilot, and Claude can now be found in most developer IDEs. At the bottom, foundation models like o1, Llama, and Gemini increasingly power backend business logic. What does that mean for everything else in the middle, like developer tools? And what does it mean, in particular, for developers who need to be productive in managing, operating, and testing Kafka and its applications? Whether you use Flink, Confluent, WarpStream, or something else, you'll leave this talk with an approach to Kafka tooling that balances short-term AI gains with long-term engineering best practices.
Michael Drogalis