Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed, or rewatch your favorites so you don't miss any of the industry-changing insights from the best minds in data streaming.


Building a Stream Processing Platform at OpenAI
Curious about how OpenAI leverages Apache Flink for real-time data processing? In this session, we will dive into the technical intricacies of building the Flink platform at OpenAI. We’ll walk you through our Flink infrastructure setup—including deployment strategies, integration with Kafka, and our multi-region architecture. Additionally, we’ll explore how we’ve enhanced PyFlink to operate effectively at our scale. Finally, we’ll discuss the challenges we face, share our strategies for overcoming them, and outline our future roadmap.
Shuyi Chen


Leave Your Passwords Behind: Embracing mTLS in Kafka
Authenticating users is crucial in every production Kafka deployment. Apache Kafka ships with diverse authentication options, including password-based SASL mechanisms and mTLS. As computing workloads adopt identities in the form of short-lived X.509 certificates, using them for mTLS offers significant advantages over passwords, as they limit the impact of a credential leak and cannot be brute-forced. This talk starts by looking into how authentication works in Kafka and the different configurations to customise it. We'll cover the challenges faced when migrating users to mTLS and review options to minimise the operational effort. Then, we will share an approach that adds support for mTLS on the SASL listener so users can continue using their existing KafkaPrincipal and fall back to passwords seamlessly during the migration, giving cluster administrators and users confidence before moving away from SASL. Finally, we will talk about how enabling Kafka brokers to serve distinct server and client certificates supports adoption of mTLS for inter-broker communication, and the learnings and pitfalls of rolling this out in the fleet.
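As a quick illustration of the migration target, here is a minimal sketch of a Kafka producer configured for mTLS instead of a password-based SASL mechanism. The broker address, store paths, and passwords are placeholders, and the SASL-listener fallback and principal mapping covered in the talk are not shown:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MtlsProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; a real deployment points at the cluster's TLS listener.
        props.put("bootstrap.servers", "broker1:9093");
        // SSL (mTLS) instead of a password-based SASL mechanism.
        props.put("security.protocol", "SSL");
        // Client certificate and key, typically a short-lived X.509 workload identity.
        props.put("ssl.keystore.location", "/etc/kafka/client.keystore.p12");
        props.put("ssl.keystore.type", "PKCS12");
        props.put("ssl.keystore.password", "changeit");
        // Trust store containing the CA that signed the broker certificates.
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.p12");
        props.put("ssl.truststore.type", "PKCS12");
        props.put("ssl.truststore.password", "changeit");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```

With short-lived workload certificates, the keystore referenced above would be rotated automatically rather than provisioned by hand, which is part of what makes a leaked credential far less damaging than a long-lived password.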
Gaurav Narula


Tableflow: Not Just Another Kafka-to-Iceberg Connector!
Ingesting data from Apache Kafka into Apache Iceberg presents a recurring challenge in modern ETL workflows. The conventional approach relies on connectors, yet this method introduces operational hurdles due to the fundamental differences between these systems. Kafka excels at real-time streaming workloads, while Iceberg is optimized for analytical data storage and batch ingestion. Bridging these paradigms creates several inefficiencies:

1. Batch Operations on Streaming Storage: Attempting batch operations on Kafka, a system designed for streaming, results in ingestion bottlenecks and increased strain on Kafka brokers. One example is initial table hydration, where historical data retrieval often means uncached reads. This significantly delays topic-to-table hydration, impacting broker performance and straining resources in latency-sensitive environments.

2. Streaming Operations on Batch Storage: Applying streaming-like ingestion patterns to Iceberg generates numerous small Parquet files. These files pollute Iceberg's metadata, degrade query performance, and increase the need for maintenance operations.

3. Lack of Unified Table Maintenance: Aggressive creation of small files containing updates will conflict with maintenance operations running in the background, leading to wasteful retries.

In this talk, Alex will share insights and lessons learned from building Tableflow, a unified batch/streaming storage system that allowed us to address all three. He will talk about specific solutions implemented in the Kora storage engine that mitigate these issues, making both systems work cohesively. Attendees will gain actionable knowledge on overcoming operational challenges, implementing innovative solutions, and designing scalable pipelines that maximize the potential of both Kafka and Iceberg.
Alex Sorokoumov


Flink, Kafka and Prometheus: better together to improve efficiency of your observability platform
Prometheus has become the go-to solution for monitoring and alerting, ingesting metrics from applications and infrastructure. The ability to efficiently store high volumes of dimensional time series also makes Prometheus a perfect fit for broader operational analytics use cases. Examples include observing fleets of IoT devices, connected vehicles, media streaming devices, and any distributed resources. However, the high cardinality and frequency of events generated by these sources can be challenging. Apache Flink can preprocess observability events in real time before writing to Prometheus. Reducing cardinality or frequency can improve the efficiency of your observability platform. Adding contextual information and calculating derived metrics enables deeper operational analysis in real time. Observing Flink with Prometheus is a solved problem, using Flink Prometheus Exporters. The new Flink-Prometheus connector, a recent addition to the Apache Flink connector family, addresses a different challenge. It enables using Flink to preprocess large volumes of observability data from various sources and write directly to Prometheus at scale. Kafka completes this architecture by providing reliable stream storage, ensuring ordered delivery of high-volume raw metrics into Flink, which is critical for maintaining Prometheus time series integrity. In this talk, an Apache Flink committer and the maintainer of the new Flink-Prometheus connector will explore real-world use cases, key challenges, and best practices to leverage Flink and Prometheus together to supercharge your observability platform.
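As a rough sketch of the preprocessing idea (not the actual Flink-Prometheus connector API, which the talk covers), a Flink job might read raw metric events from a Kafka topic and pre-aggregate them over short windows to reduce the number of samples per series before they ever reach Prometheus. The topic name, event format, and window size below are illustrative:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class MetricsPreprocessingJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Raw observability events arrive on a (hypothetical) Kafka topic as "metricName,value" strings.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker1:9092")
                .setTopics("raw-metrics")
                .setGroupId("metrics-preprocessor")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "raw-metrics")
           .map(line -> {
               String[] parts = line.split(",");
               return Tuple2.of(parts[0], Double.parseDouble(parts[1]));
           })
           .returns(Types.TUPLE(Types.STRING, Types.DOUBLE))
           // Aggregate per metric name over short windows so far fewer samples per series
           // are written downstream than arrived from the raw sources.
           .keyBy(value -> value.f0)
           .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
           .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
           // Placeholder: in a real pipeline this step would use the Flink-Prometheus sink discussed in the talk.
           .print();

        env.execute("metrics-preprocessing");
    }
}
```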
Lorenzo Nicora


Optimizing Kafka Streams Joins for Business-Critical Processes
In this session, we will delve into the practical boundaries of the Kafka Streams DSL and showcase why the Processor API stands out as the ultimate tool for addressing complex streaming scenarios. Using Michelin’s tire delivery process as a guiding example, we will illustrate how the different join types can be implemented with the DSL and where its limitations begin to emerge. The challenges of joining events from multiple topics, whether driven by event-based or time-based logic, and achieving fine-grained control over state stores led us to embrace the Processor API. While the DSL is convenient and expressive for many use cases, the Processor API consistently proves to be the most powerful solution for real-world applications requiring precision and flexibility. Whether you’re an architect, developer, or Kafka enthusiast, this session will equip you with actionable insights into designing custom state stores, optimizing for low latency, and implementing adaptable join logic to meet evolving business needs. Rather than advocating for abandoning the DSL entirely, the session highlights the importance of recognizing its limitations and understanding why the Processor API is often worth the additional effort.
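To ground the DSL-versus-Processor-API discussion, the following sketch (with hypothetical topic and store names, and default String serdes assumed in the application config) contrasts a stream-table join expressed in the DSL with the skeleton of a Processor API implementation that owns its state store and join logic:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

public class JoinSketches {

    // DSL version: concise as long as the built-in join semantics fit the use case.
    static void dslJoin(StreamsBuilder builder) {
        KStream<String, String> deliveries = builder.stream("deliveries");
        KTable<String, String> orders = builder.table("orders");
        deliveries.join(orders, (delivery, order) -> delivery + "|" + order)
                  .to("enriched-deliveries");
    }

    // Processor API version: full control over the state store and the join/expiry logic.
    static class DeliveryJoinProcessor implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;
        private KeyValueStore<String, String> orderStore;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
            // State store registered on the topology under the (hypothetical) name "order-store".
            this.orderStore = context.getStateStore("order-store");
        }

        @Override
        public void process(Record<String, String> delivery) {
            String order = orderStore.get(delivery.key());
            if (order != null) {
                context.forward(delivery.withValue(delivery.value() + "|" + order));
            } else {
                // Room for custom behaviour the DSL does not expose directly,
                // e.g. buffering unmatched events or applying time-based expiry.
            }
        }
    }
}
```

The DSL variant is shorter, but the Processor API skeleton is where custom buffering, time-based expiry, and fine-grained state-store control become possible, which is the trade-off the session explores.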
Sébastien Viale, Adam Souquieres


One Client to Rule Them All
Let’s be honest: who wants to have more than one client to connect to a data system? Now consider Apache Kafka. It ships with four different Java clients: producer, consumer, admin, and streams. Want to create a topic in a producer application? Use the admin client and the producer client. Want to produce and consume? Either use the producer and the consumer, or use Kafka Streams. So how did we get here? And more importantly: how can we simplify it? Are incremental improvements enough? In this talk, we’ll propose a radical approach: a single unified Java client built from scratch for producing, consuming, processing, and administration tasks. We take you on a brainstorming session about what we can and cannot do, and what we want to achieve. How can we make simple things easy and difficult things possible? What does a modern Java API look like, using the standard library, a reasonable threading model, lambdas, and futures for async calls? We think it's high time that we take another look at the Java clients and build a client ready for the next decade. Come and join the conversation about the future of Kafka clients.
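To illustrate the status quo the speakers start from, here is a minimal sketch (topic name, partition count, and settings are placeholders) of an application that needs two separately configured clients just to create a topic and produce to it:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TwoClientsToday {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        // Client #1: the admin client, used only to create the topic.
        try (Admin admin = Admin.create(props)) {
            admin.createTopics(List.of(new NewTopic("orders", 3, (short) 1))).all().get();
        }

        // Client #2: the producer, with its own configuration and threading model.
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-1", "created")).get();
        }
    }
}
```

A unified client along the lines proposed in the talk would collapse this into a single entry point; what that API should actually look like is exactly the brainstorming the session invites.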
Matthias J Sax, Andrew Schofield