Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Serverless Transactional Applications Go with the (Data)Flow
Traditional monolithic applications are being migrated to the cloud, typically using a microservice-like architecture. Although this migration brings significant benefits such as scalability and development agility, it leaves behind the transactional amenities, such as serializability, that database systems have provided developers for decades. Today’s transactional cloud applications forgo these database amenities, instead combining aspects of state management, service messaging, and service coordination in application logic. In this talk, I will present Styx, a novel open-source dataflow-based cloud application runtime that executes scalable, low-latency transactional applications. Cloud applications in Styx are developed as Stateful Entities: simple objects that can form arbitrary stateful function orchestrations. The Styx runtime takes care of serializable state consistency, exactly-once processing, state and event partitioning, parallelization, and scaling. In this session, you will learn how Kafka can be combined with ideas from stateful stream processing and database transactions to create transactional cloud application runtimes, bringing us back to the 80s: a time when developers did not have to deploy complex technology stacks, but could simply author pure business logic and trust the database for the rest.
Asterios Katsifodimos
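
To make the Stateful Entities idea more concrete, here is a purely illustrative Java sketch of what "plain business logic, no infrastructure code" can look like. This is not Styx's actual API: the class and method names are hypothetical, and the serializability, partitioning, and exactly-once guarantees mentioned in the comments are what the runtime would provide around such an object.

```java
// Illustrative only: a hypothetical "stateful entity" in the spirit described above.
// This is NOT Styx's real API; in Styx the runtime would partition such objects,
// route calls to them as events, and guarantee serializable, exactly-once execution.
import java.util.HashMap;
import java.util.Map;

public class StatefulEntityExample {

    /** A hypothetical payment entity: pure business logic, no messaging or locking code. */
    static class PaymentEntity {
        private final Map<String, Long> balances = new HashMap<>(); // entity-local, partitioned state

        /** Debit one account and credit another; the runtime would make this transactional. */
        boolean transfer(String from, String to, long amount) {
            long available = balances.getOrDefault(from, 0L);
            if (available < amount) {
                return false; // the runtime would abort the surrounding orchestration here
            }
            balances.put(from, available - amount);
            balances.merge(to, amount, Long::sum);
            return true;
        }

        void deposit(String account, long amount) {
            balances.merge(account, amount, Long::sum);
        }

        long balanceOf(String account) {
            return balances.getOrDefault(account, 0L);
        }
    }

    public static void main(String[] args) {
        PaymentEntity payments = new PaymentEntity();
        payments.deposit("alice", 100);
        // In a real runtime this call would arrive as a partitioned, exactly-once event.
        System.out.println("transfer ok: " + payments.transfer("alice", "bob", 40));
        System.out.println("bob: " + payments.balanceOf("bob"));
    }
}
```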


Bite Size Streams: Learning Advanced Kafka Streams Concepts One Simple Topology at a Time
Event streaming with Kafka Streams is powerful but can feel overwhelming to understand and implement. Breaking down advanced concepts into smaller, single-purpose topologies makes learning more approachable. Kafka Streams concepts will be introduced with an interactive web application that lets you visualize input topics, output topics, changelog topics, state stores, and more. What happens when state store caching is disabled? What if topology optimization is enabled? Or what if stream time isn't advanced? These questions are easily explored by visualizing the topology and Kafka Streams configurations. The real-time events in this interactive tutorial are generated from actual data on your laptop, including running processes, thread details, windows, services, and user sessions. Moving a window on your laptop can trigger many of the examples, letting you see how the topology handles them. Through an interactive poll, the audience will choose which concepts to cover during the session, selecting from branching, emitting on change, windowing, repartitioning, joining, and more. Join me on this journey of learning Kafka Streams. You'll deepen your understanding of Kafka Streams concepts and gain access to tools that let you explore advanced concepts independently. All examples and visualizations will be available in an open-source project.
Neil Buesing
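
As a flavor of what a bite-size, single-purpose topology looks like, here is a minimal Kafka Streams sketch that counts events per process name in 30-second tumbling windows. The topic names, window size, and configuration values are illustrative assumptions rather than the session's actual examples.

```java
// A minimal, single-purpose Kafka Streams topology: count events per process name
// in 30-second tumbling windows. Topic names ("process-events", "process-counts")
// and configuration are assumptions for illustration only.
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.WindowedSerdes;

public class BiteSizeWindowedCount {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Key = process name, value = raw event payload.
        builder.stream("process-events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofSeconds(30)))
               .count() // backed by a windowed state store plus its changelog topic
               .toStream()
               .to("process-counts",
                   Produced.with(
                       WindowedSerdes.timeWindowedSerdeFrom(String.class, Duration.ofSeconds(30).toMillis()),
                       Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "bite-size-streams-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```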


Ins and Outs of The Outbox Pattern
The outbox pattern is a common solution for implementing data flows between microservices. By channeling messages through an outbox table, it enables services to update their own local datastore and, at the same time, send out notifications to other services via data streaming platforms such as Apache Kafka, in a reliable and consistent way. However, as with everything in IT, there’s no free lunch. How do you handle backfills of outbox events? How do you ensure idempotency for event consumers? Doesn’t the pattern turn the database into a bottleneck? And what about alternatives such as “Listen-to-Yourself”, or the upcoming Kafka support for two-phase commit transactions (KIP-939)? It’s time to take another look at the outbox pattern! In this session I’ll start by bringing you up to speed on what the outbox pattern *is*, and then go on to discuss details such as:
- Implementing the pattern safely and efficiently
- Its semantics, pros, and cons
- Dealing with backfills
- Potential alternatives to the outbox pattern and the trade-offs they make
Gunnar Morling
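
For readers new to the pattern, here is a minimal sketch of the outbox write path: the business row and the outbox row are inserted in one local database transaction, and a separate relay (for example, log-based change data capture) later publishes the outbox rows to Kafka. The table and column names are hypothetical.

```java
// A minimal sketch of the outbox pattern's write path, assuming hypothetical
// "orders" and "outbox" tables: the business change and the outgoing event are
// inserted in ONE local database transaction. A separate relay (e.g. log-based
// change data capture) would publish outbox rows to Kafka; that part is omitted here.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class OutboxWritePath {

    public static void placeOrder(String jdbcUrl, String customerId, long amountCents) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false); // both inserts commit or roll back together
            String orderId = UUID.randomUUID().toString();

            try (PreparedStatement order = conn.prepareStatement(
                     "INSERT INTO orders (id, customer_id, amount_cents) VALUES (?, ?, ?)");
                 PreparedStatement outbox = conn.prepareStatement(
                     "INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload) "
                     + "VALUES (?, ?, ?, ?, ?)")) {

                order.setString(1, orderId);
                order.setString(2, customerId);
                order.setLong(3, amountCents);
                order.executeUpdate();

                // The outbox row *is* the eventual Kafka message; aggregate_id typically becomes
                // the record key, which consumers can also use for idempotent processing.
                outbox.setString(1, UUID.randomUUID().toString());
                outbox.setString(2, "order");
                outbox.setString(3, orderId);
                outbox.setString(4, "OrderPlaced");
                outbox.setString(5, "{\"orderId\":\"" + orderId + "\",\"amountCents\":" + amountCents + "}");
                outbox.executeUpdate();

                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```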


Flink Jobs as Agents 🤖 – Unlocking Agentic AI with Stream Processing
Apache Flink is uniquely positioned to serve as the backbone for AI agents, enhancing them with stream processing as a new, powerful tool. We’ll explore how Flink jobs can be transformed into autonomous, goal-driven "Agents" that interact with data streams, trigger actions, and adapt in real time. We’ll showcase Flink jobs as AI agents through two key stream processing and AI use cases: 1) financial planning and detection of spending anomalies, and 2) demand forecasting and supply chain monitoring for disruptions. AI agents need business context. We’ll discuss combining foundation models with schema registries and data catalogs for contextual intelligence while ensuring data governance and security. We’ll integrate Apache Kafka event streams with data lakes in open table formats like Apache Iceberg, enabling AI agents to leverage real-time and historical data for consistency and reasoning. We’ll also cover latency optimization for time-sensitive use cases while preventing hallucinations. Finally, we’ll demonstrate an open-source conversational platform on Apache Kafka, in which multiple AI agents assigned to a business process continuously process real-time events while optimizing for their individual goals, interacting and negotiating with each other. By combining Flink and Kafka, we can build systems that are not just reactive but proactive and predictive, paving the way for next-generation agentic AI.
Steffen Hoellinger
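
As a rough illustration of the first use case, here is a minimal Flink (1.x DataStream API) sketch that keeps a per-customer running average of transaction amounts and flags unusually large ones. The source data, threshold, and class names are assumptions, and the agentic part (calling a foundation model, coordinating with other agents) would hang off the flagged stream rather than appear here.

```java
// A minimal spending-anomaly sketch as a Flink 1.x DataStream job: per-customer
// running average of transaction amounts, flagging amounts far above it.
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class SpendingAnomalyJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In a real pipeline this would be a Kafka source; here, a few hard-coded (customer, amount) events.
        DataStream<Tuple2<String, Double>> txns = env.fromElements(
                Tuple2.of("alice", 20.0), Tuple2.of("alice", 25.0), Tuple2.of("alice", 400.0));

        txns.keyBy(t -> t.f0)
            .process(new AnomalyFlagger())
            .print(); // an agent would react to these flagged events instead of printing them

        env.execute("spending-anomaly-sketch");
    }

    /** Flags a transaction if it exceeds 3x the customer's running average so far. */
    static class AnomalyFlagger extends KeyedProcessFunction<String, Tuple2<String, Double>, String> {
        private transient ValueState<Double> avg;
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            avg = getRuntimeContext().getState(new ValueStateDescriptor<>("avg", Double.class));
            count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void processElement(Tuple2<String, Double> txn, Context ctx, Collector<String> out) throws Exception {
            double currentAvg = avg.value() == null ? 0.0 : avg.value();
            long n = count.value() == null ? 0L : count.value();
            if (n > 0 && txn.f1 > 3 * currentAvg) {
                out.collect("Possible anomaly for " + txn.f0 + ": " + txn.f1 + " vs avg " + currentAvg);
            }
            avg.update((currentAvg * n + txn.f1) / (n + 1)); // update the running average
            count.update(n + 1);
        }
    }
}
```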


Scaling Semantic Search: Apache Kafka Meets Vector Databases
This talk presents a performance-tuned Apache Kafka pipeline for generating embeddings on large-scale text data streams. To store embeddings, our implementation supports various vector databases, making it highly adaptable to many applications. Text embeddings are fundamental for semantic search and recommendation, representing text in high-dimensional vector spaces for efficient similarity search using approximate k-nearest neighbors (kNN). By storing these embeddings and providing semantic search results given a query, vector databases are central to retrieval-augmented generation systems. We present our Kafka pipeline for continuously embedding texts to enable semantic search on live data. We demonstrate its end-to-end implementation while addressing key technical challenges:
- First, the pipeline performs text chunking to adhere to the maximum input sequence length of the embedding model. We use an optimized overlapping text chunking strategy to ensure that context is maintained across chunks.
- Using HuggingFace’s Text Embeddings Inference (TEI) toolkit in a lightweight, containerized GPU environment, we achieve efficient, scalable text embedding computation. TEI supports a wide range of state-of-the-art embedding models.
- As an alternative to relying on Kafka Streams, our solution implements optimized processing of small batches using Kafka consumer and producer client APIs, allowing batched API calls to TEI. Our benchmark results confirm this choice, indicating high efficiency with significantly improved throughput and reduced latency compared to other approaches.
- Finally, Kafka Connect allows real-time ingestion into vector databases like Qdrant, Milvus, or Vespa, making embeddings instantly available for semantic search and recommendation.
With Kafka’s high-throughput streaming, optimized interactions with GPU-accelerated TEI, and efficient vector serialization, our pipeline achieves scalable embedding computation and ingestion into vector databases.
Jakob Edding, Raphael Lachtner
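
Below is a condensed sketch of the batched consume, chunk, embed, and produce loop described above, using the plain Kafka consumer/producer clients and Java's HttpClient. It assumes a TEI-style HTTP service whose /embed endpoint accepts a JSON body with an "inputs" array; the topic names, chunk sizes, and naive JSON handling are illustrative assumptions rather than the speakers' exact implementation.

```java
// Sketch: poll small batches from Kafka, chunk each text with overlap, send one batched
// embedding request per document to a TEI-style service, and forward the result so a
// Kafka Connect sink can ingest it into a vector database. Endpoint, topics, and chunk
// sizes are assumptions for illustration.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EmbeddingPipelineSketch {

    /** Overlapping, character-based chunking so that context is shared between adjacent chunks. */
    static List<String> chunk(String text, int size, int overlap) {
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size - overlap) {
            chunks.add(text.substring(start, Math.min(text.length(), start + size)));
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        Properties consumerCfg = new Properties();
        consumerCfg.put("bootstrap.servers", "localhost:9092");
        consumerCfg.put("group.id", "embedding-pipeline");
        consumerCfg.put("enable.auto.commit", "false");
        consumerCfg.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerCfg.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerCfg = new Properties();
        producerCfg.put("bootstrap.servers", "localhost:9092");
        producerCfg.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerCfg.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        HttpClient http = HttpClient.newHttpClient();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerCfg);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerCfg)) {
            consumer.subscribe(List.of("raw-texts"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Batch all chunks of a document into one embedding request instead of one call per chunk.
                    List<String> chunks = chunk(record.value(), 1000, 200);
                    String inputs = "[\"" + String.join("\",\"", chunks) + "\"]"; // naive JSON escaping, for brevity only
                    HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/embed"))
                            .header("Content-Type", "application/json")
                            .POST(HttpRequest.BodyPublishers.ofString("{\"inputs\":" + inputs + "}"))
                            .build();
                    String embeddings = http.send(request, HttpResponse.BodyHandlers.ofString()).body();

                    // Forward the embedding vectors; a Kafka Connect sink moves this topic into the vector database.
                    producer.send(new ProducerRecord<>("embeddings", record.key(), embeddings));
                }
                consumer.commitSync(); // commit offsets only after the polled batch has been embedded and forwarded
            }
        }
    }
}
```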


Zero to Data Streaming Platform in Under 15 Minutes
Data streaming engineers need tooling to efficiently provision, maintain, and evolve the data streaming platform. The Confluent Terraform Provider does just that, providing human-readable infrastructure-as-code to build a Confluent Cloud environment in a matter of minutes. In this session, we’ll start from a blank canvas and create a new environment, complete with an Apache Kafka® cluster, stream governance, and stream processing with Flink. Next, we’ll create Kafka topics, define data contracts, and determine how to transform our input data. We won’t forget about security and access controls, so we’ll create service accounts with the necessary roles and permissions. Finally, we’ll set it all in motion by streaming events into Kafka and querying the output of our new data pipeline. When we’re done, you’ll have the tools needed to build and maintain your data streaming platform. Let’s do this!
Sandon Jacobs