Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Serverless Transactional Applications Go with the (Data)Flow
Traditional monolithic applications are being migrated to the cloud, typically using a microservice-like architecture. Although this migration brings significant benefits such as scalability and development agility, it leaves behind the transactional amenities, such as serializability, that database systems have provided developers for decades. Today’s transactional cloud applications forgo these database amenities, instead combining aspects of state management, service messaging, and service coordination in application logic. In this talk, I will present Styx, a novel open-source dataflow-based cloud application runtime that executes scalable, low-latency transactional applications. Cloud applications in Styx are developed as Stateful Entities: simple objects that can form arbitrary stateful function orchestrations. The Styx runtime takes care of serializable state consistency, exactly-once processing, state and event partitioning, parallelization, and scaling. In this session, you will learn how Kafka can be combined with ideas from stateful stream processing and database transactions to create transactional cloud application runtimes, bringing us back to the 80s: a time when developers did not have to deploy complex technology stacks, but could simply author pure business logic and trust the database for the rest.
Asterios Katsifodimos
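
To make the Stateful Entities idea more concrete, here is a purely illustrative Java sketch of what "plain business logic, no infrastructure code" can look like. This is not Styx's actual API: the class and method names are hypothetical, and the serializability, partitioning, and exactly-once guarantees mentioned in the comments are what the runtime would provide around such an object.

```java
// Illustrative only: a hypothetical "stateful entity" in the spirit described above.
// This is NOT Styx's real API; in Styx the runtime would partition such objects,
// route calls to them as events, and guarantee serializable, exactly-once execution.
import java.util.HashMap;
import java.util.Map;

public class StatefulEntityExample {

    /** A hypothetical payment entity: pure business logic, no messaging or locking code. */
    static class PaymentEntity {
        private final Map<String, Long> balances = new HashMap<>(); // entity-local, partitioned state

        /** Debit one account and credit another; the runtime would make this transactional. */
        boolean transfer(String from, String to, long amount) {
            long available = balances.getOrDefault(from, 0L);
            if (available < amount) {
                return false; // the runtime would abort the surrounding orchestration here
            }
            balances.put(from, available - amount);
            balances.merge(to, amount, Long::sum);
            return true;
        }

        void deposit(String account, long amount) {
            balances.merge(account, amount, Long::sum);
        }

        long balanceOf(String account) {
            return balances.getOrDefault(account, 0L);
        }
    }

    public static void main(String[] args) {
        PaymentEntity payments = new PaymentEntity();
        payments.deposit("alice", 100);
        // In a real runtime this call would arrive as a partitioned, exactly-once event.
        System.out.println("transfer ok: " + payments.transfer("alice", "bob", 40));
        System.out.println("bob: " + payments.balanceOf("bob"));
    }
}
```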


Bite Size Streams: Learning Advanced Kafka Streams Concepts One Simple Topology at a Time
Event streaming with Kafka Streams is powerful but can feel overwhelming to understand and implement. Breaking down advanced concepts into smaller, single-purpose topologies makes learning more approachable. Kafka Streams concepts will be introduced with an interactive web application that lets you visualize input topics, output topics, changelog topics, state stores, and more. What happens when state store caching is disabled? What if topology optimization is enabled? Or what if stream time isn't advanced? These questions are easily explored by visualizing the topology and Kafka Streams configurations. The real-time events in this interactive tutorial are generated from actual data on your laptop, including running processes, thread details, windows, services, and user sessions. Moving a window on your laptop can trigger many of the examples, letting you see how the topology handles them. Through an interactive poll, the audience will choose which concepts to cover during the session, selecting from branching, emitting on change, windowing, repartitioning, joining, and more. Join me on this journey of learning Kafka Streams. You'll deepen your understanding of Kafka Streams concepts and gain access to tools that let you explore advanced concepts independently. All examples and visualizations will be available in an open-source project.
Neil Buesing
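
As a flavor of what a bite-size, single-purpose topology looks like, here is a minimal Kafka Streams sketch that counts events per process name in 30-second tumbling windows. The topic names, window size, and configuration values are illustrative assumptions rather than the session's actual examples.

```java
// A minimal, single-purpose Kafka Streams topology: count events per process name
// in 30-second tumbling windows. Topic names ("process-events", "process-counts")
// and configuration are assumptions for illustration only.
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.WindowedSerdes;

public class BiteSizeWindowedCount {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Key = process name, value = raw event payload.
        builder.stream("process-events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofSeconds(30)))
               .count() // backed by a windowed state store plus its changelog topic
               .toStream()
               .to("process-counts",
                   Produced.with(
                       WindowedSerdes.timeWindowedSerdeFrom(String.class, Duration.ofSeconds(30).toMillis()),
                       Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "bite-size-streams-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```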


Ins and Outs of The Outbox Pattern
The outbox pattern is a common solution for implementing data flows between microservices. By channeling messages through an outbox table, it enables services to update their own local datastore and, at the same time, send out notifications to other services via data streaming platforms such as Apache Kafka, in a reliable and consistent way. However, as with everything in IT, there’s no free lunch. How do you handle backfills of outbox events? How do you ensure idempotency for event consumers? Doesn’t the pattern turn the database into a bottleneck? And what about alternatives such as “Listen-to-Yourself”, or the upcoming Kafka support for two-phase commit transactions (KIP-939)? It’s time to take another look at the outbox pattern! In this session I’ll start by bringing you up to speed on what the outbox pattern *is*, and then go on to discuss details such as:
- Implementing the pattern safely and efficiently
- Its semantics, pros, and cons
- Dealing with backfills
- Potential alternatives to the outbox pattern and the trade-offs they make
Gunnar Morling
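
For readers new to the pattern, here is a minimal sketch of the outbox write path: the business row and the outbox row are inserted in one local database transaction, and a separate relay (for example, log-based change data capture) later publishes the outbox rows to Kafka. The table and column names are hypothetical.

```java
// A minimal sketch of the outbox pattern's write path, assuming hypothetical
// "orders" and "outbox" tables: the business change and the outgoing event are
// inserted in ONE local database transaction. A separate relay (e.g. log-based
// change data capture) would publish outbox rows to Kafka; that part is omitted here.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class OutboxWritePath {

    public static void placeOrder(String jdbcUrl, String customerId, long amountCents) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false); // both inserts commit or roll back together
            String orderId = UUID.randomUUID().toString();

            try (PreparedStatement order = conn.prepareStatement(
                     "INSERT INTO orders (id, customer_id, amount_cents) VALUES (?, ?, ?)");
                 PreparedStatement outbox = conn.prepareStatement(
                     "INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload) "
                     + "VALUES (?, ?, ?, ?, ?)")) {

                order.setString(1, orderId);
                order.setString(2, customerId);
                order.setLong(3, amountCents);
                order.executeUpdate();

                // The outbox row *is* the eventual Kafka message; aggregate_id typically becomes
                // the record key, which consumers can also use for idempotent processing.
                outbox.setString(1, UUID.randomUUID().toString());
                outbox.setString(2, "order");
                outbox.setString(3, orderId);
                outbox.setString(4, "OrderPlaced");
                outbox.setString(5, "{\"orderId\":\"" + orderId + "\",\"amountCents\":" + amountCents + "}");
                outbox.executeUpdate();

                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```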


Flink Jobs as Agents 🤖 – Unlocking Agentic AI with Stream Processing
Apache Flink is uniquely positioned to serve as the backbone for AI agents, enhancing them with stream processing as a new, powerful tool. We’ll explore how Flink jobs can be transformed into autonomous, goal-driven "Agents" that interact with data streams, trigger actions, and adapt in real time. We’ll showcase Flink jobs as AI agents through two key stream processing and AI use cases: 1) financial planning and detection of spending anomalies, and 2) demand forecasting and supply chain monitoring for disruptions. AI agents need business context. We’ll discuss combining foundation models with schema registries and data catalogs for contextual intelligence while ensuring data governance and security. We’ll integrate Apache Kafka event streams with data lakes in open table formats like Apache Iceberg, enabling AI agents to leverage real-time and historical data for consistency and reasoning. We’ll also cover latency optimization for time-sensitive use cases while preventing hallucinations. Finally, we’ll demonstrate an open-source conversational platform on Apache Kafka, in which multiple AI agents assigned to a business process continuously process real-time events while optimizing for their individual goals, interacting and negotiating with each other. By combining Flink and Kafka, we can build systems that are not just reactive but proactive and predictive, paving the way for next-generation agentic AI.
Steffen Hoellinger
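
As a rough illustration of the first use case, here is a minimal Flink (1.x DataStream API) sketch that keeps a per-customer running average of transaction amounts and flags unusually large ones. The source data, threshold, and class names are assumptions, and the agentic part (calling a foundation model, coordinating with other agents) would hang off the flagged stream rather than appear here.

```java
// A minimal spending-anomaly sketch as a Flink 1.x DataStream job: per-customer
// running average of transaction amounts, flagging amounts far above it.
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class SpendingAnomalyJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In a real pipeline this would be a Kafka source; here, a few hard-coded (customer, amount) events.
        DataStream<Tuple2<String, Double>> txns = env.fromElements(
                Tuple2.of("alice", 20.0), Tuple2.of("alice", 25.0), Tuple2.of("alice", 400.0));

        txns.keyBy(t -> t.f0)
            .process(new AnomalyFlagger())
            .print(); // an agent would react to these flagged events instead of printing them

        env.execute("spending-anomaly-sketch");
    }

    /** Flags a transaction if it exceeds 3x the customer's running average so far. */
    static class AnomalyFlagger extends KeyedProcessFunction<String, Tuple2<String, Double>, String> {
        private transient ValueState<Double> avg;
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            avg = getRuntimeContext().getState(new ValueStateDescriptor<>("avg", Double.class));
            count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void processElement(Tuple2<String, Double> txn, Context ctx, Collector<String> out) throws Exception {
            double currentAvg = avg.value() == null ? 0.0 : avg.value();
            long n = count.value() == null ? 0L : count.value();
            if (n > 0 && txn.f1 > 3 * currentAvg) {
                out.collect("Possible anomaly for " + txn.f0 + ": " + txn.f1 + " vs avg " + currentAvg);
            }
            avg.update((currentAvg * n + txn.f1) / (n + 1)); // update the running average
            count.update(n + 1);
        }
    }
}
```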


Scaling Semantic Search: Apache Kafka Meets Vector Databases
This talk presents a performance-tuned Apache Kafka pipeline for generating embeddings on large-scale text data streams. To store embeddings, our implementation supports various vector databases, making it highly adaptable to many applications. Text embeddings are fundamental for semantic search and recommendation, representing text in high-dimensional vector spaces for efficient similarity search using approximate k-nearest neighbors (kNN). By storing these embeddings and providing semantic search results given a query, vector databases are central to retrieval-augmented generation systems. We present our Kafka pipeline for continuously embedding texts to enable semantic search on live data. We demonstrate its end-to-end implementation while addressing key technical challenges:
- First, the pipeline performs text chunking to adhere to the maximum input sequence length of the embedding model. We use an optimized overlapping text chunking strategy to ensure that context is maintained across chunks.
- Using HuggingFace’s Text Embeddings Inference (TEI) toolkit in a lightweight, containerized GPU environment, we achieve efficient, scalable text embedding computation. TEI supports a wide range of state-of-the-art embedding models.
- As an alternative to relying on Kafka Streams, our solution implements optimized processing of small batches using Kafka consumer and producer client APIs, allowing batched API calls to TEI. Our benchmark results confirm this choice, indicating high efficiency with significantly improved throughput and reduced latency compared to other approaches.
- Finally, Kafka Connect allows real-time ingestion into vector databases like Qdrant, Milvus, or Vespa, making embeddings instantly available for semantic search and recommendation.
With Kafka’s high-throughput streaming, optimized interactions with GPU-accelerated TEI, and efficient vector serialization, our pipeline achieves scalable embedding computation and ingestion into vector databases.
Jakob Edding, Raphael Lachtner
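
Below is a condensed sketch of the batched consume, chunk, embed, and produce loop described above, using the plain Kafka consumer/producer clients and Java's HttpClient. It assumes a TEI-style HTTP service whose /embed endpoint accepts a JSON body with an "inputs" array; the topic names, chunk sizes, and naive JSON handling are illustrative assumptions rather than the speakers' exact implementation.

```java
// Sketch: poll small batches from Kafka, chunk each text with overlap, send one batched
// embedding request per document to a TEI-style service, and forward the result so a
// Kafka Connect sink can ingest it into a vector database. Endpoint, topics, and chunk
// sizes are assumptions for illustration.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EmbeddingPipelineSketch {

    /** Overlapping, character-based chunking so that context is shared between adjacent chunks. */
    static List<String> chunk(String text, int size, int overlap) {
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size - overlap) {
            chunks.add(text.substring(start, Math.min(text.length(), start + size)));
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        Properties consumerCfg = new Properties();
        consumerCfg.put("bootstrap.servers", "localhost:9092");
        consumerCfg.put("group.id", "embedding-pipeline");
        consumerCfg.put("enable.auto.commit", "false");
        consumerCfg.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerCfg.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerCfg = new Properties();
        producerCfg.put("bootstrap.servers", "localhost:9092");
        producerCfg.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerCfg.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        HttpClient http = HttpClient.newHttpClient();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerCfg);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerCfg)) {
            consumer.subscribe(List.of("raw-texts"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Batch all chunks of a document into one embedding request instead of one call per chunk.
                    List<String> chunks = chunk(record.value(), 1000, 200);
                    String inputs = "[\"" + String.join("\",\"", chunks) + "\"]"; // naive JSON escaping, for brevity only
                    HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/embed"))
                            .header("Content-Type", "application/json")
                            .POST(HttpRequest.BodyPublishers.ofString("{\"inputs\":" + inputs + "}"))
                            .build();
                    String embeddings = http.send(request, HttpResponse.BodyHandlers.ofString()).body();

                    // Forward the embedding vectors; a Kafka Connect sink moves this topic into the vector database.
                    producer.send(new ProducerRecord<>("embeddings", record.key(), embeddings));
                }
                consumer.commitSync(); // commit offsets only after the polled batch has been embedded and forwarded
            }
        }
    }
}
```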


Zero to Data Streaming Platform in Under 15 Minutes
Data streaming engineers need tooling to efficiently provision, maintain, and evolve the data streaming platform. The Confluent Terraform Provider does just that, providing human-readable infrastructure-as-code to build a Confluent Cloud environment in a matter of minutes. In this session, we’ll start from a blank canvas and create a new environment, complete with an Apache Kafka® cluster, stream governance, and stream processing with Flink. Next, we’ll create Kafka topics, define data contracts, and determine how to transform our input data. We won’t forget about security and access controls, so we’ll create service accounts with the necessary roles and permissions. Finally, we’ll set it all in motion by streaming events into Kafka and querying the output of our new data pipeline. When we’re done, you’ll have the tools needed to build and maintain your data streaming platform. Let’s do this!
Sandon Jacobs