Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Processing Exception Handling and Dead Letter Queue in Kafka Streams
A major concern when starting with Kafka Streams is how to handle (un)expected errors. Generally, you want to track these errors, identify the records that caused the failures, and possibly reprocess them. To achieve this, you often need to implement a custom try-catch mechanism and send these errors to a dedicated topic. Does this challenge sound familiar? Welcome aboard! At Michelin, we face it too. For our own needs, we embedded this kind of error-handling mechanism in a homemade solution, but that solution has its limitations. Thus, we proposed two Kafka Improvement Proposals to enhance the Kafka Streams exception-handling experience. KIP-1033 introduces a new processing exception handler, complementing the existing deserialization and production exception handlers. Now, any exception that occurs during processing is caught and passed to the handler, allowing you to define your own error-handling logic. Complementary to this, KIP-1034 adds native support for routing failed records to a dead-letter queue topic of your choice. By the end of this talk, you will walk away with the latest updates these KIPs bring, helping you build Kafka Streams applications that are more robust against processing errors, with less effort.
Loïc Greffier, Sébastien Viale
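
To make the manual pattern the abstract describes concrete, here is a minimal sketch of a hand-rolled try-catch dead-letter route in the Kafka Streams DSL, the kind of workaround KIP-1033 and KIP-1034 aim to make unnecessary. The topic names, the enrich() function, and the string-tagging scheme are illustrative assumptions, not Michelin's implementation.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ManualDlqTopology {

    // Hypothetical business logic that may throw at runtime.
    static String enrich(String value) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("empty payload");
        }
        return value.toUpperCase();
    }

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("orders");

        // Tag each record as "ok:<result>" or "err:<original>|<message>"
        // so both outcomes can travel through the same stream.
        KStream<String, String> tagged = input.mapValues(value -> {
            try {
                return "ok:" + enrich(value);
            } catch (Exception e) {
                return "err:" + value + "|" + e.getMessage();
            }
        });

        // Failed records go to a dedicated dead-letter topic for later
        // inspection or reprocessing; successes continue downstream.
        tagged.filter((k, v) -> v.startsWith("err:"))
              .to("orders-dlq", Produced.with(Serdes.String(), Serdes.String()));

        tagged.filter((k, v) -> v.startsWith("ok:"))
              .mapValues(v -> v.substring(3))
              .to("orders-enriched", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
```

With KIP-1033's processing exception handler and KIP-1034's dead-letter queue support, this kind of routing logic can move out of the topology and into configurable handlers.
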


Exactly-Once vs. Idempotency: When Misconceptions Create Complexity
When building an event-driven architecture, teams often discuss exactly-once delivery and idempotency as if they were interchangeable concepts. This misunderstanding can lead to unnecessary complexity, increased operational overhead, and, in some cases, unreliable systems. In this talk, I will share a real-world case study from a project where our team fell into this trap. Initially, we assumed that enabling exactly-once semantics in Kafka would solve all our deduplication problems. However, as the system evolved, we realized that this approach didn't eliminate the need for idempotency at the application level. The result? A complex, hard-to-debug system with redundant safeguards that sometimes worked against each other. Attendees will learn:
* The key differences between exactly-once delivery and idempotency.
* Why assuming one implies the other can introduce unnecessary complexity.
* How our team untangled this confusion and simplified our architecture.
* Practical guidelines for designing robust, event-driven systems without over-engineering them.
This talk is ideal for engineers and architects working with Kafka and event-driven systems who want to avoid common pitfalls and build more maintainable, scalable architectures.
Oscar Caraballo, Luis García Castro
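
As a concrete illustration of the distinction the talk draws, here is a minimal sketch of application-level idempotency on the consumer side: even with exactly-once semantics enabled inside Kafka, an external side effect (charging a card, calling an API) can still run twice after a retry or rebalance, so the consumer deduplicates on a stable business key. The topic name, group id, and in-memory ID set are hypothetical; a real system would persist the seen-ID set in a database or state store.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IdempotentPaymentConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "payments-processor");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        // In production this set would live in durable storage so that
        // deduplication survives restarts and rebalances.
        Set<String> processedEventIds = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Assumes the producer sets a stable business/event ID as the key.
                    String eventId = record.key();
                    if (!processedEventIds.add(eventId)) {
                        continue; // duplicate delivery: side effect already applied, skip it
                    }
                    applySideEffect(record.value());
                }
                consumer.commitSync();
            }
        }
    }

    // Hypothetical non-idempotent side effect (e.g. charging a payment).
    static void applySideEffect(String payload) {
        System.out.println("processed " + payload);
    }
}
```

Kafka's exactly-once guarantees cover the read-process-write cycle within Kafka itself; the moment processing touches an external system, some form of application-level idempotency like this is still needed.
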


HALO Jumping into Flink: Lessons learned from managing real-time data at Daimler Truck
To offer its customers state-of-the-art digital services, Daimler Truck manages anonymized data from more than 12,000 connected buses operating in Europe using the CTP, a unit installed in each vehicle that streams telemetry data (such as vehicle speed, GPS position, acceleration values, and braking force). The system handles around 500k messages per second, with an average latency of around 5 seconds between the vehicle and the data becoming available for consumption. Follow our three-year journey of developing self-managed, stateful Apache Flink applications on top of a treasure trove of near-real-time data, with the ultimate goal of delivering business-critical products like Driver Performance Analysis, Geofencing, EV Battery Health, and Signal Visualization. Starting with a team completely new to Flink, we learned through trial, error, and iteration, eventually building a modern, resilient data processing setup. In this session, we'll share our victories, setbacks, and key lessons learned, focusing on practical tips for managing self-hosted Flink clusters. Topics will include working with Flink operators, understanding load distributions, scaling pipelines, and achieving operational reliability. We'll also delve into the mindset shifts required to succeed in building robust, real-time data systems. Whether you're new to Flink, transitioning from batch to streaming, or scaling existing pipelines, this talk offers actionable insights to help you architect, deploy, and optimize your self-managed Flink environment with confidence.
Fábio Silva, Carlos Santos


Blur the line between real-time and batch with Apache Kafka, Druid, and Iceberg
Ever since Apache Kafka spearheaded the real-time revolution, there has been a real-time vs. batch divide in the data engineering community. The tools, architectures, and mindsets were so different that most people worked with one or the other, and companies had to effectively maintain two data engineering teams to meet their data processing needs. But the rise of Apache Iceberg is bringing a dramatic shift in the data landscape. Batch data powerhouses like Snowflake and Databricks are racing to adopt Iceberg support, followed by streaming tools like Apache Flink, and Confluent, arguably the leader in real-time data, has adopted Iceberg with its Tableflow product. Now, real-time databases like Apache Druid are integrating Iceberg as well, so that we can query both our real-time and batch data with a single tool, often in a single query. I believe we really are seeing a revolution in data engineering. In this session, we'll take a look at three key players in this data revolution: Kafka, Druid, and Iceberg. We'll start with a brief introduction to each tool and then see some examples of architectures that allow us to get the most value from our data regardless of how old it is. Finally, we'll talk about where this might be heading and how we, as data engineers, can thrive in this brave new world. It is my hope that you'll leave this session with an understanding of some key tools, architectural patterns, and ways of looking at data that will equip you to deliver the quality data your organization needs more efficiently.
Dave Klein


Land of Confusion: Living with Hybrid Kafka Deployments for the Long Haul
Hybrid deployments were once thought to be a temporary state, but more and more organizations are finding that maintaining on-prem Kafka alongside a cloud deployment may last years or even forever. Confronting disparate deployments means dealing with the inherent differences between on-prem and cloud Kafka. Whether you are using a service provider or maintaining your own, there are important items to tackle for long-term success. In this talk we will cover the most important strategies to ensure a successful hybrid deployment, such as:
* Entitlement: how to manage and unify AUTHN and AUTHZ
* Data availability: patterns for data migration and continual sync between on-prem and cloud (see the sketch below)
* One onboarding to rule them all: altering your existing control plane to accommodate hybrid
* Monitoring: creating a standard for your entire Kafka estate
At the end of this talk you will understand the critical aspects that need to be addressed to cut through the confusion and enjoy long-term hybrid stability.
Anna McDonald
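
The continual-sync pattern mentioned above can be implemented in several ways; one common option (not necessarily the one covered in the talk) is Kafka MirrorMaker 2. A minimal sketch of a dedicated MirrorMaker 2 configuration that replicates topics, consumer groups, ACLs, and topic configs from an on-prem cluster to a cloud cluster might look like the following; the cluster aliases, bootstrap servers, and topic patterns are placeholders.

```properties
# connect-mirror-maker.properties: one possible on-prem -> cloud replication setup
clusters = onprem, cloud
onprem.bootstrap.servers = kafka-onprem.example.internal:9092
cloud.bootstrap.servers = kafka-cloud.example.com:9092

# Replicate selected topics and all consumer groups from on-prem to cloud.
onprem->cloud.enabled = true
onprem->cloud.topics = orders.*, customers.*
onprem->cloud.groups = .*

# Keep ACLs and topic configs in sync so entitlement stays consistent across environments.
sync.topic.acls.enabled = true
sync.topic.configs.enabled = true

# Replication factors for mirrored and internal topics on the target cluster.
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
```

Syncing ACLs and topic configs alongside the data is one way to keep entitlement and onboarding consistent across both halves of the estate, which ties directly into the strategies listed above.
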


Unified CDC Ingestion and Processing with Apache Flink and Iceberg
Apache Iceberg is a robust foundation for large-scale data lakehouses, yet its incremental processing model lacks native support for CDC, making updates and deletes challenging. While many teams turn to Kafka and Flink for CDC processing, this comes with high infrastructure costs and operational complexity. We needed a cost-effective solution with minute-level latency that supports dozens of terabytes of CDC data processing per day. Since we were already using Flink for Iceberg ingestion, we set out to extend it for CDC processing as well. In this session, we'll share how we tackled this challenge by writing change data streams as append tables and reading append tables as change streams. This approach makes Iceberg tables function like Kafka topics, with two added benefits: first, Iceberg tables remain directly queryable, making troubleshooting and application integration more approachable and streamlined; second, much like Kafka consumers, multiple engines can independently process Iceberg tables, but unlike Kafka clusters, there is no extra infrastructure to scale. We will also explore optimization opportunities with Iceberg and Flink, including when to materialize tables and how to choose between append and upsert modes to enhance integration. If you're working on data processing over Iceberg, this session will provide practical, battle-tested strategies to overcome limitations and scale efficiently while keeping the infrastructure simple.
Mike Araujo, Sharon (Ran) Xie
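
To illustrate the "change data streams as append tables" idea in rough form, here is a sketch using Flink's Table API with the Iceberg connector: change events are ingested as ordinary rows, the operation type is kept as a plain column, and everything is appended to an Iceberg table that downstream jobs can later reinterpret as a change stream. The simplified envelope (op/order_id/payload/ts_ms), topic name, and warehouse path are assumptions rather than the speakers' actual schema, and the job needs the Flink Kafka and Iceberg connector jars on its classpath.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcAsAppendTable {

    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Iceberg catalog backed by a Hadoop-style warehouse path (placeholder location).
        tEnv.executeSql(
            "CREATE CATALOG lake WITH ("
          + " 'type' = 'iceberg',"
          + " 'catalog-type' = 'hadoop',"
          + " 'warehouse' = 's3://example-bucket/warehouse')");

        tEnv.executeSql("CREATE DATABASE IF NOT EXISTS lake.cdc");

        // Change events arrive as plain JSON rows; the operation type stays an
        // ordinary column, so the stream is append-only rather than a changelog.
        tEnv.executeSql(
            "CREATE TEMPORARY TABLE orders_changes ("
          + " op STRING, order_id BIGINT, payload STRING, ts_ms BIGINT"
          + ") WITH ("
          + " 'connector' = 'kafka',"
          + " 'topic' = 'orders-cdc',"
          + " 'properties.bootstrap.servers' = 'localhost:9092',"
          + " 'scan.startup.mode' = 'earliest-offset',"
          + " 'format' = 'json')");

        // Append-only Iceberg table holding the change log itself; downstream
        // jobs read it incrementally and reinterpret rows as inserts/updates/deletes.
        tEnv.executeSql(
            "CREATE TABLE IF NOT EXISTS lake.cdc.orders_changelog ("
          + " op STRING, order_id BIGINT, payload STRING, ts_ms BIGINT)");

        tEnv.executeSql(
            "INSERT INTO lake.cdc.orders_changelog "
          + "SELECT op, order_id, payload, ts_ms FROM orders_changes");
    }
}
```

Reading the append table back as a change stream is then a matter of interpreting the op column downstream, which is the other half of the approach the session covers.
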