Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Processing Exception Handling and Dead Letter Queue in Kafka Streams
A major concern when starting with Kafka Streams is how to handle (un)expected errors. Generally, you want to track these errors, identify the records that caused the failures, and possibly reprocess them. To achieve this, you often need to implement a custom try-catch mechanism and send these errors to a dedicated topic. Does this challenge sound familiar? Welcome aboard! At Michelin, we face it too. For our own needs, we embedded this kind of error-handling mechanism in a homemade solution, but that solution has its limitations. Thus, we proposed two Kafka Improvement Proposals to enhance the Kafka Streams exception-handling experience. KIP-1033 introduces a new processing exception handler, complementing the existing deserialization and production exception handlers. Now, any exception that occurs during processing is caught and passed to the handler, allowing you to define your own error-handling logic. Complementary to this, KIP-1034 adds native support for routing failed records to a dead-letter queue topic of your choice. By the end of this talk, you will walk away with the latest updates these KIPs bring, helping you build Kafka Streams applications that are more robust against processing errors, with less effort.
Loïc Greffier, Sébastien Viale
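
To make the manual pattern the abstract describes concrete, here is a minimal sketch of a hand-rolled try-catch dead-letter route in the Kafka Streams DSL, the kind of workaround KIP-1033 and KIP-1034 aim to make unnecessary. The topic names, the enrich() function, and the string-tagging scheme are illustrative assumptions, not Michelin's implementation.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ManualDlqTopology {

    // Hypothetical business logic that may throw at runtime.
    static String enrich(String value) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("empty payload");
        }
        return value.toUpperCase();
    }

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("orders");

        // Tag each record as "ok:<result>" or "err:<original>|<message>"
        // so both outcomes can travel through the same stream.
        KStream<String, String> tagged = input.mapValues(value -> {
            try {
                return "ok:" + enrich(value);
            } catch (Exception e) {
                return "err:" + value + "|" + e.getMessage();
            }
        });

        // Failed records go to a dedicated dead-letter topic for later
        // inspection or reprocessing; successes continue downstream.
        tagged.filter((k, v) -> v.startsWith("err:"))
              .to("orders-dlq", Produced.with(Serdes.String(), Serdes.String()));

        tagged.filter((k, v) -> v.startsWith("ok:"))
              .mapValues(v -> v.substring(3))
              .to("orders-enriched", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
```

With KIP-1033's processing exception handler and KIP-1034's dead-letter queue support, this kind of routing logic can move out of the topology and into configurable handlers.
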


Exactly-Once vs. Idempotency: When Misconceptions Create Complexity
When building an event-driven architecture, teams often discuss exactly-once delivery and idempotency as if they were interchangeable concepts. This misunderstanding can lead to unnecessary complexity, increased operational overhead, and, in some cases, unreliable systems. In this talk, I will share a real-world case study from a project where our team fell into this trap. Initially, we assumed that enabling exactly-once semantics in Kafka would solve all our deduplication problems. However, as the system evolved, we realized that this approach didn't eliminate the need for idempotency at the application level. The result? A complex, hard-to-debug system with redundant safeguards that sometimes worked against each other. Attendees will learn:
* The key differences between exactly-once delivery and idempotency.
* Why assuming one implies the other can introduce unnecessary complexity.
* How our team untangled this confusion and simplified our architecture.
* Practical guidelines for designing robust, event-driven systems without over-engineering them.
This talk is ideal for engineers and architects working with Kafka and event-driven systems who want to avoid common pitfalls and build more maintainable, scalable architectures.
Oscar Caraballo, Luis García Castro
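
As a concrete illustration of the distinction the talk draws, here is a minimal sketch of application-level idempotency on the consumer side: even with exactly-once semantics enabled inside Kafka, an external side effect (charging a card, calling an API) can still run twice after a retry or rebalance, so the consumer deduplicates on a stable business key. The topic name, group id, and in-memory ID set are hypothetical; a real system would persist the seen-ID set in a database or state store.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IdempotentPaymentConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "payments-processor");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        // In production this set would live in durable storage so that
        // deduplication survives restarts and rebalances.
        Set<String> processedEventIds = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Assumes the producer sets a stable business/event ID as the key.
                    String eventId = record.key();
                    if (!processedEventIds.add(eventId)) {
                        continue; // duplicate delivery: side effect already applied, skip it
                    }
                    applySideEffect(record.value());
                }
                consumer.commitSync();
            }
        }
    }

    // Hypothetical non-idempotent side effect (e.g. charging a payment).
    static void applySideEffect(String payload) {
        System.out.println("processed " + payload);
    }
}
```

Kafka's exactly-once guarantees cover the read-process-write cycle within Kafka itself; the moment processing touches an external system, some form of application-level idempotency like this is still needed.
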


HALO Jumping into Flink: Lessons learned from managing real-time data at Daimler Truck
To offer its customers state-of-the-art digital services, Daimler Truck manages anonymized data from more than 12,000 connected buses operating in Europe using the CTP, a unit installed in each vehicle that streams telemetry data (such as vehicle speed, GPS position, acceleration values, and braking force). The system handles around 500k messages per second, with an average latency of around 5 seconds between the vehicle and the data becoming available for consumption. Follow our three-year journey of developing self-managed, stateful Apache Flink applications on top of a treasure trove of near-real-time data, with the ultimate goal of delivering business-critical products like Driver Performance Analysis, Geofencing, EV Battery Health, and Signal Visualization. Starting with a team completely new to Flink, we learned through trial, error, and iteration, eventually building a modern, resilient data processing setup. In this session, we'll share our victories, setbacks, and key lessons learned, focusing on practical tips for managing self-hosted Flink clusters. Topics will include working with Flink operators, understanding load distributions, scaling pipelines, and achieving operational reliability. We'll also delve into the mindset shifts required to succeed in building robust, real-time data systems. Whether you're new to Flink, transitioning from batch to streaming, or scaling existing pipelines, this talk offers actionable insights to help you architect, deploy, and optimize your self-managed Flink environment with confidence.
Fábio Silva, Carlos Santos


Blur the line between real-time and batch with Apache Kafka, Druid, and Iceberg
Ever since Apache Kafka spearheaded the real-time revolution, there has been a real-time vs. batch divide in the data engineering community. The tools, architectures, and mindsets were so different that most people worked with one or the other, and companies had to effectively maintain two data engineering teams to meet their data processing needs. But the rise of Apache Iceberg is bringing a dramatic shift in the data landscape. Batch data powerhouses like Snowflake and Databricks are racing to adopt Iceberg support, followed by streaming tools like Apache Flink, and Confluent, arguably the leader in real-time data, has adopted Iceberg with its Tableflow product. Now, real-time databases like Apache Druid are integrating Iceberg as well, so that we can query both our real-time and batch data with a single tool, often in a single query. I believe we really are seeing a revolution in data engineering. In this session, we'll take a look at three key players in this data revolution: Kafka, Druid, and Iceberg. We'll start with a brief introduction to each tool and then see some examples of architectures that allow us to get the most value from our data regardless of how old it is. Finally, we'll talk about where this might be heading and how we, as data engineers, can thrive in this brave new world. It is my hope that you'll leave this session with an understanding of some key tools, architectural patterns, and ways of looking at data that will equip you to deliver the quality data your organization needs more efficiently.
Dave Klein


Land of Confusion: Living with Hybrid Kafka Deployments for the Long Haul
Hybrid deployments were once thought to be a temporary state, but more and more organizations are finding that maintaining on-prem Kafka alongside a cloud deployment may last years or even forever. Confronting disparate deployments means dealing with the inherent differences between on-prem and cloud Kafka. Whether you are using a service provider or maintaining your own, there are important items to tackle for long-term success. In this talk we will cover the most important strategies to ensure a successful hybrid deployment, such as:
* Entitlement: how to manage and unify AUTHN and AUTHZ
* Data availability: patterns for data migration and continual sync between on-prem and cloud (see the sketch below)
* One onboarding to rule them all: altering your existing control plane to accommodate hybrid
* Monitoring: creating a standard for your entire Kafka estate
At the end of this talk you will understand the critical aspects that need to be addressed to cut through the confusion and enjoy long-term hybrid stability.
Anna McDonald
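
The continual-sync pattern mentioned above can be implemented in several ways; one common option (not necessarily the one covered in the talk) is Kafka MirrorMaker 2. A minimal sketch of a dedicated MirrorMaker 2 configuration that replicates topics, consumer groups, ACLs, and topic configs from an on-prem cluster to a cloud cluster might look like the following; the cluster aliases, bootstrap servers, and topic patterns are placeholders.

```properties
# connect-mirror-maker.properties: one possible on-prem -> cloud replication setup
clusters = onprem, cloud
onprem.bootstrap.servers = kafka-onprem.example.internal:9092
cloud.bootstrap.servers = kafka-cloud.example.com:9092

# Replicate selected topics and all consumer groups from on-prem to cloud.
onprem->cloud.enabled = true
onprem->cloud.topics = orders.*, customers.*
onprem->cloud.groups = .*

# Keep ACLs and topic configs in sync so entitlement stays consistent across environments.
sync.topic.acls.enabled = true
sync.topic.configs.enabled = true

# Replication factors for mirrored and internal topics on the target cluster.
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
```

Syncing ACLs and topic configs alongside the data is one way to keep entitlement and onboarding consistent across both halves of the estate, which ties directly into the strategies listed above.
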


Unified CDC Ingestion and Processing with Apache Flink and Iceberg
Apache Iceberg is a robust foundation for large-scale data lakehouses, yet its incremental processing model lacks native support for CDC, making updates and deletes challenging. While many teams turn to Kafka and Flink for CDC processing, this comes with high infrastructure costs and operational complexity. We needed a cost-effective solution with minute-level latency that supports dozens of terabytes of CDC data processing per day. Since we were already using Flink for Iceberg ingestion, we set out to extend it for CDC processing as well. In this session, we'll share how we tackled this challenge by writing change data streams as append tables and reading append tables as change streams. This approach makes Iceberg tables function like Kafka topics, with two added benefits: first, Iceberg tables remain directly queryable, making troubleshooting and application integration more approachable and streamlined; second, much like Kafka consumers, multiple engines can independently process Iceberg tables, but unlike Kafka clusters, there is no extra infrastructure to scale. We will also explore optimization opportunities with Iceberg and Flink, including when to materialize tables and how to choose between append and upsert modes to enhance integration. If you're working on data processing over Iceberg, this session will provide practical, battle-tested strategies to overcome limitations and scale efficiently while keeping the infrastructure simple.
Mike Araujo, Sharon (Ran) Xie
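
To illustrate the "change data streams as append tables" idea in rough form, here is a sketch using Flink's Table API with the Iceberg connector: change events are ingested as ordinary rows, the operation type is kept as a plain column, and everything is appended to an Iceberg table that downstream jobs can later reinterpret as a change stream. The simplified envelope (op/order_id/payload/ts_ms), topic name, and warehouse path are assumptions rather than the speakers' actual schema, and the job needs the Flink Kafka and Iceberg connector jars on its classpath.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcAsAppendTable {

    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Iceberg catalog backed by a Hadoop-style warehouse path (placeholder location).
        tEnv.executeSql(
            "CREATE CATALOG lake WITH ("
          + " 'type' = 'iceberg',"
          + " 'catalog-type' = 'hadoop',"
          + " 'warehouse' = 's3://example-bucket/warehouse')");

        tEnv.executeSql("CREATE DATABASE IF NOT EXISTS lake.cdc");

        // Change events arrive as plain JSON rows; the operation type stays an
        // ordinary column, so the stream is append-only rather than a changelog.
        tEnv.executeSql(
            "CREATE TEMPORARY TABLE orders_changes ("
          + " op STRING, order_id BIGINT, payload STRING, ts_ms BIGINT"
          + ") WITH ("
          + " 'connector' = 'kafka',"
          + " 'topic' = 'orders-cdc',"
          + " 'properties.bootstrap.servers' = 'localhost:9092',"
          + " 'scan.startup.mode' = 'earliest-offset',"
          + " 'format' = 'json')");

        // Append-only Iceberg table holding the change log itself; downstream
        // jobs read it incrementally and reinterpret rows as inserts/updates/deletes.
        tEnv.executeSql(
            "CREATE TABLE IF NOT EXISTS lake.cdc.orders_changelog ("
          + " op STRING, order_id BIGINT, payload STRING, ts_ms BIGINT)");

        tEnv.executeSql(
            "INSERT INTO lake.cdc.orders_changelog "
          + "SELECT op, order_id, payload, ts_ms FROM orders_changes");
    }
}
```

Reading the append table back as a change stream is then a matter of interpreting the op column downstream, which is the other half of the approach the session covers.
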