Current London 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


FlinkSQL-Powered Asynchronous Data Processing in Pinterest’s Rule Engine Platform
Pinterest’s rule engine platform, also known as Guardian, allows Subject Matter Experts (SMEs) to analyze real-time event streams for patterns of abuse and create rules to block those patterns. Guardian addresses a range of domain-specific challenges, including spam and fraud enforcement, Media Rating Council (MRC) compliance, account takeover (ATO) attacks, risk monitoring, and unsafe content enforcement fanout. However, the legacy Guardian platform was built on a monolithic architecture and could no longer keep up with the data scale and the growing demands and risks faced by stakeholders. To tackle these challenges, we redesigned the next-generation Guardian around an event-driven architecture, choosing FlinkSQL for scalable event processing and integrating with data storage systems such as Kafka, StarRocks, Iceberg, and an internal KV store that cater to specific data access requirements. In this talk, we share the design and learnings from building the new system. Specifically, we’ll focus on how FlinkSQL interacts with the different storage systems and how it is leveraged to support asynchronous data processing needs, including stream splitting and pruning, data ingestion, rule enforcement, and rewind and replay. Our revamped architecture has yielded significant improvements in scalability, efficiency, development velocity, and data compliance. Additionally, we will touch on ongoing efforts around safe schema evolution, which has become more challenging under the event-driven design with its variety of storage systems and FlinkSQL.
Sharon Xie, Heng Zhang
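
Guardian’s actual jobs are not public, but purely as an illustration of the stream splitting and pruning described above, here is a minimal Java sketch (with hypothetical topic, column, and rule names) of a FlinkSQL job that reads a Kafka event stream and routes a pruned slice of it to a domain-specific topic:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class GuardianStyleSplitSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical Kafka-backed source of raw events.
        tEnv.executeSql(
            "CREATE TABLE raw_events (" +
            "  event_type STRING," +
            "  user_id    BIGINT," +
            "  payload    STRING," +
            "  ts         TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'guardian.raw'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json'" +
            ")");

        // Hypothetical pruned, per-domain output topic.
        tEnv.executeSql(
            "CREATE TABLE spam_events (" +
            "  user_id BIGINT," +
            "  payload STRING," +
            "  ts      TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'guardian.spam'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json'" +
            ")");

        // Split and prune: keep only the columns and event types this domain needs.
        tEnv.executeSql(
            "INSERT INTO spam_events " +
            "SELECT user_id, payload, ts FROM raw_events " +
            "WHERE event_type = 'report_spam'");
    }
}
```

In a setup along these lines, each new split or enforcement rule is just another generated INSERT ... SELECT statement, which is one way the abstract’s point about development velocity can play out.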


Modernizing Core Banking Platform: Real-Time Data Streaming from Mainframe to Salesforce
In this session, we will share our journey of modernizing a 40-year-old mainframe legacy system at BEC Financial Technologies, a financial technology provider of a core banking platform for more than 20 banks in Denmark. We will discuss how we leveraged Kafka to enable real-time data streaming from our mainframe to the new Salesforce platform, creating new business opportunities. Our presentation will cover the transition from traditional end-of-day batch processes to real-time data synchronization, highlighting the challenges and solutions we encountered. We will delve into the importance of DevOps in managing Kafka topics and the implementation of a Kappa architecture to handle both massive spikes and everyday real-time data volumes. Key patterns such as event-carried state transfer, compacted topics, and change data capture (CDC) will be explored, along with our data reconciliation mechanisms that ensure consistency between DB2 and Kafka. We will also share lessons learned from our experience, including mistakes to avoid, such as relying on centralized components for data transformation and not using a schema registry. Additionally, we will discuss the benefits of using Kafka for both online events and batch jobs, and the considerations for deciding between bulk and REST at runtime. Finally, we will walk through the overall architecture and some of the critical design decisions made during the implementation. This talk is ideal for architects, data engineers, and developers looking to modernize their legacy systems and integrate real-time data streaming into their platforms. Join us to learn how BEC Financial Technologies and its subsidiary Scoutz are transforming the banking industry with innovative data streaming solutions.
Wayne Yeung, Daniel Szymatowicz
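
The BEC specifics belong to the talk itself; as a hedged sketch of the compacted-topic pattern the abstract mentions, the Java snippet below (hypothetical topic name and sizing) creates a log-compacted topic so that consumers can always recover the latest state per key, which is what makes event-carried state transfer from the mainframe practical:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedStateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // A compacted topic retains only the latest record per key, so a consumer
            // (for example the Salesforce-facing sync) can rebuild current state by
            // reading the topic from the beginning.
            NewTopic accountState = new NewTopic("bank.account-state", 12, (short) 3)
                .configs(Map.of(
                    TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT,
                    TopicConfig.MIN_CLEANABLE_DIRTY_RATIO_CONFIG, "0.1"));

            admin.createTopics(Collections.singleton(accountState)).all().get();
        }
    }
}
```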


Stream On: From Bottlenecks to Streamline with Kafka Streams Template
How do you make 10TB of data per hour accessible, scalable, and easy to integrate for multiple internal consumers? In this talk, we’ll share how we overcame storage throughput limitations by migrating to Kafka Streams and developing a unified template application. Our solution not only eliminated bottlenecks but also empowered internal clients to build reliable Kafka Streams applications in just a few clicks—focusing solely on business logic without worrying about infrastructure complexity. We’ll dive into our architecture, implementation strategies, and key optimizations, covering performance tuning, monitoring, and how our approach accelerates adoption across teams. Whether you're managing massive data pipelines or seeking to streamline access for diverse stakeholders, this session will provide practical insights into leveraging Kafka Streams for seamless, scalable data flow.
Yulia Antonovsky, Hadar Federovsky
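
The template application itself is not spelled out in the abstract; the sketch below (Java, with illustrative names and configuration) shows one way such a template can look with Kafka Streams: the template owns configuration, serdes, and lifecycle, and a consuming team supplies only a function over the stream:

```java
import java.util.Properties;
import java.util.function.Function;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

/** Hypothetical template wrapper: infrastructure is fixed, business logic is plugged in. */
public class StreamsTemplateSketch {

    static KafkaStreams build(String appId, String inputTopic, String outputTopic,
                              Function<KStream<String, String>, KStream<String, String>> businessLogic) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, appId);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // In a real template this is also where monitoring, error handling and
        // standard tuning (cache size, num.stream.threads, ...) would be applied.

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream(inputTopic);
        businessLogic.apply(input).to(outputTopic);
        return new KafkaStreams(builder.build(), props);
    }

    public static void main(String[] args) {
        // A consuming team supplies only its transformation.
        KafkaStreams app = build("demo-app", "raw-events", "filtered-events",
            stream -> stream.filter((key, value) -> value != null && !value.isEmpty()));
        app.start();
        Runtime.getRuntime().addShutdownHook(new Thread(app::close));
    }
}
```

A shape like this is what lets teams focus on business logic: tuning, monitoring, and error handling live in the template, and every application built from it inherits them.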


For the brave ones: Dive into the Kafka wire protocol
Have you ever wondered what happens under the hood when your Kafka client talks to the broker? In this session, we’ll take a deep dive into the Kafka wire protocol - the low-level language that powers communication between Kafka components. We’ll break it down step by step to make it easy to understand. You’ll see how requests and responses are structured and get a clear picture of how everything fits together. To make it even more concrete, we’ll look at code examples that show how to build a Kafka request byte by byte. By the end of this session, you’ll have a solid grasp of the Kafka wire protocol, giving you the tools to create your own Kafka client - if you wish!
Thomas Iffland
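
As a small taste of what "byte by byte" means, here is a hedged sketch in plain Java (no client library, broker address assumed to be localhost:9092) that frames an ApiVersions v0 request by hand and reads back the size-prefixed response:

```java
import java.io.DataInputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ApiVersionsWireSketch {
    public static void main(String[] args) throws Exception {
        byte[] clientId = "wire-demo".getBytes(StandardCharsets.UTF_8);

        // Request header (non-flexible version): api_key, api_version, correlation_id,
        // then client_id as a nullable string (INT16 length + UTF-8 bytes).
        // ApiVersionsRequest v0 has an empty body, so the header is the whole payload.
        ByteBuffer payload = ByteBuffer.allocate(2 + 2 + 4 + 2 + clientId.length);
        payload.putShort((short) 18);        // api_key: ApiVersions
        payload.putShort((short) 0);         // api_version: 0
        payload.putInt(42);                  // correlation_id, echoed back by the broker
        payload.putShort((short) clientId.length);
        payload.put(clientId);
        payload.flip();

        try (Socket socket = new Socket("localhost", 9092)) {
            OutputStream out = socket.getOutputStream();
            // Every request is framed by a 4-byte big-endian size prefix.
            out.write(ByteBuffer.allocate(4).putInt(payload.remaining()).array());
            out.write(payload.array(), 0, payload.remaining());
            out.flush();

            // The response is framed the same way: a 4-byte size, then the bytes.
            DataInputStream in = new DataInputStream(socket.getInputStream());
            int size = in.readInt();
            byte[] response = new byte[size];
            in.readFully(response);
            System.out.println("Received " + size + " response bytes");
        }
    }
}
```

Every non-flexible request version uses this same framing (a 4-byte size, then api_key, api_version, correlation_id, and client_id, then the body); flexible versions add compact encodings and tagged fields on top.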


Event-Driven AI: Real-Time Intelligence with Open Source Frameworks
Artificial Intelligence thrives on data—especially timely data. In this talk, we’ll explore how to integrate event-driven architectures with popular AI/ML frameworks to unlock real-time intelligence. We’ll dive into the nuts and bolts of constructing a continuous data pipeline using open-source technologies like Kafka Streams, Apache Flink, and popular AI libraries such as TensorFlow or PyTorch. We’ll walk through end-to-end examples: from data ingestion, cleaning, and feature extraction, to model inference in near-real time. You’ll discover how to optimize model performance under streaming conditions, employing sliding windows and advanced time-series techniques. Additionally, we’ll address operational challenges such as model updates in production, handling concept drift, and balancing compute resources with streaming throughput demands. Attendees will leave with a blueprint for setting up an event-driven AI pipeline, armed with concrete tips on choosing the right open-source frameworks, monitoring streaming model performance, and orchestrating seamless model deployments. If you’ve ever wondered how to blend AI with real-time event processing to deliver actionable insights the moment they matter, this session is for you.
Richmond Alake
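
The talk does not prescribe a single stack; as one possible shape of such a pipeline, the sketch below (Kafka Streams in Java, with made-up topic names and a placeholder score() standing in for real TensorFlow/PyTorch inference) computes a sliding-window feature per key and scores every update in near real time:

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.SlidingWindows;

public class StreamingInferenceSketch {

    // Stand-in for real model inference (e.g. a call to a TensorFlow or PyTorch
    // serving endpoint); here it is just a threshold on the windowed feature.
    static double score(double featureSum) {
        return featureSum > 100.0 ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streaming-inference-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("sensor-readings", Consumed.with(Serdes.String(), Serdes.Double()))
               // Sliding 5-minute window per key: the feature-extraction step.
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
               .windowedBy(SlidingWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)))
               .aggregate(() -> 0.0, (key, value, agg) -> agg + value,
                          Materialized.with(Serdes.String(), Serdes.Double()))
               // Score each updated window result as it arrives.
               .toStream((windowedKey, sum) -> windowedKey.key())
               .mapValues(StreamingInferenceSketch::score)
               .to("model-scores", Produced.with(Serdes.String(), Serdes.Double()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```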


Building a scalable and efficient real-time event counter framework with Flink(SQL) and Kafka
At Pinterest, counters are at the core of feature engineering, enabling teams to uncover event patterns and turn discoveries into actionable features. Our journey to build a robust counter framework surfaced several distinctive challenges:
1. The demand for a scalable architecture capable of managing hundreds of counters.
2. The ability to explore multiple window sizes, from a minute to a week, for the same counter with frequent updates to gain richer and faster insights.
3. The continual onboarding of new counters to stay ahead of emerging trends.
In this session, we will delve into how we tackled these challenges by building a scalable and efficient real-time event counter framework with Apache Kafka, Apache Flink, and a wide-column store. Our approach involves a two-stage data processing layer:
- Stage 1: Flink jobs read event streams, apply filtering, enrich them with metadata outlining the aggregation logic, and write intermediate records to Kafka. The stateless FlinkSQL queries, dynamically generated from user-supplied SQL scripts, ensure seamless addition and swift deployment of new counters.
- Stage 2: A stateful Flink job consumes the intermediate records, computes counter results, and writes them to a wide-column store for online serving.
To support multiple window sizes with frequent updates, we leveraged a chain-of-window technique that cascades aggregated results from smaller to larger windows, thereby minimizing redundant computation and reducing data shuffling. We group counter results to emit multiple records in a single write, and to avert write traffic surges as windows close, a custom rate limiter spreads writes out over time. These optimizations reduce write requests and avoid traffic spikes to the wide-column store, lowering costs and improving the stability of the overall system. Attendees will gain insights into Flink’s SQL and windowing functionalities for scalable stream processing in real-world applications.
Hanyi Zhang, Heng Zhang
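
Pinterest’s queries are their own; the following Java-wrapped Flink SQL sketch (made-up table names, a datagen source standing in for the intermediate Kafka topic) illustrates the chain-of-window idea: the hourly counter is computed from the per-minute results rather than from the raw events, so the larger window re-aggregates far fewer rows:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ChainOfWindowSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Self-contained demo source; in a real framework this would be the
        // intermediate Kafka topic produced by the stage-1 jobs.
        tEnv.executeSql(
            "CREATE TABLE events (" +
            "  user_id BIGINT," +
            "  event_time AS LOCALTIMESTAMP," +
            "  WATERMARK FOR event_time AS event_time - INTERVAL '1' SECOND" +
            ") WITH ('connector' = 'datagen', 'rows-per-second' = '10')");

        // Smallest window first: per-minute counts. window_time is kept as the
        // time attribute so the result can itself be windowed again.
        tEnv.executeSql(
            "CREATE TEMPORARY VIEW minute_counts AS " +
            "SELECT user_id, window_time AS rowtime, COUNT(*) AS cnt " +
            "FROM TABLE(TUMBLE(TABLE events, DESCRIPTOR(event_time), INTERVAL '1' MINUTE)) " +
            "GROUP BY user_id, window_start, window_end, window_time");

        // Chain-of-window: the hourly counter is built from the minute counts,
        // not from the raw events.
        tEnv.executeSql(
            "SELECT user_id, window_start, window_end, SUM(cnt) AS cnt " +
            "FROM TABLE(TUMBLE(TABLE minute_counts, DESCRIPTOR(rowtime), INTERVAL '1' HOUR)) " +
            "GROUP BY user_id, window_start, window_end")
            .print();
    }
}
```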