Current New London 2026

Session Archive

Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.

Building Reliable CDC and Kafka Mirroring Pipelines at Trillion-Message Scale
At large scale, reliability is unforgiving. When a data platform processes trillions of events per day, even small delays or inconsistencies can ripple across analytics, AI systems, and customer-facing products. In these environments, Change Data Capture (CDC) pipelines are no longer just ingestion tools — they become core production infrastructure with strict latency and correctness requirements. In this talk, we’ll share lessons from operating Brooklin, an open-source data streaming platform used at LinkedIn to run reliable CDC and Kafka mirroring pipelines at massive scale. Brooklin processes over 7 trillion messages per day across 50+ clusters, mirrors 100k+ Kafka topics, and supports sub-minute SLAs for critical workloads spanning multiple teams and use cases. Rather than focusing on how to build CDC systems from scratch, this session emphasizes how platform teams can adopt proven patterns to operate CDC and Kafka mirroring reliably in real-world environments. We’ll discuss common CDC use cases across database-heavy organizations, including capturing changes from systems such as MySQL, Oracle, and TiDB, streaming them into Apache Kafka, and mirroring data across Kafka clusters for isolation, multi-region deployments, and organizational boundaries. This session is aimed at intermediate to advanced data engineers and platform teams. Rather than diving into low-level internals or how to build CDC from scratch, we’ll focus on practical design and operational strategies for adopting and operating CDC and Kafka mirroring platforms at scale: partitioning and throughput considerations, handling schema evolution, managing backpressure, and supporting differentiated SLAs—from near-real-time (≈1 minute) to relaxed latency (≈30 minutes)—across shared infrastructure. Brooklin, an open-source data streaming platform, will be presented as a reference implementation that demonstrates how these patterns work in practice. 
We’ll share how similar approaches can be adopted by other organizations building CDC and Kafka mirroring pipelines across diverse databases and environments. Attendees will leave with concrete insights into designing reliable CDC architectures, understanding real-world failure modes, and applying proven patterns to build production-grade streaming systems.

Harshade Yesane

LinkedIn

From Batch to Real Time: Operating Cassandra CDC with Debezium at Datadog Scale
At Datadog, the Metrics Query Activity feature relies on fast faceted search across operational data stored in Cassandra. The previous replication model used scheduled batch jobs that queried Cassandra by partition key and copied the data into Elasticsearch. This created heavy read pressure on production clusters, introduced operational complexity, and resulted in a four hour delay before changes became visible downstream. The batch jobs ran on a fixed schedule and were enabled for only a limited subset of customers. With the Cassandra cluster sustaining write volumes exceeding 30,000 writes per second, extending this approach to the full customer base would have required an increase in job execution rate and query volume. This talk presents how we replaced this batch approach with a real time streaming architecture based on Cassandra CDC and the open source Debezium Cassandra connector, including upstream contributions to the project. By capturing commit logs directly and streaming changes into Kafka, we removed the need for read intensive extraction jobs. A downstream Kafka Connect Elasticsearch sink then applies updates as they arrive, keeping indexed documents aligned with the source of truth within seconds. Supporting Datadog’s write volume required ensuring the CDC pipeline could process more than 30,000 writes per second with resilient behavior. We tuned Debezium’s Kafka producers and evaluated the system under peak load, while verifying at least once delivery and clean recovery from connector issues to maintain eventual consistency downstream. The impact is reflected in several key metrics. Replication delay fell from four hours to under ten seconds. Eliminating read heavy extraction jobs removed pressure on Cassandra and created opportunities for future cluster downscaling. 
The new architecture also reduced operational cost by an estimated 46 percent while providing a streaming model that scales naturally with write throughput and isolates OLTP workloads from downstream processing. Attendees will learn how to implement Cassandra CDC with Debezium in a high volume environment, how to tune and scale Debezium and Kafka to handle demanding write workloads, how to migrate safely from batch replication to streaming, and the practical lessons we learned while operationalizing Cassandra CDC at Datadog scale.

Joan Gomez

Datadog

Alejandro Huertas

Datadog

Testing Flink SQL Scripts Made Simple for Non-Developers
Developing Flink SQL scripts can be challenging, especially for non-developers like data scientists. Simplifying the development process is essential to ensure correctness and reliability, with testing playing a crucial role. We introduce a Flink SQL Test Runner as a solution to streamline this process.
Key Points:
1. Test Runner Architecture: Get a look at the inner workings of the SQL Test Runner which supports both unit and integration tests and generates detailed reports. It is designed to be used behind REST APIs and CI/CD Pipelines.
2. Unit Tests: Learn hands-on how to write a unit test using a SQL script and a Java unit test. In the background, Apache Paimon is used to mock sources and sinks. The unit test files are compiled at runtime, enabling quick execution over the network.
3. Integration Tests: Get to know the concepts used behind a SQL integration test using both the user’s SQL script and a testing SQL script - integrating the ideas of DB to test SQL with SQL. The Test Runner supports both negative and positive testing modes to assert the count of the retrieved results of a stream.
4. Deployment: The Test Runner is packaged as a Docker image, accepting file paths via environment variables, and facilitating integration into CI/CD pipelines behind REST APIs.
Conclusion: The Flink SQL Test Runner significantly enhances the testing process for Flink SQL scripts, supporting robust unit and integration tests. It simplifies the development process for complex Flink projects, meeting the needs of both developers and non-developers.

Robin Fehr

Acosom GmbH

Deep dive into writing Queues for Kafka applications
Queues for Kafka introduces a new paradigm for consuming data from Apache Kafka, along with the new Share Consumer API. If you’ve ever struggled with building message-queuing applications on top of Kafka, struggle no more. The Share Consumer API in Apache Kafka 4.2 makes this easy. The Share Consumer API still consumes data from Kafka topics, but if you think about how it works, it behaves much more like a message queue. Learn all about how partitions are shared by the consumers, taking precise control over record fetching, the ways of acknowledging record delivery properly, error handling strategies, and even how best to deal with records which take a very long time to process. If you’ve wondered how to read records from Kafka and process them with a group of workers using generative AI without worrying about head-of-line blocking, this is the talk for you.

Andrew Schofield

Confluent

Apoorv Mittal

Confluent

Back to the Boring: GenAI That Ships
Everyone’s chasing the Unicorn: agentic workflows, autonomous everything, “AI-first” transformations, and flashy demos. Meanwhile, the reality reported by MIT is clear: about 95% of GenAI pilots fail to deliver. But the issue isn’t with the models, it’s with the problems we choose to address.
This talk focuses on the 5% that succeed. Not by taking on larger, shinier projects, but by targeting the boring work that exists in every enterprise: repeatable workflows, exceptions, approvals, reconciliations, handoffs, and “someone needs to write a report and decide what to do next.” These are the places where small improvements compound quickly, and where GenAI can create real productivity gains now.
I’ll show a delivery pattern where GenAI is embedded into a set of small, focused microservices, coordinated through Kafka as the system-of-record for workflow state and decisions. Kafka provides a shared backbone for service-to-service communication, with schemas and contracts enforcing structured data, and with traceability treated as a first-class design goal. That traceability is what turns GenAI from an unpredictable demo into a repeatable system: every step is observable, decisions can be audited, outputs can be replayed, and you avoid the trap of an LLM taking unrepeatable actions that you can’t explain later.
No hype. No digital revolution required. Just a grounded playbook for turning GenAI from a pilot factory into a value delivery engine, because there’s a long backlog of boring problems waiting to be solved.
You’ll leave this talk with:
- Real examples of “boring” GenAI use cases already running in production
- A concrete way to structure GenAI systems so they stay understandable, auditable, and under control as they evolve
- Practical techniques for making GenAI behaviour traceable and repeatable, instead of opaque and one-off
- A stronger instinct for saying “no” to bad AI ideas, and focusing effort where GenAI can deliver value now
By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

David Navalho

Marionete

Towards Interoperable Intelligence: Streaming Foundations for Multi‑Agent Systems
As enterprises begin deploying AI agents at scale, agent sprawl is inevitable. Without governance infrastructure, unmanaged autonomy cascades: agents cannot reliably discover existing tools, so teams unknowingly rebuild functionality; data quality drifts between systems; conflicting decisions emerge; lineage goes missing; and hallucinated actions execute at machine speed, before any human can audit or stop them. A governance nightmare. In this talk, we show how a self‑service event streaming platform provides the missing foundation to govern multi‑agent systems. Event stream processing delivers real‑time aggregation, explicit data contracts, and end‑to‑end observability - so agents can reason in a clean, auditable state instead of ad‑hoc APIs and shadow IT. Our position: taming this sprawl does not require a new AI stack. It requires treating your existing event streaming platform as governance infrastructure. At E.ON / Essent, we're building a self‑service streaming platform on top of Confluent Cloud for enterprise AI operations. Key capabilities – EventCatalog for discovery, Flink for real‑time aggregation, and validation gates – are proving essential for agent governance. The hardest part, however, is not platform engineering; it is helping teams understand when and how to use it, and to think in streams, not just events. Our strategic framework - the Agentic AI Interoperability Target Picture - addresses the autonomous systems challenge: the identity and policy layer enforces safe execution boundaries, the control plane coordinates agent activity, and the agent gateway validates all requests. This enables bounded autonomy at scale. The streaming platform forms that foundation, enabling agents to reason on curated, validated, aggregated state instead of stale snapshots. 
Three concrete capabilities operationalize this:
- Discovery & Registry: Searchable data contracts enable agents to know what exists
- Real‑Time Aggregation: Flink materialized views provide timely state
- Upstream Validation: Quality gates enforce schemas so agents act only on trusted data
By combining self‑service access with centralized governance, we've found a path from agent sprawl toward orchestrated autonomy at scale. Attendees will not only learn our platform patterns and governance approach, but discover the real challenge: organizational transformation. E.ON's SAP case - 500+ topics, real-time aggregation replacing blind batches - proves shifting teams to stream thinking is what unlocks AI autonomy at scale.

Patrick Berger

E.ON Digital Technology

Martijn van der Pauw

Essent

🤖 Building AI systems? Context - and Flink - is all you need!
Traditional batch architectures cannot meet the needs of modern AI systems, which increasingly operate as autonomous agents requiring millisecond-latency access to both data and its metadata context. Batch ETL introduces unavoidable staleness, relies on fragile orchestration for backfills, and pushes governance and lineage downstream into the analytical estate, too late for AI systems that must make real-time operational decisions. This creates accuracy issues, model drift, and regulatory blind spots. This talk explains why organizations need to adopt streaming-native Real-time Context Engines built on continuous data processing, incremental enrichment, and first-class data governance. Using engines like Apache Flink, enriched event streams are governed and lineage-tracked as part of the streaming pipeline itself, shifting left toward the point of data generation. We detail how event-time semantics, schema evolution, and incremental state updates enable deterministic behavior and full reproducibility without manual pipeline rewrites. A core capability is materialized context: every enriched and governed dataset is projected into a strongly consistent, queryable in-memory table, continuously updated from the event log. Both the data and its metadata context are available to AI agents through open interfaces like MCP (Model Context Protocol) or REST API endpoints. This enables agents not only to retrieve the freshest state, but also to inspect the provenance, quality constraints, and governance rules associated with that state, which is critical for reliability, trust, and regulatory compliance. Equally important is the role of memory management, indexing, and schema intelligence in serving the optimal context to stateless AI models. Because AI systems have no internal memory, every prompt requires reconstructing the most relevant slice of context: per case, customer, conversation, transaction, or semantic topic.
This demand necessitates granular in-memory indexing, adaptive caching, and strong ontology awareness to locate and deliver only the minimal but most meaningful context at low latency. Organizations must therefore deeply understand their data ontologies, entity relationships, and schema evolution patterns to design memory-efficient, fine-grained indexes that ensure AI agents always operate with precise, fully updated context rather than broad, stale datasets. We will show architectural blueprints and operational patterns for building scalable, low-latency, governance-first context layers suitable for high-stakes AI-driven operations. In that context, we will highlight the regulatory implications: when AI systems make recommendations or act automatically, organizations must document the context and lineage of the data that influenced the decision. Real-time lineage tracking ensures auditability, verifiable traceability, and accountability.

Steffan Hoellinger

Confluent

From Data Pipelines to Context Streams: Building Infrastructure for the Agent Era
For decades, data engineers have built infrastructure optimized for human consumers—analysts running queries, scientists training models, executives viewing dashboards. Batch processing, overnight refreshes, and query-friendly schemas made perfect sense. But in 2026, a new primary consumer is emerging: AI agents. And they have radically different requirements. Agents don't wait for nightly batch jobs. They need fresh context delivered in milliseconds. They don't browse dashboards—they consume structured context windows that must be assembled on-demand from multiple streaming sources. They don't tolerate stale data gracefully; outdated context leads to hallucinations, incorrect actions, and compounding failures across multi-agent workflows. This talk introduces "context engineering" as the discipline of building data infrastructure for agent-facing applications. We'll explore how streaming platforms like Apache Kafka become the foundational layer for real-time context delivery, and why the patterns that served human analytics fall apart when agents are your consumers. We'll cover three core challenges through production examples: First, context assembly—how to join, filter, and enrich multiple event streams into coherent context windows with sub-100ms latency using Kafka Streams and Flink. Second, state management for agents—leveraging event sourcing patterns so agents can access not just current state but temporal context ("what did this customer do in the last hour?"). Third, observability for agent-consumed data—why traditional data quality metrics miss the failure modes that matter for agents, and how to build context delivery SLOs. Throughout, we'll examine real architecture decisions: when to push context to agents versus let them pull, how to handle context window limits as a backpressure signal, and patterns for graceful degradation when upstream data sources lag. 
The underlying principles of good data engineering—reliability, freshness, correctness—remain constant. But the application layer is transforming. Data teams that recognize agents as first-class consumers, not afterthoughts, will build the infrastructure that powers the next generation of AI applications. Attendees will leave with concrete architectural patterns for agent-facing data infrastructure, an understanding of how streaming primitives map to agent context requirements, and a framework for evaluating whether their current data platforms are ready for AI-native workloads. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

David Kjerrumgaard

StreamNative

Kafka Head-of-Line Blocking: Increase Throughput, Reduce Latency
Kafka consumers have a throughput problem many might not know about: head-of-line blocking. When one message takes longer to process - due to a slow database call or external API latency - every message behind it waits. Even messages for entirely unrelated customers, orders, or entities sit idle while the slow one completes. This silently degrades system performance and business responsiveness. The standard advice to add more partitions trades one problem for another: partition management and operational complexity. Rebalancing storms and coordination overheads begin to dominate, and at some point you end up managing partitions instead of customer features. This talk examines head-of-line blocking from first principles. We will quantify the impact - real numbers showing how a single slow message even on healthy systems can reduce effective throughput dramatically. We will explore why the problem is architectural, not configurational, and why tuning settings can only take you so far. We will look at what a real solution requires while preserving Kafka's ordering guarantees, because ordering is one of the reasons we chose Kafka in the first place. We will walk through the architectural patterns involved - examining trade-offs - and see how extensive chaos testing validates the approach actually works under production conditions including rolling deployments, consumer OOMs, and network partitions. Finally, we'll do a live demonstration showing the difference in practice: the same workload and message volume with dramatically different throughput and latency. Attendees will see the problem and the solution side by side and will leave understanding why head-of-line blocking matters, architectural patterns for solving it, and a working implementation they can adopt immediately.
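As a rough illustration of the effect described above, the toy model below compares strictly sequential processing of one partition with processing that only serializes messages sharing a key. It is a sketch under stated assumptions, not the speaker's implementation: the function names, the millisecond costs, and the idea of unlimited workers are all hypothetical.

```python
from collections import defaultdict


def sequential_finish_times(messages):
    """Process messages one at a time, in partition order.

    Each message is a (key, cost_ms) pair; returns the finish time of each.
    """
    clock = 0
    finish = []
    for _key, cost_ms in messages:
        clock += cost_ms
        finish.append(clock)
    return finish


def per_key_finish_times(messages):
    """Keep ordering per key, but let different keys proceed concurrently.

    Assumes unlimited workers (a simplification), so a message only waits
    for earlier messages that share its key.
    """
    key_clock = defaultdict(int)
    finish = []
    for key, cost_ms in messages:
        key_clock[key] += cost_ms
        finish.append(key_clock[key])
    return finish


# One slow message (2000 ms) for key "a", followed by fast ones (10 ms each).
messages = [("a", 2000), ("b", 10), ("c", 10), ("a", 10)]
seq = sequential_finish_times(messages)
par = per_key_finish_times(messages)
print(seq)
print(par)
```

In the sequential case the unrelated keys "b" and "c" finish only after the slow message for "a"; with per-key ordering they finish after their own 10 ms, while the second "a" message still waits for the first.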

David Green

Stop Answering Today's Questions with Yesterday's Data: Low-Latency RAG with Kafka and Flink
Your shiny, new, cutting-edge RAG microservice is only as smart as its context. And if that context is refreshed by a slow, batch-driven job, your AI is essentially answering today’s critical questions by consulting yesterday’s equivalent of a stale newspaper. It’s time to transition your RAG architecture from batch dependence to streaming certainty. Let’s discuss a “streams-first” approach to building data pipelines with fresh context. We’re using Apache Kafka and Apache Flink to build the always-on knowledge backbone your RAG microservices deserve. We’ll focus on the foundational engineering practices that guarantee reliability and access to real-time data:
* Kafka as the data substrate: Data streams based on a fault-tolerant, high-throughput source of truth to capture every critical change across your organization.
* Flink’s Real-Time Prep: Leveraging Flink for stateless transformation, stateful contextual enrichment and streamlined chunking—performing the heavy lifting as data arrives.
* Production-Grade Guardrails: Implementing crucial patterns like Exactly-Once Semantics (EOS) for data consistency and establishing a Dead Letter Queue (DLQ) strategy for reliable error handling.
Join this session for a discussion of the core data principles needed to build truly resilient RAG microservices where the knowledge base is always measured in seconds, not days.

Sandon Jacobs

Confluent

What If We've Been Scaling Stream Processing Wrong All Along?
Your Kafka Streams application just rebalanced. Again. Your Flink checkpoint is timing out. Again. Here's an uncomfortable truth: most stream processing applications don't operate at Uber scale. They handle thousands of events per second—complex joins, stateful aggregations, valid use cases—but nowhere near the volumes that justify the operational complexity we've accepted as normal. Yet we pay the full distributed systems tax anyway. Repartition topics doubling network I/O. Repeated serialization burning CPU cycles. Standby replicas sitting idle. State migration or restoration during deployments. And the human cost: specialized expertise that takes years to develop, expert teams that are expensive to build and painful to lose. We've normalized extraordinary inefficiency in the name of horizontal scalability that many applications will never need. But rethinking stream processing in 2026 doesn't mean "just use Postgres." In this talk, I'll share an early-stage exploration of a different approach. A framework that preserves the Kafka Streams DSL, borrows Flink's approach to exactly-once semantics, leverages Project Loom for high concurrency—and challenges a fundamental assumption that both frameworks share. This isn't a production-ready announcement. It's an invitation to question conventional wisdom and explore what stream processing could look like when we stop distributing by default.

Harmut Armbruster

LittleHorse

Beyond Watermarks: Custom Flink Operators for Feature-Trigger Synchronization
To catch malicious behavior on Europe's largest second-hand marketplace, every second matters - but so does accuracy, which relies on data completeness. At Vinted, within the Trust domain, we process millions of events daily to enable ML models and rule engines to detect malicious users and policy-violating content in real-time. Using an event-driven architecture, we rely on events to trigger rule evaluations. These also require other computed features in a timely manner for accurate detection results. If these lag behind, models operate on stale or even missing data. In this talk, I will first elaborate on why built-in event-time processing operators in Flink didn’t meet the requirements to guarantee feature-trigger alignment. For starters, Flink only has (temporal) inner joins out of the box. Then, I will share the custom solutions we built. In particular, you will learn two distinct patterns, suitable for different data velocities and scenarios.
Pattern 1: Feature Store and Ingestion Notification Alignment. We have several complex features computed by dedicated pipelines, writing to our Feature Store. Here, we rely on ingestion notifications to signal feature availability. We buffer triggers until all required feature groups report readiness or timeouts are reached. The downstream service then evaluates rules once the trigger hits, and we know that the necessary features are available.
Pattern 2: Direct Trigger Enrichment. When data directly relates to a trigger event - or latency is critical - it makes sense to attach it directly to triggers. The exact implementation is still split into two cases depending on data velocity; we can have slowly-changing dimensions requiring longer buffering, but also quickly updating streams needing short alignment windows.
Key Takeaways: After this talk, you’ll know when feature-trigger alignment is needed in streaming, the tradeoffs between aligning triggers with Feature Store ingestions or using direct enrichments, and the implementation details of our custom Flink operators. I’ll also share insights on how deployment strategies and certain outages affect enrichment correctness.
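The buffering rule in Pattern 1 (hold a trigger until every required feature group reports readiness, or a timeout is reached) can be sketched in plain Python as follows. This is only an illustration of the rule as stated in the abstract, not the custom Flink operator itself; the class names, the per-entity readiness model, and the explicit `now` arguments are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PendingTrigger:
    trigger_id: str
    entity_id: str
    deadline: float  # time at which the trigger is released regardless


@dataclass
class TriggerAligner:
    required_groups: frozenset
    timeout: float
    pending: list = field(default_factory=list)
    ready: dict = field(default_factory=dict)  # entity_id -> set of ready groups

    def on_trigger(self, trigger_id, entity_id, now):
        """Buffer a trigger and release anything that is already satisfied."""
        self.pending.append(PendingTrigger(trigger_id, entity_id, now + self.timeout))
        return self._release(now)

    def on_ingestion_notification(self, entity_id, group, now):
        """Record that a feature group has been ingested for an entity."""
        self.ready.setdefault(entity_id, set()).add(group)
        return self._release(now)

    def on_timer(self, now):
        """Called periodically so that timed-out triggers are not held forever."""
        return self._release(now)

    def _release(self, now):
        released, still_pending = [], []
        for t in self.pending:
            complete = self.required_groups <= self.ready.get(t.entity_id, set())
            if complete or now >= t.deadline:
                released.append((t.trigger_id, "complete" if complete else "timeout"))
            else:
                still_pending.append(t)
        self.pending = still_pending
        return released


aligner = TriggerAligner(frozenset({"profile", "activity"}), timeout=5.0)
r1 = aligner.on_trigger("t1", "user-1", now=0.0)
r2 = aligner.on_ingestion_notification("user-1", "profile", now=1.0)
r3 = aligner.on_ingestion_notification("user-1", "activity", now=2.0)
r4 = aligner.on_trigger("t2", "user-2", now=3.0)
r5 = aligner.on_timer(now=8.0)
print(r3, r5)
```

Tagging each released trigger with "complete" or "timeout" lets a downstream consumer tell fully aligned evaluations apart from those that ran on possibly missing features. A real operator would also need to tie readiness to event time and expire old readiness state, which this sketch leaves out.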

Csanád Bakos

Vinted

Debezium, Apache Kafka, and an Acyclic Synchronization Algorithm
How do you migrate millions of user accounts from a live legacy platform to a new one with zero downtime, while both systems remain readable, writable, and in sync? In a previous role at a major classifieds platform in Germany, I worked on a user-migration system based on Apache Kafka and Debezium that kept two live production systems synchronised in real time during a multi-month platform migration. Users could update their data on either platform while records gradually progressed through a dedicated migration state machine. In this talk, I’ll walk through the architecture and design decisions behind the migration system: why we chose a Debezium-based change data capture (CDC) approach over a hand-rolled transactional outbox, how it was integrated with an existing legacy MySQL cluster, and how forward sync and backsync between the legacy and new systems were designed. I’ll then dive into the acyclic synchronisation algorithm that uses the legacy system’s optimistic-locking version column as a logical clock to break infinite update loops while still allowing controlled updates on both sides. Participants can expect to learn practical architectural patterns for large-scale data migrations: modelling migration states, running Debezium in production, and using logical clocks to keep two production systems aligned in real time with Apache Kafka. This session targets intermediate to advanced engineers and architects familiar with Apache Kafka fundamentals.
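One way a version column can act as a logical clock to break update loops is sketched below. The abstract does not spell out the algorithm, so this is an assumption-laden illustration rather than the speaker's design: the `Store` class, the event shape, and the rule "apply only if the incoming version is strictly greater" are all hypothetical.

```python
class Store:
    """In-memory stand-in for one platform's user table: id -> (version, data)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def local_update(self, user_id, data):
        """A user edits their record here: bump the version, emit a change event."""
        version = self.rows.get(user_id, (0, None))[0] + 1
        self.rows[user_id] = (version, data)
        return {"user_id": user_id, "version": version, "data": data}

    def apply_remote(self, event):
        """Apply a change captured on the other platform.

        Returns the change event this write would itself produce, or None
        when the event is stale or an echo and is therefore dropped.
        """
        current_version = self.rows.get(event["user_id"], (0, None))[0]
        if event["version"] <= current_version:
            return None  # already seen: this is what breaks the loop
        # Keep the sender's version instead of incrementing, so the echo
        # of this write is recognised and dropped on the other side.
        self.rows[event["user_id"]] = (event["version"], event["data"])
        return dict(event)


legacy, new = Store("legacy"), Store("new")
e1 = legacy.local_update("u1", "alice@example.com")  # edit on the legacy side
e2 = new.apply_remote(e1)       # forward sync applies it and emits an echo
e3 = legacy.apply_remote(e2)    # backsync sees the same version and drops it
print(e3)
```

The sketch does not handle concurrent edits that reach the same version on both sides; resolving those needs an additional rule that is outside the scope of this example.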

MD Sayem Ahmed

eBay

Transactional Change Stream Processing With Apache Flink
Apache Flink is commonly used for processing Debezium change data events: for running continuous queries enabling real-time analytics as the data in your OLTP store changes, for filtering and transforming change data feeds, or for creating denormalized data views sourced from the change data events of multiple tables. While powerful, this processing happens message by message, resulting in the emission of partial results to downstream consumers while the change events originating from a single transaction in the source database are processed. Oftentimes, that’s not desired: instead, results should only be emitted once all the events from a transaction have been received. In this talk, we’ll explore how this problem can be solved by leveraging Debezium’s transaction metadata. It describes how many events of which type belong to a given transaction in a database like Postgres or MySQL. We’ll show how to take advantage of this information for implementing an innovative watermarking approach which, together with a custom output buffer, ensures that event consumers will only ever receive transactionally consistent data.
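The core idea of holding back results until a whole transaction has arrived can be sketched without Flink. The example below is a simplified, count-based buffer, not the watermarking approach the talk presents. It assumes change events carry a `transaction` block with `id` and `total_order`, and that an END marker carries `id` and `event_count`, which follows Debezium's transaction metadata format as I understand it; the `TransactionBuffer` class itself is hypothetical.

```python
from collections import defaultdict


class TransactionBuffer:
    """Release change events only once their whole transaction has arrived."""

    def __init__(self):
        self.events = defaultdict(list)  # txn id -> buffered change events
        self.expected = {}               # txn id -> event count from the END marker

    def on_change_event(self, event):
        txn_id = event["transaction"]["id"]
        self.events[txn_id].append(event)
        return self._flush_if_complete(txn_id)

    def on_transaction_end(self, marker):
        # The marker may arrive before or after the data events, because
        # it typically travels on a separate topic.
        self.expected[marker["id"]] = marker["event_count"]
        return self._flush_if_complete(marker["id"])

    def _flush_if_complete(self, txn_id):
        expected = self.expected.get(txn_id)
        if expected is None or len(self.events[txn_id]) < expected:
            return []
        del self.expected[txn_id]
        batch = self.events.pop(txn_id)
        # Emit in the order the events occurred within the transaction.
        return sorted(batch, key=lambda e: e["transaction"]["total_order"])


buf = TransactionBuffer()
out1 = buf.on_change_event({"transaction": {"id": "tx1", "total_order": 2}, "op": "u"})
out2 = buf.on_transaction_end({"id": "tx1", "event_count": 2})
out3 = buf.on_change_event({"transaction": {"id": "tx1", "total_order": 1}, "op": "c"})
print([e["op"] for e in out3])
```

Nothing is emitted until both the END marker and all counted events are present, so a consumer of `out3` never sees a partially applied transaction.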

Gunnar Morling

Confluent

Thinking in Streams: Building Stateful, Serverless Agentic Loops
This session addresses the evolution of AI from simple retrieval to autonomous "Agency" by introducing the Streaming Agentic Loop, an architectural blueprint designed to eliminate the latency inherent in static databases. Targeted at architects and engineers, the talk explores the convergence of event-driven systems and AI through a technical deep dive into using Apache Flink (PyFlink) for stateful memory management, Kafka for feedback loops, and Google Cloud Run for serverless, non-blocking LLM inference. By tackling production challenges like token cost optimization via semantic caching and handling "ghost actions" with idempotent execution, the presentation provides attendees with a proven reference architecture and actionable code patterns for building resilient, real-time agents capable of complex tasks like fraud mitigation and supply chain automation.

Shuva Jyoti Kar

Cisco Systems (India) Pvt Ltd

Python Streaming Analytics Leveraging the Composable Data Stack
Modern streaming workloads do not need heavyweight stream processors to power analytics and ML on Kafka data. This Breakout Session presents a composable architecture where Kafka is the event backbone. Rust, Apache Arrow, and Python-based analytics libraries form a focused, production-ready data stack. The core theme is simple. Orchestrate vectorized operations and let the right tool own the right concern. The talk targets an audience comfortable with data engineering and analytics platforms. It explains how IO and serialization are handled in Rust. CPU-bound operations such as aggregations, joins, and feature engineering run in Arrow-native libraries like Polars or Pandas. Data moves as zero-copy Arrow tables between Rust and Python. This design removes repeated serialization and avoids ad hoc in-memory formats. The material is highly relevant for teams building real-time analytics and ML features on Kafka. These teams want low-latency, high-throughput pipelines without a monolithic stream processing framework. Attendees will see how a columnar, vectorized execution model on top of Kafka can still feel familiar to data analysts. It keeps the workflow close to “just analytics on tables,” even in a continuous streaming environment. Audience takeaways include several concrete patterns. They will learn how to organize responsibilities between Kafka, Rust services, and Python analytics. They will see how zero-copy Arrow interchange simplifies cross-language pipelines. They will also learn how to keep the mental model simple while still meeting production performance and reliability requirements.

Arthur Andres

Tradewell Technologies Inc

Embedding Tiny Language Models in Flink SQL functions
Learn how to embed a tiny language model inside your Flink SQL pipeline – turn messy and free-form text from your events into structured and actionable fields in real time, all within your cluster. You’ve likely seen AI in event streams being done by calling hosted cloud large language models over HTTP. That is the right option for many scenarios, but isn’t practical for every use case. Maybe your data needs to stay in-cluster, your Flink job is running somewhere without public cloud access, or you just need a predictable per-event cost. For situations like these, you could co-locate a tiny language model in your Flink job. In this session, I’ll go through some use cases where this approach is most useful (and make it clear where it isn’t sensible!). I’ll walk through practical steps for how to make open source and freely available models accessible from Flink SQL as custom functions – including how to choose a CPU-friendly model, considerations for prompts that are effective with tiny models, and the observability needed to ensure your solution is viable for the event stream throughput.

Dale Lane

IBM

Joining Streams Through Time: As-Of Joins with Spark 4's new transformWithState API
Spark Structured Streaming offers powerful built-in operators for stream-stream joins, but what happens when you need to join two streams based on temporal proximity rather than exact key matches? As-of temporal joins—where each event from one stream matches the most recent corresponding record from another—are essential for scenarios such as currency conversion at transaction time, enriching trades with the latest quotes, or correlating sensor readings with their nearest calibration values. Apache Spark 4.0 introduces transformWithState, a next-generation stateful processing operator that replaces the limitations of flatMapGroupsWithState. This talk demonstrates how to leverage transformWithState to build production-grade as-of temporal joins between two event streams. We'll explore the core building blocks: using MapState to maintain versioned lookup data keyed by time, processing incoming events to find the nearest temporal match, and managing state lifecycle with TTL-based eviction to prevent unbounded growth. You'll see how the new object-oriented StatefulProcessor model separates concerns cleanly—handling input events in handleInputRows and expired state in handleExpiredTimer—making complex temporal logic more maintainable than ever before. Through live code examples in both Scala and Python (transformWithStateInPandas), attendees will learn practical patterns for buffering late-arriving reference data, handling out-of-order events using watermarks and timers, and emitting joined results only when temporal alignment is confident. We'll also demonstrate how to use Spark 4's new state data source reader to debug and monitor the internal state of your temporal join during development. 
Key takeaways:
- When and why to implement custom temporal joins instead of using built-in stream-stream joins
- Step-by-step implementation of as-of joins using transformWithState composite state types
- State management strategies, including TTL, timers, and watermark integration
- Debugging techniques using the state data source reader
Whether you're building financial trading systems, IoT analytics pipelines, or real-time ML feature stores, this talk equips you with patterns to solve temporal alignment challenges in Spark Structured Streaming.
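The matching rule behind an as-of join (each event pairs with the most recent reference record at or before its timestamp) can be sketched in a few lines of plain Python. This is not the Spark transformWithState API and leaves out state, watermarks, and TTL entirely. The `as_of_lookup` helper and the sample quotes and trades are made up for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def as_of_lookup(quotes: List[Tuple[int, float]], event_ts: int) -> Optional[float]:
    """Return the value of the latest quote with timestamp <= event_ts."""
    # `quotes` must already be sorted by timestamp.
    timestamps = [ts for ts, _ in quotes]
    idx = bisect_right(timestamps, event_ts)
    return quotes[idx - 1][1] if idx > 0 else None


# (timestamp, price) reference records and (timestamp, trade id) events.
quotes = [(100, 1.10), (200, 1.12), (300, 1.15)]
trades = [(50, "t0"), (200, "t1"), (250, "t2"), (999, "t3")]

joined = [(trade_id, as_of_lookup(quotes, ts)) for ts, trade_id in trades]
```

A trade that arrives before any quote exists gets `None`, which is the case a streaming implementation would typically buffer rather than emit.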

Carlos Rodrigues

Databricks

Migrating a Large-Scale Kafka Streams Platform to the KIP-1071 Rebalance Protocol
For half a dozen years, we’ve been running Kafka Streams applications at bakdata, constantly balancing throughput, stability, and operational overhead. A recurring pain has been frequent or slow rebalances that negatively impact the latency and throughput of Streams apps. While rebalancing is necessary to optimally distribute tasks - such as when members join or leave a group - frequent rebalances can be disruptive. Incremental cooperative rebalancing (Kafka 2.4+) reduces partition movement and even keeps processing running during rebalancing (Kafka 2.5) but does not fully alleviate the pain.
With growing experience, we optimized configs of our Streams apps for specific app characteristics to avoid frequent and slow rebalances that stall processing. Such characteristics include high-latency record processing, application statefulness, or dynamic horizontal scaling.
The new Streams rebalance protocol introduced in KIP-1071 makes assignment of Streams tasks a first-class citizen in the Kafka protocol. It promises to make rebalances less disruptive to the processing.
As part of our migration efforts to the new protocol, we adjusted our established Streams configs where necessary. With our insights, we hope to help others migrate to the new Streams rebalance protocol once it becomes generally available in an upcoming Kafka release.
For the classic rebalance protocol, we used to tune configs on a per-app basis to prevent undesirable rebalances. Depending on the Streams app’s characteristics, we typically made some of the following adjustments:
- Increase group.initial.rebalance.delay.ms on the broker to give members more time to join the consumer group during “scale-from-0” of a Streams app.
- Increase max.poll.interval.ms and decrease max.poll.records and/or max.partition.fetch.bytes to account for high-latency processing of individual records.
- Set group.instance.id - not just for stateful apps - to mitigate the impact of member restarts.
- Increase transaction.timeout.ms in exactly-once scenarios.
In our session, we aim to answer what the new Streams rebalance protocol means for a mature Kafka Streams platform. Using the new protocol, what Streams app configs need tweaking? Do some configs’ defaults suffice now where we previously needed to tailor configs to an app’s characteristics? What changes for the observability of Streams apps? How does rebalancing behavior differ in terms of frequency and rebalance duration? What impact does this have on latency and throughput of real-world Streams apps? Join us to ready yourself for the new Streams rebalance protocol!
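As a rough sketch of the per-app tuning described for the classic protocol, the helper below builds a set of client-side overrides from an app's characteristics. The config keys are the ones named in the abstract. The function name, its parameters, and every numeric value are illustrative assumptions and not bakdata's settings. `group.initial.rebalance.delay.ms` is left out because it is set on the broker.

```python
from typing import Dict


def classic_protocol_overrides(
    slow_records: bool, exactly_once: bool, instance_id: str
) -> Dict[str, str]:
    """Build client-side overrides for the classic rebalance protocol."""
    # group.instance.id enables static membership, softening member restarts.
    overrides = {"group.instance.id": instance_id}
    if slow_records:
        # Allow more time between polls and fetch fewer records per poll.
        overrides["max.poll.interval.ms"] = "900000"  # illustrative value
        overrides["max.poll.records"] = "50"  # illustrative value
    if exactly_once:
        overrides["transaction.timeout.ms"] = "60000"  # illustrative value
    return overrides


# Hypothetical app: slow per-record processing with exactly-once enabled.
config = classic_protocol_overrides(True, True, "orders-app-0")
```

Keeping the overrides in one function makes it easier to see which of them can be dropped when an app moves to the new protocol.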

Jakob Edding

bakdata

Scaling Real-Time AI Actions with Amazon Bedrock AgentCore and Confluent Streaming Agents
Most teams can get an AI agent to answer questions; far fewer can trust agents to take real actions in production. In this session, AWS and Confluent show how to combine Amazon Bedrock AgentCore with Confluent Streaming Agents on Confluent Cloud to build event-driven agents that observe live events, reason with real-time context, and act safely at scale on AWS. We’ll walk through reference architectures where Confluent’s data streaming platform continuously assembles governed, low-latency context from Kafka and Flink into Bedrock, while AgentCore Policy, Evaluations, and Memory enforce guardrails and quality over long-running workflows. Attendees will see demo patterns for use cases like fraud detection, customer operations, and DevOps automation, showing how to move from isolated AI demos to production-grade agents that drive measurable outcomes on AWS. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Arun Nallathambi

AWS

AI Needs Context. Why Flink is Made for Context Engineering
We have mastered Prompt Engineering, but production AI needs Context Engineering: the architectural discipline of curating the information an LLM sees before it answers. The right context window transforms a generic chatbot into a specialized expert. A perfect prompt is useless if the model is fed stale data or overwhelmed by irrelevant noise. This not only confuses the model but also increases the cost of every query. To build reliable AI Agents, we need a data processing layer that doesn't just move data but actively "engineers" it into a state-ready format for inference. This is where Apache Flink’s new ProcessTableFunctions (PTFs) change the game. PTFs leverage Flink’s full capabilities, allowing for custom, stateful processing logic within a structured framework. In this session, we will explore why this kind of stream processing is the natural backbone for Context Engineering and how Flink’s PTFs provide the missing primitives for dynamic context construction. We will demonstrate how to:
- Orchestrate context from multiple sources into a single, unified view.
- Dynamically prune and format conversation history into a highly compressed state.
- Create time-based context windows that are ranked and pre-aggregated in real-time.
- Engineer a system that "remembers" users over weeks or months, so conversations pick up exactly where they left off.

Timo Walther

Confluent

Supersonic Streams: When Quarkus Met Kafka
Building modern, event-driven applications with traditional Java and Apache Kafka often means grappling with slow development cycles and complex local environment setup. This friction hinders developer productivity. A new generation of Java platforms like Quarkus is changing this. Designed for the cloud, these runtimes feature extremely fast startup times and a minimal memory footprint. We'll demonstrate how these platforms drastically simplify the developer experience, for example by automatically provisioning a live Kafka cluster for local development. Beyond the inner loop, we will show how these optimized applications, combined with cloud-native tooling, enable you to deploy and rapidly scale your Java microservices and Kafka components on platforms like Kubernetes. This ensures you can efficiently handle fluctuating event loads, scaling seamlessly from zero to massive scale. Attendees should come away with:
- An understanding of how modern Java simplifies the entire Kafka development lifecycle, from local setup to production.
- Practical knowledge for building high-performance, event-driven applications designed for rapid scaling and minimal resource consumption.
- Inspiration for leveraging these technologies to achieve faster time-to-market and operational efficiency.

Viktor Gamov

Confluent

Kevin Dubois

IBM

How to write your own partition assignor in Kafka’s KIP-848 Era
Apache Kafka 4.0 makes the new consumer group rebalance protocol (KIP-848) generally available, shifting assignment computation from clients to the broker-side coordinator and eliminating the classic leader-driven, stop-the-world rebalances. This change simplifies consumers and improves stability and time-to-recover during membership or metadata churn, but it also removes support for client-side partition assignors on standard consumers in favor of pluggable server-side assignors. For teams that previously relied on custom assignment strategies, the good news is that you can still customize behavior by implementing a broker-side assignor via the new server assignor SPI and enabling it based on group-level and client configurations. This talk demystifies how assignment works in the KIP-848 era and provides a practical, end-to-end guide to writing, testing, and safely rolling out a custom broker-side assignor. We’ll cover the coordinator’s target-assignment model and incremental reconciliation, as well as the constraints and contracts your assignor must respect (stickiness, determinism, payload limits). We’ll share a working example: real code and configs to write and use our very own custom assignor. We’ll also share migration tips for teams moving off classic client-side strategies, how to observe and debug assignments with the new metrics, and guardrails for compatibility and rollback. You’ll leave with a clear checklist and reference implementation path to bring your own assignor to production without sacrificing the resilience and operability benefits of KIP-848.
Who is it for:
- Platform and application engineers running Kafka at scale
- Developers who need custom partition strategies
- Engineers upgrading client applications to Kafka 4.x, migrating from classic client-side assignors to broker-side assignors under KIP-848
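Two of the assignor contracts named in the abstract, stickiness and determinism, can be illustrated with a small standalone function. This is a plain-Python sketch of the idea only: it does not use or imitate Kafka's server-side assignor SPI (which is Java), and `sticky_assign` and its arguments are hypothetical. It assumes at least one member and unique partition ids.

```python
from typing import Dict, List


def sticky_assign(
    members: List[str], partitions: List[int], previous: Dict[str, List[int]]
) -> Dict[str, List[int]]:
    """Balanced assignment that keeps prior owners where possible."""
    members = sorted(members)  # sorted inputs keep the result deterministic
    base, extra = divmod(len(partitions), len(members))
    # The first `extra` members (in sorted order) may hold one more partition.
    quota = {m: base + (1 if i < extra else 0) for i, m in enumerate(members)}
    valid = set(partitions)
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    assigned = set()
    # Pass 1: keep partitions with their previous owner, up to the quota.
    for m in members:
        for p in sorted(previous.get(m, [])):
            if p in valid and p not in assigned and len(assignment[m]) < quota[m]:
                assignment[m].append(p)
                assigned.add(p)
    # Pass 2: hand the remaining partitions to members still below quota.
    leftovers = [p for p in sorted(partitions) if p not in assigned]
    for m in members:
        while len(assignment[m]) < quota[m]:
            assignment[m].append(leftovers.pop(0))
    return {m: sorted(ps) for m, ps in assignment.items()}


# Member "c" joins a group where "a" and "b" already own partitions.
previous = {"a": [0, 1, 2, 3], "b": [4, 5]}
result = sticky_assign(["c", "a", "b"], list(range(6)), previous)
```

Here only the two partitions that "a" held above its quota move to the new member, and the outcome does not depend on the order in which members are listed.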

Lianet Magrans

Confluent

David Jacot

Confluent

Life as a Kafka Admin: Lessons from Running 30+ Clusters in Production
Operating Apache Kafka in production is very different from just “using Kafka as a developer”. Since 2021, I’ve worked as a Kafka Admin responsible for more than 50 clusters across multiple regions, helping dozens of teams build on top of Kafka while keeping the platform stable and predictable. Over time, the patterns repeat: too many or too few partitions, services calling slow external APIs in the middle of stream processing, painful rebalances, clients that “cannot connect”, and users who just want the platform to “work” without learning all the internals. This talk shares the practical lessons learned from living in that world every day. It covers how to design and review topics and partitioning, how to deal with rebalances and skew, how to debug connection and authentication issues at scale, and how to build automations and guardrails that improve the developer experience for many teams at once. It also highlights what changes when you manage many clusters in different environments and regions, and how to keep your sanity while doing it.

Marcos Prado

SREENGINEER

Design with me: a Kafka Streams Payment Authorization Collaborative Design Session
Most Kafka Streams tutorials end where production begins. This interactive, detailed-design whiteboarding session bridges that gap by designing a real-world payment authorization system: collaboratively, honestly, and without hiding the messy parts. We will start optimistically with Kafka Streams DSL approaches (elegant joins, straightforward windowing) and systematically discover their breaking points. As Apache Kafka Streams DSL solutions fail under real-world constraints, we'll architect production-grade alternatives using the Processor API, custom state store designs, and advanced patterns for handling late data, implementing timeouts, managing state lifecycle, and ensuring exactly-once semantics. We'll visually map out state management strategies, “watermarking” approaches, and the trade-offs between different join patterns when dealing with temporal uncertainty. Here’s the challenge (inspired by one of my past engagements)! Two asynchronous input streams:
- shiny new payment authorization requests arriving in real time
- account state updates that arrive delayed, because that is how the legacy integration actually works
The business requirement is straightforward: decide whether to authorize each payment. The technical reality? Anything but simple. This is not a lecture. Armed with a (virtual) whiteboard, we'll design the system architecture together (I’ll fill in the blanks!), progressively integrating new constraints that turn textbook examples into production systems: late-arriving state, authorization timeouts, authorized user override, state store growth. Attendees will leave with mental models for recognizing when business requirements exceed DSL capabilities, practical patterns for designing custom state stores that match their domain logic, and confidence in choosing between DSL and Processor API approaches. Most importantly: the architectural judgment I've earned through production failures, so you can succeed without repeating them.
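To give a feel for the problem before the session, here is a deliberately small plain-Python model of the two-stream challenge: payment requests are buffered until account state arrives, and a periodic tick expires requests that waited too long. It is not Kafka Streams code and not the speaker's design. The class, method names, decision labels, and the choice to decline on timeout are all assumptions, and balances are not decremented after an authorization.

```python
from typing import Dict, List, Tuple

Decision = Tuple[str, str]  # (payment id, outcome)


class PendingAuthorizations:
    """Buffers payment requests until account state arrives or a timeout fires."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.balances: Dict[str, int] = {}
        # account id -> list of (payment id, amount, arrival time)
        self.pending: Dict[str, List[Tuple[str, int, int]]] = {}

    def on_payment(self, account: str, payment_id: str, amount: int, now_ms: int) -> List[Decision]:
        if account in self.balances:
            return [(payment_id, self._decide(account, amount))]
        # No state yet: park the request instead of guessing.
        self.pending.setdefault(account, []).append((payment_id, amount, now_ms))
        return []

    def on_account_state(self, account: str, balance: int) -> List[Decision]:
        self.balances[account] = balance
        waiting = self.pending.pop(account, [])
        return [(pid, self._decide(account, amt)) for pid, amt, _ in waiting]

    def on_tick(self, now_ms: int) -> List[Decision]:
        """Decline buffered requests that have waited at least timeout_ms."""
        expired: List[Decision] = []
        for account in list(self.pending):
            keep = []
            for pid, amt, ts in self.pending[account]:
                if now_ms - ts >= self.timeout_ms:
                    expired.append((pid, "DECLINED_TIMEOUT"))
                else:
                    keep.append((pid, amt, ts))
            if keep:
                self.pending[account] = keep
            else:
                del self.pending[account]  # bound state growth
        return expired

    def _decide(self, account: str, amount: int) -> str:
        return "AUTHORIZED" if self.balances[account] >= amount else "DECLINED"


auth = PendingAuthorizations(timeout_ms=500)
r1 = auth.on_payment("acc-1", "p1", 30, now_ms=0)    # buffered, no state yet
r2 = auth.on_account_state("acc-1", 100)             # late state releases p1
r3 = auth.on_payment("acc-2", "p2", 10, now_ms=100)  # buffered
r4 = auth.on_tick(now_ms=700)                        # p2 waited 600 ms
```

Even this toy version needs a buffer, a timer, and explicit cleanup, which hints at why the abstract expects plain DSL joins to fall short.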

Adam Souquieres

StreamConsulting

Schema Management in Kafka (with GitOps!)
Schema Management in Kafka is often perceived as introducing significant overhead. Even though using Schemas provides transparency and predictability to your Topics, some teams don't use them because of this perceived complexity. It doesn't have to be complex. I've introduced a similar approach in two projects, and engineers grasp it really quickly once you have a proper setup in place. Once you've seen it once, you just get it. I want to share this with you. In this session I'll cover important problems and choices you'll need to understand when using Schemas in Kafka:
- Why use Schemas and when it isn't a good idea
- How does Schema Evolution work and how to choose Compatibility Type
- How and when to publish multiple Event types to the same topic
- How to embrace GitOps in Schema Management
- What tooling is available
- What potential issues you might encounter
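As a rough illustration of the schema evolution question, the sketch below checks one simplified rule for backward compatibility: a reader on the new schema can decode old records as long as every added field has a default. This is a toy model and not the algorithm a schema registry runs. It ignores type changes, and the `Schema` representation and `is_backward_compatible` function are made up for this example.

```python
from typing import Dict

# Toy schema model: field name -> whether the field declares a default.
Schema = Dict[str, bool]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """True if a reader on `new` can decode records written with `old`."""
    # Fields removed in `new` are simply ignored by the reader;
    # fields added in `new` are missing from old records and need a default.
    added = set(new) - set(old)
    return all(new[name] for name in added)


v1 = {"id": False, "amount": False}
v2 = {"id": False, "amount": False, "currency": True}   # adds optional field
v3 = {"id": False, "amount": False, "merchant": False}  # adds required field

ok = is_backward_compatible(v1, v2)
broken = is_backward_compatible(v1, v3)
```

A check of this kind is the sort of thing a GitOps pipeline could run on a pull request before a schema change is merged.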

Jan Siekierski

Kentra

Your Model Is Fine. Your Context Is Broken.
AI systems don’t fail because models are weak. They fail because context is wrong. As agentic AI moves from experimentation to production, teams are discovering a new class of data problems. Context is scattered across operational databases, event streams, APIs, and vector stores. It’s stale by the time it reaches the model, inconsistent across tools, and expensive to recompute. Most architectures were never designed to continuously assemble and serve context at runtime. This talk introduces context engineering as a practical, systems-level discipline focused on solving these data challenges. Rather than treating context as static input, context engineering treats it as a continuously computed product, derived from live business signals, enriched in real time, governed, and served with low latency to AI systems. We’ll focus on the role of streaming and event-driven architectures as the foundation for this approach. You’ll see why batch pipelines and warehouse-centric designs struggle with agent workloads, and how stream processing enables data enrichment and reprocessing of context as data evolves. In the second half, we’ll build this live. Using Kafka and Flink, we’ll construct a real-time context pipeline that ingests multiple data sources, enriches and materializes them into low-latency tables, and exposes them to AI agents through MCP. This session is for engineers who want to move AI systems out of POCs and into production by fixing the data foundation first.

Sean Falconer

Confluent

Diskless but with disks, Leaderless but with leaders: A KIP-1163 Deep Dive
KIP-1150: Diskless Topics promises to make Apache Kafka more cost effective and flexible than ever before, but how does it work? Where do the cost savings come from? Is it really Diskless? What about Leaderless? Why is the latency worse? This talk will walk through the design for the preferred implementation in KIP-1163: Diskless Core, and answer all of these questions. A basic understanding of Apache Kafka is enough to attend this talk: we’ll review the architecture used for classic and tiered topics, and how data is produced and fetched. We'll discuss the limitations of this architecture in the context of modern hyperscaler cloud deployments, and where the costs become excessive. Then we’ll show how the basic components of Kafka are taken apart and reassembled to build the Diskless architecture. We’ll also discuss the major rejected alternatives, and compare KIP-1163 to similar KIPs working to solve the same problem. At the end of this session, you should feel confident talking to stakeholders and community members about this amazing upcoming feature! By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Greg Harris

Aiven

Real-Time Feature Engineering at Scale: Chaining Features and Inference with Chronon
Modern machine learning applications demand features computed in near real-time while maintaining low-latency serving — a challenge that becomes exponentially harder at scale. This talk explores Chronon, an open-source feature platform battle-tested in production at Stripe, Airbnb, Netflix, and OpenAI, and how it bridges the gap between streaming data infrastructure and production ML systems. Traditional feature engineering pipelines force teams to choose between freshness and latency, leading to complex dual pipeline architectures that are expensive to maintain and prone to training-serving skew. Chronon solves this by providing a unified abstraction over batch and streaming computation, enabling teams to define features once and serve them with sub-100ms latencies while keeping them updated in near real-time. We'll demonstrate how Chronon can be used in a wide variety of ML applications such as real-time fraud prevention as well as more complex use-cases that require chaining feature computation with model inference / embedding pipelines such as two-tower search recommendation systems. Additionally, we'll explore how Chronon minimizes computation in the serving hot-path for these use-cases, reducing infrastructure costs by orders of magnitude compared to naive streaming implementations. 
Audience Takeaways:
- How Chronon unifies batch and streaming feature computation
- Chronon's pluggable architecture with respect to table formats, streaming buses, KV stores and model platforms
- Chronon's approach to minimize serving latency while maximizing feature freshness in production ML systems
- How one can build ML pipelines that chain feature computation with model inference / embedding for applications such as two-tower recommender systems
- Real-world lessons from companies serving billions of predictions daily
This talk sits at the intersection of data streaming and AI in production, making it ideal for ML engineers, data platform teams, and anyone building real-time intelligent applications.

Piyush Narang

Zipline AI

Lambda Architecture in 2025: Kafka, Views, and the Evolving Data Platform
Lambda architecture is not dead. At Fresha, we serve ~1M daily bookings through a streaming platform that has evolved for over two years, and we are just getting started. This talk shares our journey of building a cost-effective, production-ready data platform on Kafka, Snowflake, and now Iceberg and StarRocks.
Pillar 1: Ingestion - Simple but Solid
From PostgreSQL to Debezium to Kafka to Snowpipe. Data lands in Snowflake in under 2 seconds. This layer has remained untouched since day one, and that stability enabled everything else.
Pillar 2: Consolidation - Cost Effective
Here is where Lambda architecture shines. We materialize tables every 20 minutes, then merge live CDC events at query time through views. This provides deduplication, schema evolution handling, and near-real-time freshness without running expensive compute 24/7. The pattern is old. It works.
Pillar 3: Consumption - The Clever Bit
Here is what we are proud of: we use Snowflake as an API to support production load, which Snowflake is not designed for. Through smart architecture (connection pooling, query optimisation, view-based routing), we achieve Enterprise-tier capabilities on a non-Enterprise Snowflake plan. When we needed more, we extended with StarRocks and Iceberg - not replacing Snowflake, but complementing it.
What you will learn:
- Implementing query-time deduplication in Snowflake with dbt and views
- Lambda architecture patterns that handle schema evolution gracefully
- How to push Snowflake beyond its intended use case without breaking the bank
- Extending your platform with Iceberg and StarRocks while keeping Snowflake in the mix
The takeaway: You do not need the most expensive tier to build a production-grade streaming platform. Smart architecture beats premium licensing. 2+ years in production. Real patterns. Real cost savings.
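The consolidation pattern in Pillar 2 (a periodic snapshot merged with newer CDC events at read time) can be modelled in a few lines of plain Python. This is an illustration of the logic only, not Fresha's Snowflake views or dbt models. The `merged_view` function, the use of a log sequence number (LSN) to order changes, and the sample rows are all assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# (primary key, log sequence number, row contents or None for a delete)
Change = Tuple[int, int, Optional[Row]]


def merged_view(
    snapshot: Dict[int, Row], snapshot_lsn: int, changes: Iterable[Change]
) -> Dict[int, Row]:
    """Overlay the newest CDC change per key on top of a materialized snapshot."""
    latest: Dict[int, Change] = {}
    for change in changes:
        key, lsn, _ = change
        if lsn <= snapshot_lsn:
            continue  # already reflected in the snapshot
        # Deduplicate: keep only the change with the highest LSN per key.
        if key not in latest or lsn > latest[key][1]:
            latest[key] = change
    view = dict(snapshot)
    for key, (_, _, row) in latest.items():
        if row is None:
            view.pop(key, None)  # delete event
        else:
            view[key] = row
    return view


snapshot = {1: {"status": "booked"}, 2: {"status": "booked"}}
changes = [
    (1, 9, {"status": "stale"}),       # older than the snapshot, ignored
    (1, 12, {"status": "cancelled"}),  # newest change for key 1
    (1, 11, {"status": "moved"}),      # superseded by LSN 12
    (2, 13, None),                     # key 2 deleted
    (3, 14, {"status": "booked"}),     # new key
]
view = merged_view(snapshot, snapshot_lsn=10, changes=changes)
```

Because the merge happens at read time, the snapshot only needs refreshing on a schedule while readers still see the latest change per key.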

Emiliano Mancuso

Fresha

From Weeks to Seconds: Real-Time ML Quality Control for Medical Device Manufacturing
Medical device manufacturers face a critical challenge: how to scale production 5x while decoupling quality control costs from volume growth. Traditional sampling-based quality control - taking samples every four hours, with results arriving days to weeks later - cannot support this ambition. This talk shares our journey building a real-time ML quality control system that analyses every injection moulding shot in under one second, predicting part dimensions within 10μm accuracy.
Core Theme: This session demonstrates how streaming data platforms transform traditional manufacturing quality control from reactive sampling to proactive, real-time decision-making. You'll see how we built a production-grade system that processes sensor data from injection moulding machines across global manufacturing sites, enabling immediate quality insights without touching the machines themselves.
Technical Implementation: We built a hybrid on-premise and cloud architecture handling real-time sensor data streams. The system captures sensor data directly from the machines and sends it to the on-premise Kafka deployments, from where the ML models (deployed using Apache Flink) deliver predictions to shop floor operators in under one second end-to-end. I'll share our architectural decisions, the challenges of maintaining sub-second latency at scale, and how we validated ML model accuracy against precision measurement equipment.
The Journey & Key Learnings: Rather than presenting a polished success story, I'll walk through our iterative hypothesis-testing approach: what worked, what failed, and why. I'll discuss the human factors: building trust through transparency, involving shop floor workers in the design process, and navigating medical device manufacturing regulations.
Audience Takeaways: Attendees will learn practical patterns for implementing real-time ML in industrial environments, strategies for iterative validation of streaming system assumptions, and how to bridge the gap between data science prototypes and production-grade systems that non-technical users trust and adopt.

Samuel von Baußnern

D ONE – Data Driven Value Creation

Building Intelligent Systems on Real Time Data
Confluent CEO Jay Kreps takes the stage alongside industry leaders at data streaming’s biggest event. Together, they’ll show why free-flowing, real-time data has become the key to unleashing the full potential of intelligent systems across every business. From live demos to real-world use cases to industry-changing product announcements, this year’s keynote is essential viewing for anyone looking to maximize the potential of their AI. Which is pretty much everyone. Don’t miss it.

Jay Kreps

Confluent

Shaun Clowes

Confluent

Sean Falconer

Confluent

Handling Surges in Petabyte-Scale Streaming Systems by Doing Nothing
When streaming data at petabyte scale, one of the most painful on-call scenarios is handling sudden traffic surges that overload servers, trigger cascading failures, and wipe out service availability across a large blast radius. At modern throughput levels, scaling operations are simply not fast enough to prevent unexpected 10–20x spikes from taking down dozens of streaming pipelines and their neighbors. The classical mitigation is to overprovision replicas and headroom, add proactive alerting, and hope to “react quickly.” In this talk, we present a TCP-based congestion control approach that tackles the problem at its root and eliminates the need for manual on-call intervention. At Pinterest, we have productionized this TCP-based flow control solution in a 50 GB/s streaming system that powers machine learning across the company. By setting up the appropriate end-to-end flow control mechanisms, we guard against sudden surges of any magnitude by propagating backpressure gracefully, predictably, and fully autonomously. We will walk through the key concepts in networking, memory management, and backpressure that matter in large-scale streaming systems, and then unpack the exact mechanism we built to solve this problem. The audience will leave with a set of production-ready ideas and patterns that can be replicated in their own streaming environments in ways that are far more cost-efficient and operationally lightweight than classical solutions. Beyond eliminating the catastrophic risk of sudden traffic surges, we will share concrete and replicable takeaways from running these concepts in production at scale, including:
- Designing streaming topologies that rely on backpressure instead of excess capacity
- Safely transforming scaling and load balancing into reactive operations, reducing unnecessary early alerting and interventions
- Simplifying capacity planning for organic growth
- Lowering infrastructure cost by running denser workloads with minimal buffer headroom
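The core idea of backpressure, a bounded buffer that makes a fast sender wait instead of letting load pile up, can be shown with an in-process analogy. The sketch below uses a bounded `queue.Queue` between two threads. It is not the TCP-based mechanism built at Pinterest, which the abstract does not detail, and the buffer size, item count, and sleep duration are arbitrary.

```python
import queue
import threading
import time

buffer = queue.Queue(maxsize=4)  # small bounded buffer between two stages
consumed = []
max_depth = 0


def producer() -> None:
    for i in range(50):  # a sudden surge of 50 items
        buffer.put(i)  # blocks while the buffer is full: this is the backpressure
    buffer.put(-1)  # sentinel marking the end of the surge


def consumer() -> None:
    global max_depth
    while True:
        max_depth = max(max_depth, buffer.qsize())
        item = buffer.get()
        if item == -1:
            break
        consumed.append(item)
        time.sleep(0.001)  # simulate a slower downstream stage


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The producer is slowed to the consumer's pace, nothing is dropped, and memory use stays bounded by the buffer size no matter how large the surge is.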

Jeff Xiang

Pinterest

Dynamic Kafka, Static Sleep: Taming Multi-Cluster Streams with Flink at OpenAI
At OpenAI, Kafka streams don’t sit still: a single logical “stream” can span multiple clusters and sometimes multiple regions, and the underlying topology changes as we migrate, scale, or fail over. That’s great for availability, but it’s a sharp edge for stream processors that assume “one cluster, stable topics, one offset story.” (Spoiler: that assumption dies first.) This talk shares our journey at OpenAI to make Apache Flink’s DynamicKafkaSource fit that reality, using our Kafka to Warehouse ingestion system “StreamLink” built on Flink as a case study. We’ll walk through the mental model shift from “topics on a cluster” to “a stream over an ever-changing infra topology,” what worked, and where we ran into the most interesting edge cases: around offsets, state, and operational safety when Kafka topology evolves underneath a running Flink job. Rather than presenting a polished fairy tale where every checkpoint is happy and every offset is deterministic, we’ll focus on the decisions and tradeoffs: the approaches we considered, the guardrails we’re putting in place, what we’re validating, and the questions we think the community should care about as dynamic consumption of Kafka becomes more popular. We’ll also cover what we’re contributing back to OSS across core implementations and APIs (Java/Python/Table/SQL), and a practical roadmap. You’ll leave with patterns you can apply to multi-cluster Kafka + Flink deployments, a checklist of “gotchas” to watch for, and a few ideas you can steal, because if Kafka is going to be dynamic, your consumption strategy should be too (preferably without becoming dynamically on-call).

Bowen Li

OpenAI

Xin Gao

OpenAI

Turning the database inside out again: What if everything was Iceberg?
Over a decade ago, Martin Kleppmann's Turning the Database Inside Out reshaped how we think about data systems, putting the event stream at the heart of storage and computation. That vision inspired a generation of systems built atop Kafka, Flink, and event-driven materializations. But what if we never finished what Martin started? This talk takes the next leap, reimagining not just the transaction log, but the entire database through the lens of streaming. We'll keep Kafka as our canonical source of truth, but enrich it with the missing primitives: long-term storage, indexes, and projections. To achieve this, we'll move beyond Kafka's simple produce/consume model and embrace Apache Iceberg as the new foundation for durable, queryable event data. This architecture collapses the fragile ETL sprawl and unifies real-time and historical data into a single, coherent system, answering questions from "what's happening right now?" all the way back to "what happened at the beginning of time?". You'll leave seeing the database (and the stream) in a whole new light. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Tom Scott

Streambased

From Blind Spots to Full Visibility: Kafka Observability with OpenTelemetry
In the world of modern finance, “We'll check those logs later” just doesn’t fly. At Fidelity, every Kafka event supports a regulatory audit, an operational workflow, or a customer’s investment. Here, observability isn’t a nice-to-have; it's a compliance requirement. But keeping a watchful eye on streaming data at enterprise scale isn’t for the faint of heart, especially when you want reliability, transparency, agility, and a good night's sleep for your SREs. In this session, we’ll walk through how Fidelity has engineered an enterprise-grade observability platform for Kafka that brings together real-time monitoring, unified metrics, and interactive dashboards. By leveraging OpenTelemetry for vendor-neutral data collection, Grafana for dynamic visualization, and OpenSearch for comprehensive log analysis, Fidelity has built a robust observability stack that keeps a vigilant eye over every Kafka stream. Attendees will walk away with insights on conquering compliance hurdles, supporting rapid incident response, and designing observability with enterprise reliability in mind. If your goal is to modernize legacy monitoring or embrace open-source culture like a fintech pro, this session offers a blueprint for building scalable Kafka observability.

Evan Kelly

Fidelity Investments

Manish Dusad

Fidelity Investments

Streamiz: Bringing Native Kafka Streams to the .NET Ecosystem
Building real-time streaming applications in .NET? You’ve probably hit the wall of limited options and wondered why the JVM ecosystem gets all the love with Kafka Streams. Enter Streamiz - a powerful .NET library that brings stream processing capabilities directly to your C# applications. But does it live up to the hype? In this session, we’ll dive into:
- Live coding a real-time data pipeline with Streamiz
- Comparison with Kafka Streams
- When to choose Streamiz vs. other streaming solutions
Through hands-on demos and honest technical analysis, you’ll walk away knowing exactly whether Streamiz deserves a place in your streaming architecture. Perfect for .NET developers tired of being second-class citizens in the streaming world!

Wllem Surreyus

Cymo

Who Let the Agent In? Securing MCP Servers in Production
The Model Context Protocol (MCP) is reshaping how agents interact with tools and APIs, but building MCP servers that are secure, governed, and production-ready is still a challenge. Many teams want to expose powerful capabilities through MCP, yet struggle to implement authentication and authorization that follow the MCP specification while staying flexible for real-world use cases. This talk focuses on how to implement MCP-spec-compliant authentication and rich authorization models for your MCP servers without unnecessary complexity. We will start with a clear overview of how MCP handles identity and access. After that, we will walk through a minimal MCP server implementation. Once the basics are in place, we will add standards-aligned authentication and explore techniques for fine-grained and contextual authorization using OpenFGA. The session will also connect these patterns to real-world data streaming and API governance scenarios, where multiple services, tools, and agents require controlled access to event streams, schemas, or domain-specific operations. As enterprises adopt agent-driven architectures, securing access to streaming systems becomes increasingly important. To wrap up, we will look at solutions that can provide the same authentication and authorization capabilities, including FGA-style access control, through a fully managed and no-code approach. This lets you focus on building MCP servers instead of maintaining multiple security layers. 
Audience Takeaways:
- A practical understanding of MCP authentication and how to implement it correctly
- A reference design for fine-grained authorization for MCP using OpenFGA
- Patterns for governing access to streaming systems and APIs exposed through MCP
- How to offload the entire security layer to Gravitee without writing any additional code in your MCP server
- Actionable guidance you can apply immediately when building your own MCP servers
By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Prachi Jamadade

Gravitee

The Missing Piece in the Kafka Stack: Durable Functions for Event-Driven Apps and AI Agents
Kafka solved the hard part of event-driven architecture: a scalable, durable, replayable log with strong delivery guarantees. Stream processing (Kafka Streams, Flink) then made analytics and continuous computation first-class. Yet when teams use Kafka to build event-driven applications with async or long-running application logic, they still end up rebuilding the same reliability and correctness mechanisms: idempotency, retries, timers, state management, sagas/compensation, and exactly-once interactions across services. The result is a complex glue layer of infrastructure that’s difficult to reason about and hard to operate. This talk shows how Restate complements the Kafka stack and provides the missing runtime layer for application and agent workloads. Restate takes the event-log idea, but flips the unit of abstraction from events to durable function invocations: a handler call becomes a persistent, resumable process with exactly-once semantics for execution, state, and service-to-service communication. Instead of stitching together consumers, databases, outboxes, schedulers, and workflow engines, developers write ordinary code, while Restate transparently persists progress, deduplicates, retries safely, and supports durable RPC, callbacks, and long waits. We’ll walk through two concrete patterns:
- From Kafka to a Durable Function handler and multi-step orchestration (e.g., payments or order fulfillment style flows)
- Durable AI loops: tool-using agents that pause for human input, recover from partial failures, and remain observable and controllable
Finally, we’ll cover operational advantages: fine-grained introspection into each invocation and the ability to pause/resume/cancel/retry individual executions, thus turning “black box” event flows into debuggable, operable application processes.

Stephan Ewan

Restate

Breaking Kafka at Scale: Lessons from Running 70K Topics on a Single Cluster
Breaking Kafka isn’t that hard; deploying 70K topics on a single cluster will certainly do the trick. High availability quickly triples the blast radius, pushing past the 200K partition stability threshold. At this scale, stability becomes fragile, and keeping production alive feels more like firefighting than engineering. In this session, we’ll share our real-world Kafka journey: a technical migration from an aging, single-tenancy architecture to a massively scaled, multi-tenant platform. We'll detail how we engineered this platform to handle billions of events per day, power a super-fast UI, and maintain real-time replication underneath. We will dive into the internals of our overwhelmed Kafka cluster, showcasing how we used Kafka Connect and Debezium running on Kubernetes to replicate customer data from MySQL to SingleStore in under 10 seconds. Finally, we’ll share the concrete, quantifiable outcomes: an 80% reduction in Kafka infrastructure costs and the elimination of entire classes of stability issues. This talk is packed with practical lessons, architectural trade-offs, and hard-earned insights. It is ideal for intermediate to senior data engineers, architects, and teams operating Kafka at scale (on-prem or cloud) facing cost, performance, or stability challenges.

Ziv Fridfertig

Skai

Defending the Perimeter: Patterns for Secure External Event Exchange
In the era of the "Connected Enterprise," data doesn't just stay inside your private network. You need to share real-time logistics with partners, stream live telemetry to mobile apps, and ingest events from third-party vendors. However, exposing your Kafka brokers directly to the internet is a major security risk. Traditional firewalls and REST-based API Gateways are ill-equipped to handle the persistent, bi-directional, and high-throughput nature of event streams. This session introduces the concept of the "Event Perimeter": a dedicated architectural layer designed to facilitate secure event exchange. We will analyze the Event Gateway as a "Smart DMZ" that provides an air-gap between your internal event mesh and the outside world. We will dive deep into technical patterns for Zero Trust Streaming, including how to move authentication and authorization logic from the broker level to the edge. A significant portion of the talk will focus on Policy Enforcement. We will demonstrate how to integrate an Event Gateway with ecosystem solutions to perform fine-grained "Content-Based Access Control." This allows you to dynamically redact PII fields or filter specific events based on the consumer's identity before the data crosses the perimeter. Whether you are dealing with GDPR compliance or simply protecting your brokers from accidental DDoS, this session provides a vendor-neutral framework for secure streaming. Key Takeaways:
- The Air-Gap Pattern: Architecting a "Smart Proxy" to isolate your internal Kafka clusters.
- Fine-Grained Security: Using ecosystem solutions and the Gateway to redact sensitive data in real-time.
- Operational Safety: Implementing rate limiting, quotas, and circuit breakers specifically designed for event-driven traffic.

Hugo Guerrero

Kong

A Hitchhiker’s Guide to Apache Kafka Data Migrations
Data migration in the world of Apache Kafka can seem like a daunting journey. This session is designed to guide you through the complexities of migrating data in and out of Kafka clusters, drawing from real-world use cases and hands-on experience. We will begin by exploring several migration scenarios, shedding light on practical challenges and solutions that organizations commonly face. This will provide a strong foundation for understanding the core principles of Kafka data migrations. Next, we’ll dive deep into the techniques and methods available for efficient and effective data migrations in Kafka environments. From leveraging Kafka Connect to managing schema evolution, we’ll cover tools and strategies that ensure smooth data transfer and minimal downtime. Throughout the session, we will also highlight the common pitfalls encountered during Kafka migrations and discuss actionable solutions to overcome them. Armed with this knowledge, you'll be equipped to avoid mistakes that can lead to costly setbacks. To wrap up, we’ll share best practices for Kafka data migrations, enabling you to optimize your migration strategy and ensure its success. And, as a bonus, we'll explore why, sometimes, the answer to Kafka migrations is simply “42.”

Michael Muehlbeyer

Distilling Kafka’s Binary Protocol into Elixir
Kafka’s wire protocol evolves fast: dozens of APIs, versioned schemas, and “flexible versions” (KIP-482) with compact encodings and tagged fields. For most client ecosystems, keeping up means either dragging in a large Java stack or hand-maintaining a sprawling protocol layer. In the BEAM world, kafka_ex takes a different route. It relies on Kayrock, which treats the Kafka protocol as data, not handwritten code. At compile time, Kayrock.Generate loads the upstream Erlang :kpro_schema and expands it into pure Elixir modules: typed request/response structs plus serializers/deserializers for every supported API version. Flexible versions and tagged fields are handled via AST generation and a small set of pattern-matching helpers, so the “weird bits” live in one place. The payoff is operational: adding support for a new Kafka release is mostly update :kpro_schema → run a Mix task → commit generated .ex files. No JNI. No JVM dependency. No rewriting protocol logic by hand. Elixir applications still work with normal structs, while the protocol stays correct and current. I’ll walk through the generator pipeline end-to-end, show a concrete example of one Kafka API across versions, and share the design tradeoffs (what’s generated vs handwritten) so you can apply the same pattern to other fast-changing binary protocols.

Anton Borisov

Fresha

Sizing, Benchmarking and Performance Tuning Apache Flink Clusters
A common question when adopting Apache Flink is about sizing the workload: How many CPUs and how much memory will Flink require for a particular use case? What throughput and latency can you expect given your hardware? We’ll kick off this talk by discussing why these questions are extremely difficult to answer for a generic stream processing framework like Flink. But we won’t stop there. The best approach to answering sizing questions is to benchmark your Flink workload. We will present how we’ve set up a Flink SQL-based benchmarking environment, along with some benchmarking results that attendees can correlate with their own workloads to approximate their resource requirements. Naturally, when benchmarking, the topic of performance tuning comes up: Are you optimally using the allocated resources? How do you identify performance bottlenecks? What are the most common performance issues, and how do you resolve them? In our case, a few configuration changes drastically improved the performance. How many CPU cores are needed for that in Flink? Attend the talk to find out: fewer than you would expect. This talk is for both Flink beginners wanting to get an idea about Flink’s performance and operational behavior, as well as for advanced users looking for best practices to improve performance and efficiency.

Robert Metzger

Confluent

Streaming AI/ML with Apache Kafka: Real-Time Patterns for Modern Intelligence
AI and Machine Learning systems only create value through the data that feeds them. As environments change continuously, static datasets and offline pipelines are no longer sufficient. Modern AI/ML requires real-time data streams to remain accurate, responsive, and trustworthy. This session explores how Apache Kafka acts as the central event backbone for AI/ML, connecting data producers, feature pipelines, training processes, inference services, and feedback loops. Kafka enables event-driven AI/ML systems that evolve alongside the data they observe. The talk introduces a set of AI/ML streaming patterns that show how Machine Learning workloads can be built around continuous ingestion rather than periodic batch jobs. It covers patterns for streaming data ingestion and preprocessing, live feature extraction, and training models directly from event streams, including near-real-time retraining and online learning approaches. It also explores real-time inference and feedback-loop patterns, where predictions and outcomes are streamed back into Kafka for monitoring and model drift detection, as well as LLM-oriented patterns where Kafka provides continuously refreshed context for Retrieval-Augmented Generation (RAG) architectures. By focusing on architectural patterns rather than isolated tools, the session highlights how data timeliness, ordering, and flow directly shape the effectiveness of Machine Learning systems. Attendees will leave with a practical pattern-based toolkit for building real-time AI/ML systems: designing feature pipelines, implementing online learning, integrating feedback loops, and supporting advanced applications such as LLM-driven RAG workflows.

Paolo Patierno

IBM

Life in the Slow Lane: Cost-Efficient Streaming Through Latency Tiering
Everyone loves real-time data… until the cloud bill arrives. At Wix, we stream over 40 billion events a day through our analytics pipeline, and for years, everything flowed through one giant, expensive, ultra-fast Kafka setup. Because “real-time is always better,” right? Well… no. In this talk, we’ll challenge one of the streaming world’s favorite myths: that faster is always better. We’ll show how we broke our single monolithic pipeline into a multi-lane architecture - where each lane is optimized differently for varying latency and cost requirements. Along the way, we’ll explore what “fast” really means for different use cases, why one size rarely fits all and why the new generation of “diskless” Kafka changed the game on the ability to optimize your streaming stack. If you’ve ever wondered how to balance latency, cost, and sanity in large-scale event systems, or if you just want to hear how we managed to make Kafka slower on purpose while also saving a few bucks, come join the ride. Your cloud bill may thank you.

Josef Goldstein

Wix

Chaos to Golden Path - How FanDuel's Eventing Strategy Transformed Enterprise Event Streaming
When 30% of your engineering time is spent on non-value-added data tasks, it's time for radical change. This talk chronicles FanDuel's ambitious Streamlined Event Acquisition Strategy (SEAS) - a company-wide migration from fragmented, team-specific solutions to a unified Kafka/Flink-based event streaming platform that now processes billions of events daily. We'll explore how our Data Team led the transformation from a world where every team reinvented the event publishing wheel to a standardized "golden path" that reduced time-to-insight by 60% while cutting infrastructure costs significantly. The journey wasn't just technical - it required cultural change, cross-team collaboration, and careful change management across dozens of product teams. Key topics covered:
- The business case that drove SEAS: quantifying the hidden costs of data fragmentation
- Designing the golden path: standardized event formats, schema evolution, and strong data contracts
- Migration strategies that kept production systems running during the transition
- Building language-agnostic SDKs and tooling that made adoption effortless
- Measuring success: from engineering velocity to data quality improvements
- Building the dream team that can drive this transformation forward
Real-world examples include migrating our high-volume betting systems during football season, handling schema evolution for legacy integrations, and the monitoring strategies that prevented data quality disasters. Attendees will learn practical frameworks for driving enterprise-wide streaming standardization in complex, multi-team environments.

Tony Cui

FanDuel

Alexandru Barbu

FanDuel

Bridging Stream and Queue: Protocol Enhancements For Kafka's Share Groups
The introduction of Kafka Share Groups fundamentally re-architects Kafka to decouple consumer scaling from partition count, enabling queue-like consumption over standard topics. This talk focuses on the essential protocol enhancements introduced in Apache Kafka 4.2 that bridge the gap to queue semantics. Specifically, we will detail KIP-1206: Record Limit and Batch Optimized Behaviour, which improves the record delivery mechanism to enable work-queue-like workloads; KIP-1222: Records Lease Renew, which provides a mechanism for applications to handle records that take longer to process; and KIP-1226: Share Lag Computation, which enables an autoscaler to monitor a group's share lag to manage horizontal scaling. Attendees will gain a clear, actionable understanding of resiliency for long-running tasks, workload optimization, and operational scaling with share groups.

Apoorv Mittal

Confluent

Andrew Schofield

Confluent

Enterprise ready with the Flink HTTP Connector
Have you ever wished you could handle problematic events in Flink SQL as easily as with DataStream side outputs? Imagine routing unprocessable records, such as those failing serialization, straight to a dead-letter Kafka queue without stopping your job. The new Apache Flink HTTP connector makes this possible while unlocking even more capabilities. It allows you to treat API endpoints as dynamic Flink tables, enabling seamless integration with any technology that exposes APIs, without writing custom code. For example, you can connect to your favorite AI endpoint simply by declaring a Flink SQL table. In this session, you’ll learn how to leverage the HTTP connector to keep your Flink jobs running even after exceptions, HTTP error codes, or deserialization failures. We’ll explore how its new metadata columns provide powerful tools for error handling and observability. You’ll also discover best practices for tuning the connector for enterprise scenarios, including caching strategies, security configurations, and retry mechanisms. Key Takeaways:
- How to integrate APIs into Flink SQL with zero custom code
- Techniques for handling errors gracefully and improving resilience
- Using metadata columns for better monitoring and debugging
- Enterprise tuning tips: caching, security, and retries
Get ready to make your Flink pipelines more resilient, scalable, and enterprise-ready with the HTTP connector!

David Radley

IBM

How Datadog Runs Its Streaming Platform
Operating Kafka in production is hard. Operating thousands of Kafka clusters globally, without customer-visible incidents, is an entirely different problem. At Datadog, Kafka is the backbone of our real-time data ingestion and streaming platform, processing petabytes of data every day across thousands of clusters and tens of thousands of brokers. At this scale, failures are rarely loud or localized. Instead, they surface as subtle latency shifts, uneven consumer lag, stalled rebalances, or slow partitions that quietly degrade customer experience if not caught early. The hardest part is not detecting symptoms; it’s identifying root causes fast enough to prevent impact. Standard Kafka monitoring (JMX metrics, broker health, consumer lag) breaks down when incidents span multiple clusters, teams, and regions. This talk explores how Datadog runs a massive Kafka fleet in production while minimizing incidents and customer impact, and the observability practices that make this possible. Through real production scenarios, we’ll show how we correlate signals across brokers, consumers, storage layers, and infrastructure to understand why something is wrong, not just that it is. We’ll dive into the technical foundation behind this approach:
- Partition-level throughput and latency analysis to detect emerging hot spots
- Continuous profiling to identify GC and allocation issues before they affect tail latency
- Distributed tracing to follow slow produce and fetch paths across services and clusters
- Dynamic instrumentation to debug live Kafka services safely, without redeployments
- Fleet-wide dashboards, anomaly detection, and SLOs to prioritize issues that matter
Beyond tooling, we’ll share the operational patterns we rely on to keep Kafka stable at scale: detecting configuration drift across thousands of clusters, preventing cascading failures, and shifting from reactive firefighting to predictive capacity and risk management.
This session is for engineers running Kafka in serious production environments who want to understand what it takes to operate streaming systems at global scale, and how modern observability enables reliability when failure is the default state.

Nandini Singhal

Datadog

Getting Started with Apache Flink: Essential Patterns and Best Practices
This session provides a comprehensive introduction to Apache Flink for developers and architects who seek to build streaming solutions that are resilient, efficient, and maintainable. I will move through three critical layers of Flink development:
1. Establish a solid foundation based on well-engineered data products. You will learn best practices for:
- Managing formats and schemas for the long term.
- Ensuring data integrity and implementing error handling.
- Working with streams of immutable records vs. streams with updates.
- Handling the nuances of watermarking and late-data strategies.
2. Compose solutions from event streaming patterns. Rather than writing monolithic scripts, I will show you how to decompose complex problems using reusable components based on these design patterns:
- Deduplication: removing duplicate events
- Correlation: linking related events across streams (e.g., orders and their shipments)
- Aggregation: computing real-time analytics
- Enrichment: adding context to events from reference data
- Pattern matching: detecting sequences or anomalies in event streams
3. Insist on operational excellence. Finally, I ground the technical theory in operational reality, and discuss the fundamentals that will help ensure that your application scales without breaking the bank or the cluster. You will learn how to:
- Manage state mindfully and prevent indefinite state growth.
- Navigate hidden costs by understanding the trade-offs and limitations inherent in some common situations.
- Guarantee quality by creating solutions that can be tested, maintained, and evolved.
Key takeaway: Whether you are a newcomer to Flink or looking to improve your existing streaming platform, you will walk away with a practical checklist and a library of patterns to build data products that are as resilient as they are performant.

David Anderson

Confluent

Keeping data private in real-time pipelines
We all love real-time data (clicks, payments, rides, messages), but most of it comes with a catch: it contains personal information we’re not supposed to leak, such as names, emails, locations, or even small clues that can identify someone. The challenge: how do we keep streaming data useful and safe at the same time? In this talk, we’ll explore practical ways to protect privacy in streaming systems using Apache Kafka, Apache Flink, and Apache Iceberg. We’ll cover:
- simple tricks like masking and tokenizing PII;
- why “anonymous” data often isn’t anonymous (the re-identification problem);
- techniques like bucketing, k-anonymity, and adding noise;
- how to balance privacy with data utility (too much hiding makes data useless).
Along the way, we’ll look at real-world stories, from public data leaks to surprising deanonymization attacks, and show live demos of pipelines that anonymize data before it’s written to storage. If you’ve ever wondered how to build privacy-aware pipelines, this talk will give you practical patterns you can use right away.

x

Confluent

Streaming CDC to Apache Iceberg at Scale with Apache Kafka: Best Practices for Enterprise Lakehouse Architectures
In today's data-driven enterprises, the ability to efficiently stream change data capture (CDC) events from operational databases into analytical platforms has become a critical capability. This session explores the architectural patterns and operational best practices for building robust, scalable CDC pipelines that deliver data to Apache Iceberg using Apache Kafka as the streaming backbone. As organizations increasingly adopt lakehouse architectures for their analytical workloads, the challenge shifts from simply moving data to doing so optimally and at scale. This session provides practical guidance on setting up end-to-end CDC streaming pipelines, covering key considerations such as schema evolution handling, partition strategy optimization, compaction policies, and write performance tuning specific to Iceberg tables. Attendees will learn proven techniques for managing high-volume CDC streams, including strategies for handling late-arriving data, managing small file problems, optimizing merge operations, and implementing effective monitoring and alerting. We'll also discuss critical operational measures needed when scaling these pipelines to handle enterprise workloads, including resource allocation, backpressure management, and ensuring data consistency across distributed systems. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Vinayaka Gangadhar

AWS

Yashika Jain

AWS

A peek under the hood of Confluent for VS Code
An ever-growing number of developers are discovering the capabilities of data streaming platforms and applying them in their software projects. How can we make them more successful working with technologies such as Apache Kafka or Apache Flink? Take a deep dive into Confluent's VS Code extension, which provides a delightful developer experience for data streaming projects. We’ll cover the main components of the extension’s architecture and discuss how they provide a seamless integration with Kafka and Flink. For instance, attendees will gain an insider's perspective on why we chose to run a GraalVM-powered sidecar executable and how it enhances our extension’s performance and capabilities. We will also share some of the notable challenges our development team encountered, and how we overcame them to deliver a robust user experience. Finally, we will highlight some of our newest features and improvements, and show how they support your data streaming projects from development through production.

Stefan Sprenger

Confluent

Header-aware state stores for Kafka Streams
Kafka record headers are increasingly used to carry critical metadata such as schema identifiers, correlation IDs, tracing information, and feature flags. In modern event-driven architectures, these headers are also the primary vehicle for propagating distributed tracing context (trace IDs, span IDs, and causality metadata) across asynchronous boundaries. Today, Kafka Streams state stores ignore this metadata and only persist key and value bytes, which means any header-based semantics and tracing context are lost as soon as a record passes through a stateful operator and gets materialized. This makes it hard to use header-aware serdes consistently, breaks end-to-end traces at stateful boundaries, limits downstream processors that rely on headers to drive behavior, and prevents Interactive Queries from exposing header-level observability when headers are part of an application’s protocol. This talk introduces KIP‑1271, which proposes header-aware state stores for Kafka Streams as a building block for robust end-to-end tracing in Streams applications. We’ll look at the new header-preserving store types and how they embed serialized headers alongside values while keeping the existing key/value abstraction intact. You’ll see how applications can opt into these stores via the new *WithHeaders factories, how header-aware serdes plug in so that tracing and other metadata survive stateful operations and replays, and how upgrade is handled through a single rolling bounce with lazy migration of existing state. Attendees will leave with a clear understanding of when and how to adopt header-aware state stores to keep tracing context and other critical metadata intact end to end.

Alieh Saeedi

Confluent

The GitHub for Streaming Data: Unlocking Open Data Streams
I am a developer. My day job is at IoT company Device Insight (we are known in the Kafka community for our open source tool kafkactl). On nights and weekends, I am working on my passion project, the Grand Central Message Broker (gcmb.io). This is a platform that is based on the following idea: provide and consume streaming data in a collaborative manner. I made the following observation: in development, source code repositories used to be silos, only used in the context of a single company or project. With the advent of GitHub, things changed: there was now a space where collaboration could happen, where code could be made available, searched for, re-used, and developed together across individuals and organizations. You could argue that the streaming world is in a place where software development used to be. Data is created, handled and processed in silos. Which is fine: a lot of data is private to organizations and should not be shared. There is, however, streaming data that can be useful for a wider audience. This is primarily in the realm of Open Data. There is a lot of it around the world; however, most of it is in static datasets. My vision is to make Open Data available in a streaming manner. For this reason, I am building gcmb.io, a platform where you can easily share streams of Open Data and consume those provided by others. This makes it easy to combine different types of information and use them for data science or in applications. Examples of such data streams freely available on gcmb.io: 17 million airplane positions per day (ADS-B) from around the world; a stream of Wikipedia edits (400k per day); current energy data from various countries (energy production, consumption); and Medium blog posts as they are published. If you want to check out the project, it's live at https://gcmb.io. There you can find a list of featured projects (including the ones mentioned above). If given the chance to present, I would like to explain the general concept and how the data can be ingested into Kafka and Confluent Cloud (did I mention that gcmb.io has native Kafka integration?).

Stefan Hudelmaier

Device Insight GmbH

The "Plug & Play" Lie: Why Your Oracle CDC Pipeline Will Fail
Surviving unbounded numerics, missing SMTs, and XStream configuration hell. We are often promised that Change Data Capture (CDC) is a solved problem: "Just install the Debezium connector, point it at Oracle, and stream." In reality, connecting Debezium to Oracle XStream is not the end of the journey; it is merely the start of a complex engineering challenge. We will share why, without a rigorous platform around it, a raw XStream implementation often leads to production outages, data corruption via type mismatches, and operational gaps. In this session, we will expose the missing pieces required to turn a raw Debezium connector into a resilient data pipeline. We will move beyond "Hello World" examples and dissect the painful realities of Oracle CDC. We will frame this discussion around our own architectural evolution: sharing the scars from our v1, the decisions that defined our current production v2, and the architectural features of a hypothetical v3 that we are still chasing. We’ll go deep on:
The Type System Minefield: How to handle Oracle’s "Unbounded Numerics" and complex timestamps without crashing your consumers or losing precision (and why default SMTs aren't enough).
Declarative Pipeline Generation: Why handwriting connector configs is a recipe for disaster. We will demonstrate using SpecMesh and pipeline definitions to auto-generate complex Debezium and SMT configurations. We use these definitions as collaborative contracts with domain teams, agreeing on schemas and intent upstream.
Closing the Trust Gap: CDC without verification is just a best guess. We will share an overview of our continuous reconciliation process that proves that the data matches the source of truth.
Full-Stack Local Testing: Running Kubernetes, Oracle, Kafka, and your full pipeline on a developer machine. We’ll show how to test schema evolution and SMT logic locally, long before deployment.
Survival Mechanics: Deep dives into XStream position recovery, implementing heartbeats to prevent quiet tables from holding on to redo logs for longer than expected, and handling Confluent Cloud region failover without data loss.
This is a deeply technical, practitioner-focused session aimed at engineers and architects who are interested in migrating data from Oracle databases. You’ll come away with: a mental model of how Oracle XStream works, design patterns for building resilient pipelines, concrete tips for observability and performance tuning, and a set of “day two” operational checklists.

Kiril Piskunov

MarketAxess

Declan Curran

MarketAxess

Flink Beyond Streaming: Building a Production-Ready Batch Platform at LinkedIn
Apache Flink is widely known for streaming, but running Flink Batch as a reliable, repeatable “default” engine for critical offline workloads requires platform work that does not show up in typical examples. In this session, we will share how we productionized Flink Batch and Flink SQL for large batch pipelines, covering the engineering choices, operational guardrails, and lessons learned when scaling adoption at LinkedIn. We will start with the platform foundations needed to make batch SQL dependable in production: packaging and deployment patterns for batch SQL jobs, reducing configuration drift between job logic and orchestration, and the minimum observability you need to debug regressions quickly. Then we will go deep on concrete performance and scalability work:
SQL query optimizations such as nested projection and filter pushdown to reduce compute and I/O.
Remote shuffle with Celeborn to overcome shuffle bottlenecks and improve throughput predictability for the largest batch workloads.
Workflow orchestration with Apache Airflow to schedule, monitor, and recover batch pipelines with minimal operator toil.
Operational observability using Flink HistoryServer for post-job diagnostics and faster root-cause analysis.
To make it real, the talk is anchored by two production “tales from the trenches”:
Scalability and Reliability. Training Data at Scale: Lessons from LinkedIn Ads. We will walk through how we optimized a large machine learning model training data pipeline running on Flink Batch, including changes in SQL planning, execution and shuffle architecture, and how these improvements enhanced runtime performance and operational stability. We will also share before-and-after numbers to showcase the significant scaling improvements.
Developer Experience and Maintainability. Scaling Central Interaction Logging ingestion (online + offline). CIL is a central platform that provides a unified view of users' interactions across online and offline environments so downstream systems, including AI models, can rely on a single consistent source. We will share the bottlenecks encountered when scaling onboarding to many near-identical SQL ingestion jobs: manual job/DAG scaffolding, fragile configuration wiring, schema-only testing, and recurring Avro/schema maintenance.
Audience takeaways: a practical checklist for running Flink Batch at scale (query tuning, shuffle choices, orchestration, and observability), and patterns for onboarding many SQL jobs with less duplication, better testability, and safer schema/dependency evolution.

Archit Goyal

LinkedIn

The High-Performance Backbone: Benchmarking Your Enterprise for Streaming Success
For decades, enterprise strategy has been throttled by the "Data Mess", a brittle web of point-to-point integrations and slow batch processes that create systemic friction and stall innovation. For the modern tech executive, the challenge is no longer just managing data at rest, but enabling the High-Velocity Enterprise: an organisation capable of making intelligent, data-driven decisions in the moments that matter. This session introduces the Data Streaming Readiness Framework, a diagnostic, socio-technical methodology designed to move organisations beyond the "Data Mess". Based on the results of many sessions with technology leaders, we move past technical hype to address the core strategic pillars of streaming readiness: Value from Data, through a Unified Platform, guided by Purposeful Adoption. In this talk, we dive into the architectural shifts required to dissolve the wall between operational and analytical planes. We will discuss:
Data as a Product: Transforming "just data" into intentionally shared, discoverable, and governed assets with clear ownership.
The Internal Developer Platform (IDP): Providing a "golden path" for engineers that abstracts technical complexity and automates global guardrails.
Decentralized Accountability: We shift from over-burdened central teams to a Platform Team while Business Domains take full ownership of their data.
We share benchmarks and "Tales from the Trenches" from global leaders, including:
How a Global Retailer consolidated 4 disparate estates to remove legacy integration debt and save six-figure sums annually.
How an Automotive Manufacturer institutionalised value governance, linking streaming outcomes directly to enterprise KPIs and P&L impact.
Audience Takeaways:
The Readiness Scorecard: A 9-category framework to baseline your enterprise’s data streaming maturity.
Executive Metrics: Link streaming data to the P&L, including annual cost savings and a reduction in implementation costs.
The "Day Zero" Blueprint: A step-by-step execution roadmap to navigate the transition from organic adoption to a unified enterprise backbone.

Jon McCullagh-Vines

Confluent

Batch Is Just a Slow Stream: Designing Event-First Pipelines Without Going All-In on Real-Time
Most data platforms still think in batches. Daily jobs, hourly micro-batches, and carefully tuned schedules dominate, even in organizations already running Kafka. The result is familiar: complex backfills, fragile dependencies, hard-to-debug pipelines, and endless debates about whether “real-time” is worth the cost and complexity. This talk argues that the real problem isn’t batch versus streaming; it’s the mindset behind them. Batch and streaming are not fundamentally different architectures. They are the same model operating at different speeds. When teams design pipelines around intervals instead of events, they lock themselves into unnecessary complexity and make future change expensive. By contrast, designing workloads as streams first allows processing speed to become a configuration choice rather than an architectural constraint. In this session, we explore how to transition batch workloads by shifting to stream-first, event-based thinking without committing to always-on, low-latency systems on day one. We’ll show how to model data as a sequence of events, reason about state and correctness over time, and decouple business logic from scheduling. From there, the same pipeline can safely run daily, hourly, or continuously, depending on cost, operational maturity, and business value. We’ll also discuss when event-driven architecture naturally emerges as the simplest solution, not because “real-time” is a goal, but because making change explicit removes the need for artificial intervals. Backfills become replays, late data becomes a first-class concern, and debugging shifts from job-centric to time-centric reasoning. Attendees will leave with a practical mental model for evolving batch pipelines using Kafka, guidance on choosing processing speed deliberately, and a clear path toward event-based systems that scale with their organization, not against it.

Ramzi Alashabi

ABN AMRO Bank N.V.
