London Sessions

May 20, 2026

Breakout Session

Building Reliable CDC and Kafka Mirroring Pipelines at Trillion-Message Scale

At large scale, reliability is unforgiving. When a data platform processes trillions of events per day, even small delays or inconsistencies can ripple across analytics, AI systems, and customer-facing products. In these environments, Change Data Capture (CDC) pipelines are no longer just ingestion tools — they become core production infrastructure with strict latency and correctness requirements. In this talk, we’ll share lessons from operating Brooklin, an open-source data streaming platform used at LinkedIn to run reliable CDC and Kafka mirroring pipelines at massive scale. Brooklin processes over 7 trillion messages per day across 50+ clusters, mirrors 100k+ Kafka topics, and supports sub-minute SLAs for critical workloads spanning multiple teams and use cases. Rather than focusing on how to build CDC systems from scratch, this session emphasizes how platform teams can adopt proven patterns to operate CDC and Kafka mirroring reliably in real-world environments. We’ll discuss common CDC use cases across database-heavy organizations, including capturing changes from systems such as MySQL, Oracle, and TiDB, streaming them into Apache Kafka, and mirroring data across Kafka clusters for isolation, multi-region deployments, and organizational boundaries. This session is aimed at intermediate to advanced data engineers and platform teams. Rather than diving into low-level internals or how to build CDC from scratch, we’ll focus on practical design and operational strategies for adopting and operating CDC and Kafka mirroring platforms at scale: partitioning and throughput considerations, handling schema evolution, managing backpressure, and supporting differentiated SLAs—from near-real-time (≈1 minute) to relaxed latency (≈30 minutes)—across shared infrastructure. Brooklin, an open-source data streaming platform, will be presented as a reference implementation that demonstrates how these patterns work in practice. 
We’ll share how similar approaches can be adopted by other organizations building CDC and Kafka mirroring pipelines across diverse databases and environments. Attendees will leave with concrete insights into designing reliable CDC architectures, understanding real-world failure modes, and applying proven patterns to build production-grade streaming systems.

Harshade Yesane

May 19, 2026

Breakout Session

From Batch to Real Time: Operating Cassandra CDC with Debezium at Datadog Scale

At Datadog, the Metrics Query Activity feature relies on fast faceted search across operational data stored in Cassandra. The previous replication model used scheduled batch jobs that queried Cassandra by partition key and copied the data into Elasticsearch. This created heavy read pressure on production clusters, introduced operational complexity, and resulted in a four-hour delay before changes became visible downstream. The batch jobs ran on a fixed schedule and were enabled for only a limited subset of customers. With the Cassandra cluster sustaining write volumes exceeding 30,000 writes per second, extending this approach to the full customer base would have required an increase in job execution rate and query volume. This talk presents how we replaced this batch approach with a real-time streaming architecture based on Cassandra CDC and the open-source Debezium Cassandra connector, including upstream contributions to the project. By capturing commit logs directly and streaming changes into Kafka, we removed the need for read-intensive extraction jobs. A downstream Kafka Connect Elasticsearch sink then applies updates as they arrive, keeping indexed documents aligned with the source of truth within seconds. Supporting Datadog’s write volume required ensuring the CDC pipeline could process more than 30,000 writes per second with resilient behavior. We tuned Debezium’s Kafka producers and evaluated the system under peak load, while verifying at-least-once delivery and clean recovery from connector issues to maintain eventual consistency downstream. The impact is reflected in several key metrics. Replication delay fell from four hours to under ten seconds. Eliminating read-heavy extraction jobs removed pressure on Cassandra and created opportunities for future cluster downscaling.
The new architecture also reduced operational cost by an estimated 46 percent while providing a streaming model that scales naturally with write throughput and isolates OLTP workloads from downstream processing. Attendees will learn how to implement Cassandra CDC with Debezium in a high-volume environment, how to tune and scale Debezium and Kafka to handle demanding write workloads, how to migrate safely from batch replication to streaming, and the practical lessons we learned while operationalizing Cassandra CDC at Datadog scale.

Joan Gomez / Alejandro Huertas

Joan Gomez

Datadog

Alejandro Huertas

Datadog

May 20, 2026

Breakout Session

Testing Flink SQL Scripts Made Simple for Non-Developers

Developing Flink SQL scripts can be challenging, especially for non-developers like data scientists. Simplifying the development process is essential to ensure correctness and reliability, with testing playing a crucial role. We introduce a Flink SQL Test Runner as a solution to streamline this process.

Key Points:

1. Test Runner Architecture: Get a look at the inner workings of the SQL Test Runner which supports both unit and integration tests and generates detailed reports. It is designed to be used behind REST APIs and CI/CD Pipelines.
2. Unit Tests: Learn hands-on how to write a unit test using a SQL script and a Java unit test. In the background, Apache Paimon is used to mock sources and sinks. The unit test files are compiled at runtime, enabling quick execution over the network.
3. Integration Tests: Get to know the concepts used behind a SQL integration test using both the user’s SQL script and a testing SQL script - integrating the ideas of DB to test SQL with SQL. The Test Runner supports both negative and positive testing modes to assert the count of the retrieved results of a stream.
4. Deployment: The Test Runner is packaged as a Docker image, accepting file paths via environment variables, and facilitating integration into CI/CD pipelines behind REST APIs.

Conclusion: The Flink SQL Test Runner significantly enhances the testing process for Flink SQL scripts, supporting robust unit and integration tests. It simplifies the development process for complex Flink projects, meeting the needs of both developers and non-developers.

Robin Fehr

Acosom GmbH

May 19, 2026

Breakout Session

Deep dive into writing Queues for Kafka applications

Queues for Kafka introduces a new paradigm for consuming data from Apache Kafka, along with the new Share Consumer API. If you’ve ever struggled with building message-queuing applications on top of Kafka, struggle no more. The Share Consumer API in Apache Kafka 4.2 makes this easy. The Share Consumer API still consumes data from Kafka topics, but if you think about how it works, it behaves much more like a message queue. Learn how partitions are shared among consumers, how to take precise control over record fetching, how to acknowledge record delivery properly, which error handling strategies to use, and even how best to deal with records that take a very long time to process. If you’ve wondered how to read records from Kafka and process them with a group of workers using generative AI without worrying about head-of-line blocking, this is the talk for you.

Andrew Schofield / Apoorv Mittal

Andrew Schofield

Confluent

Apoorv Mittal

Confluent

May 20, 2026

Breakout Session

Back to the Boring: GenAI That Ships

Everyone’s chasing the Unicorn: agentic workflows, autonomous everything, “AI-first” transformations, and flashy demos. Meanwhile, the reality reported by MIT is clear: about 95% of GenAI pilots fail to deliver. But the issue isn’t with the models, it’s with the problems we choose to address.

This talk focuses on the 5% that succeed. Not by taking on larger, shinier projects, but by targeting the boring work that exists in every enterprise: repeatable workflows, exceptions, approvals, reconciliations, handoffs, and “someone needs to write a report and decide what to do next.” These are the places where small improvements compound quickly, and where GenAI can create real productivity gains now.

I’ll show a delivery pattern where GenAI is embedded into a set of small, focused microservices, coordinated through Kafka as the system-of-record for workflow state and decisions. Kafka provides a shared backbone for service-to-service communication, with schemas and contracts enforcing structured data, and with traceability treated as a first-class design goal. That traceability is what turns GenAI from an unpredictable demo into a repeatable system: every step is observable, decisions can be audited, outputs can be replayed, and you avoid the trap of an LLM taking unrepeatable actions that you can’t explain later.

No hype. No digital revolution required. Just a grounded playbook for turning GenAI from a pilot factory into a value delivery engine, because there’s a long backlog of boring problems waiting to be solved.

You’ll leave this talk with:

- Real examples of “boring” GenAI use cases already running in production
- A concrete way to structure GenAI systems so they stay understandable, auditable, and under control as they evolve
- Practical techniques for making GenAI behaviour traceable and repeatable, instead of opaque and one-off
- A stronger instinct for saying “no” to bad AI ideas, and focusing effort where GenAI can deliver value now

By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

David Navalho

Marionete

May 19, 2026

Breakout Session

Towards Interoperable Intelligence: Streaming Foundations for Multi‑Agent Systems

As enterprises begin deploying AI agents at scale, agent sprawl is inevitable. Without governance infrastructure, unmanaged autonomy cascades: agents cannot reliably discover existing tools, so teams unknowingly rebuild functionality; data quality drifts between systems; conflicting decisions emerge; lineage goes missing; and hallucinated actions execute at machine speed, before any human can audit or stop them. A governance nightmare. In this talk, we show how a self‑service event streaming platform provides the missing foundation to govern multi‑agent systems. Event stream processing delivers real‑time aggregation, explicit data contracts, and end‑to‑end observability - so agents can reason in a clean, auditable state instead of ad‑hoc APIs and shadow IT. Our position: taming this sprawl does not require a new AI stack. It requires treating your existing event streaming platform as governance infrastructure. At E.ON / Essent, we're building a self‑service streaming platform on top of Confluent Cloud for enterprise AI operations. Key capabilities – EventCatalog for discovery, Flink for real‑time aggregation, and validation gates – are proving essential for agent governance. The hardest part, however, is not platform engineering; it is helping teams understand when and how to use it, and to think in streams, not just events. Our strategic framework - the Agentic AI Interoperability Target Picture - addresses the autonomous systems challenge: the identity and policy layer enforces safe execution boundaries, the control plane coordinates agent activity, and the agent gateway validates all requests. This enables bounded autonomy at scale. The streaming platform forms that foundation, enabling agents to reason on curated, validated, aggregated state instead of stale snapshots. 
Three concrete capabilities operationalize this:

- Discovery & Registry: Searchable data contracts enable agents to know what exists
- Real‑Time Aggregation: Flink materialized views provide timely state
- Upstream Validation: Quality gates enforce schemas so agents act only on trusted data

By combining self‑service access with centralized governance, we've found a path from agent sprawl toward orchestrated autonomy at scale. Attendees will not only learn our platform patterns and governance approach, but discover the real challenge: organizational transformation. E.ON's SAP case - 500+ topics, real-time aggregation replacing blind batches - proves shifting teams to stream thinking is what unlocks AI autonomy at scale.

Patrick Berger / Martijn van der Pauw

Patrick Berger

E.ON Digital Technology

Martijn van der Pauw

Essent

May 20, 2026

Breakout Session

🤖 Building AI systems? Context - and Flink - is all you need!

Traditional batch architectures cannot meet the needs of modern AI systems, which increasingly operate as autonomous agents requiring millisecond-latency access to both data and its metadata context. Batch ETL introduces unavoidable staleness, relies on fragile orchestration for backfills, and pushes governance and lineage downstream into the analytical estate, too late for AI systems that must make real-time operational decisions. This creates accuracy issues, model drift, and regulatory blind spots. This talk explains why organizations need to adopt streaming-native Real-time Context Engines built on continuous data processing, incremental enrichment, and first-class data governance. Using engines like Apache Flink, enriched event streams are governed and lineage-tracked as part of the streaming pipeline itself, shifting left toward the point of data generation. We detail how event-time semantics, schema evolution, and incremental state updates enable deterministic behavior and full reproducibility without manual pipeline rewrites. A core capability is materialized context: every enriched and governed dataset is projected into a strongly consistent, queryable in-memory table, continuously updated from the event log. Both the data and its metadata context are available to AI agents through open interfaces like MCP (Model Context Protocol) or REST API endpoints. This enables agents not only to retrieve the freshest state, but also to inspect the provenance, quality constraints, and governance rules associated with that state, which is critical for reliability, trust, and regulatory compliance. Equally important is the role of memory management, indexing, and schema intelligence in serving the optimal context to stateless AI models. Because AI systems have no internal memory, every prompt requires reconstructing the most relevant slice of context: per case, customer, conversation, transaction, or semantic topic.
This demand necessitates granular in-memory indexing, adaptive caching, and strong ontology awareness to locate and deliver only the minimal but most meaningful context at low latency. Organizations must therefore deeply understand their data ontologies, entity relationships, and schema evolution patterns to design memory-efficient, fine-grained indexes that ensure AI agents always operate with precise, fully updated context rather than broad, stale datasets. We will show architectural blueprints and operational patterns for building scalable, low-latency, governance-first context layers suitable for high-stakes AI-driven operations. In that context, we will highlight the regulatory implications: when AI systems make recommendations or act automatically, organizations must document the context and lineage of the data that influenced the decision. Real-time lineage tracking ensures auditability, verifiable traceability, and accountability.

Steffan Hoellinger

Confluent

May 20, 2026

Breakout Session

From Data Pipelines to Context Streams: Building Infrastructure for the Agent Era

For decades, data engineers have built infrastructure optimized for human consumers—analysts running queries, scientists training models, executives viewing dashboards. Batch processing, overnight refreshes, and query-friendly schemas made perfect sense. But in 2026, a new primary consumer is emerging: AI agents. And they have radically different requirements. Agents don't wait for nightly batch jobs. They need fresh context delivered in milliseconds. They don't browse dashboards—they consume structured context windows that must be assembled on-demand from multiple streaming sources. They don't tolerate stale data gracefully; outdated context leads to hallucinations, incorrect actions, and compounding failures across multi-agent workflows. This talk introduces "context engineering" as the discipline of building data infrastructure for agent-facing applications. We'll explore how streaming platforms like Apache Kafka become the foundational layer for real-time context delivery, and why the patterns that served human analytics fall apart when agents are your consumers. We'll cover three core challenges through production examples: First, context assembly—how to join, filter, and enrich multiple event streams into coherent context windows with sub-100ms latency using Kafka Streams and Flink. Second, state management for agents—leveraging event sourcing patterns so agents can access not just current state but temporal context ("what did this customer do in the last hour?"). Third, observability for agent-consumed data—why traditional data quality metrics miss the failure modes that matter for agents, and how to build context delivery SLOs. Throughout, we'll examine real architecture decisions: when to push context to agents versus let them pull, how to handle context window limits as a backpressure signal, and patterns for graceful degradation when upstream data sources lag. 
The underlying principles of good data engineering—reliability, freshness, correctness—remain constant. But the application layer is transforming. Data teams that recognize agents as first-class consumers, not afterthoughts, will build the infrastructure that powers the next generation of AI applications. Attendees will leave with concrete architectural patterns for agent-facing data infrastructure, an understanding of how streaming primitives map to agent context requirements, and a framework for evaluating whether their current data platforms are ready for AI-native workloads. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.
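
The temporal-context question quoted in the abstract ("what did this customer do in the last hour?") can be illustrated with a small in-memory sketch. This is not the speaker's implementation: `Event`, `EventLog`, and `recent_context` are hypothetical names, a plain Python dictionary stands in for a Kafka topic or state store, and events are assumed to arrive in timestamp order per customer.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    customer_id: str
    timestamp: float  # seconds since epoch
    action: str


@dataclass
class EventLog:
    """Append-only, per-customer event history (hypothetical stand-in for a topic)."""
    by_customer: dict = field(default_factory=dict)

    def append(self, event):
        events = self.by_customer.setdefault(event.customer_id, [])
        # Assumption of this sketch: events are appended in timestamp order.
        if events and event.timestamp < events[-1].timestamp:
            raise ValueError("sketch assumes in-order events per customer")
        events.append(event)

    def recent_context(self, customer_id, now, window_s=3600.0, max_items=50):
        """Return events from the last `window_s` seconds, newest `max_items` only."""
        if max_items <= 0:
            return []
        events = self.by_customer.get(customer_id, [])
        timestamps = [e.timestamp for e in events]
        # Binary search for the first event inside the time window.
        start = bisect_left(timestamps, now - window_s)
        in_window = [e for e in events[start:] if e.timestamp <= now]
        # Keep only the most recent items when the window holds too many.
        return in_window[-max_items:]
```

The `max_items` cap is a crude stand-in for a context window limit: when more events fall inside the time window than the consumer can accept, only the most recent ones are kept.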

David Kjerrumgaard

StreamNative

May 19, 2026

Breakout Session

Supersonic Streams: When Quarkus Met Kafka

Building modern, event-driven applications with traditional Java and Apache Kafka often means grappling with slow development cycles and complex local environment setup. This friction hinders developer productivity. A new generation of Java platforms like Quarkus is changing this. Designed for the cloud, these runtimes feature extremely fast startup times and a minimal memory footprint. We'll demonstrate how these platforms drastically simplify the developer experience, for example by automatically provisioning a live Kafka cluster for local development. Beyond the inner loop, we will show how these optimized applications, combined with cloud-native tooling, enable you to deploy and rapidly scale your Java microservices and Kafka components on platforms like Kubernetes. This ensures you can efficiently handle fluctuating event loads, scaling seamlessly from zero to massive scale. Attendees should come away with:
- An understanding of how modern Java simplifies the entire Kafka development lifecycle, from local setup to production.
- Practical knowledge for building high-performance, event-driven applications designed for rapid scaling and minimal resource consumption.
- Inspiration for leveraging these technologies to achieve faster time-to-market and operational efficiency.

Viktor Gamov

Confluent

Kevin Dubois

IBM

May 20, 2026

Breakout Session

How to write your own partition assignor in Kafka’s KIP-848 Era

Apache Kafka 4.0 makes the new consumer group rebalance protocol (KIP‑848) generally available, shifting assignment computation from clients to the broker-side coordinator and eliminating the classic leader-driven, stop-the-world rebalances. This change simplifies consumers and improves stability and time-to-recover during membership or metadata churn, but it also removes support for client-side partition assignors on standard consumers in favor of pluggable server-side assignors. For teams that previously relied on custom assignment strategies, the good news is that you can still customize behavior by implementing a broker-side assignor via the new server assignor SPI and enabling it based on group-level and client configurations. This talk demystifies how assignment works in the KIP‑848 era and provides a practical, end-to-end guide to writing, testing, and safely rolling out a custom broker-side assignor. We’ll cover the coordinator’s target-assignment model and incremental reconciliation, as well as the constraints and contracts your assignor must respect (stickiness, determinism, payload limits). We’ll share a working example: real code and configs to write and use our very own custom assignor. We’ll also share migration tips for teams moving off classic client-side strategies, how to observe and debug assignments with the new metrics, and guardrails for compatibility and rollback. You’ll leave with a clear checklist and reference implementation path to bring your own assignor to production without sacrificing the resilience and operability benefits of KIP‑848.
Who is it for:
- Platform and application engineers running Kafka at scale
- Developers who need custom partition strategies
- Engineers upgrading client applications to Kafka 4.x, migrating from classic client-side assignors to broker-side assignors under KIP‑848
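
The stickiness and determinism contracts mentioned in the abstract can be sketched in a few lines. This is a plain-Python illustration only: it does not use Kafka's server-side assignor SPI, `sticky_assign` is a hypothetical name, and the quota rule (earlier members in sorted order absorb any remainder) is a simplifying assumption rather than Kafka's behavior.

```python
def sticky_assign(members, partitions, previous):
    """Return {member: sorted list of partitions}.

    members:    iterable of member ids
    partitions: iterable of partition ids
    previous:   {member: list of partitions} from the prior assignment
    """
    # Determinism: sort inputs so the result never depends on arrival order.
    members = sorted(set(members))
    partitions = sorted(set(partitions))
    if not members:
        return {}

    base, extra = divmod(len(partitions), len(members))
    # The first `extra` members (in sorted order) get one additional partition.
    quota = {m: base + (1 if i < extra else 0) for i, m in enumerate(members)}

    valid = set(partitions)
    assignment = {m: [] for m in members}
    taken = set()

    # Stickiness: a surviving member keeps what it owned, up to its quota.
    for m in members:
        for p in sorted(previous.get(m, [])):
            if p in valid and p not in taken and len(assignment[m]) < quota[m]:
                assignment[m].append(p)
                taken.add(p)

    # Hand the remaining partitions to members still under quota.
    unassigned = [p for p in partitions if p not in taken]
    for m in members:
        while len(assignment[m]) < quota[m]:
            assignment[m].append(unassigned.pop(0))
        assignment[m].sort()
    return assignment
```

With two members owning two partitions each, adding a third member moves a single partition to the newcomer and leaves every other assignment in place. A production assignor would also have to respect the payload limits the abstract mentions, which this sketch ignores.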

Lianet Magrans

Confluent

David Jacot

Confluent

May 20, 2026

Breakout Session

Life as a Kafka Admin: Lessons from Running 30+ Clusters in Production

Operating Apache Kafka in production is very different from just “using Kafka as a developer”. Since 2021, I’ve worked as a Kafka Admin responsible for more than 50 clusters across multiple regions, helping dozens of teams build on top of Kafka while keeping the platform stable and predictable. Over time, the patterns repeat: too many or too few partitions, services calling slow external APIs in the middle of stream processing, painful rebalances, clients that “cannot connect”, and users who just want the platform to “work” without learning all the internals. This talk shares the practical lessons learned from living in that world every day. It covers how to design and review topics and partitioning, how to deal with rebalances and skew, how to debug connection and authentication issues at scale, and how to build automations and guardrails that improve the developer experience for many teams at once. It also highlights what changes when you manage many clusters in different environments and regions, and how to keep your sanity while doing it.

Marcos Prado

SREENGINEER

May 20, 2026

Breakout Session

Design with me: a Kafka Streams Payment Authorization Collaborative Design Session

Most Kafka Streams tutorials end where production begins. This interactive, detailed-design whiteboarding session bridges that gap by designing a real-world payment authorization system collaboratively, honestly, and without hiding the messy parts. We will start optimistically with Kafka Streams DSL approaches (elegant joins, straightforward windowing) and systematically discover their breaking points. As Apache Kafka Streams DSL solutions fail under real-world constraints, we'll architect production-grade alternatives using the Processor API, custom state store designs, and advanced patterns for handling late data, implementing timeouts, managing state lifecycle, and ensuring exactly-once semantics. We'll visually map out state management strategies, “watermarking” approaches, and the trade-offs between different join patterns when dealing with temporal uncertainty. Here’s the challenge (inspired by one of my past engagements): two asynchronous input streams.
- Shiny new payment authorization requests arriving in real time
- Account state updates that arrive delayed, because that is how the legacy integration actually works
The business requirement is straightforward: decide whether to authorize each payment. The technical reality? Anything but simple. This is not a lecture. Armed with a (virtual) whiteboard, we'll design the system architecture together (I’ll fill in the blanks!), progressively integrating new constraints that turn textbook examples into production systems: late-arriving state, authorization timeouts, authorized user override, state store growth. Attendees will leave with mental models for recognizing when business requirements exceed DSL capabilities, practical patterns for designing custom state stores that match their domain logic, and confidence in choosing between DSL and Processor API approaches. Most importantly, you'll take away the architectural judgment I've earned through production failures, so you can succeed without repeating them.
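
The core difficulty in the abstract, payments arriving before the account state they depend on, can be sketched as a buffer plus a periodic timeout check, loosely mirroring a state store and a punctuator in the Processor API. This is a plain-Python illustration under stated assumptions, not the speaker's design and not Kafka Streams code: `PaymentAuthorizer`, its method names, the decision labels, and the balance-based rule are all hypothetical.

```python
class PaymentAuthorizer:
    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s
        self.balances = {}  # account_id -> last known balance
        self.pending = {}   # account_id -> [(payment_id, amount, arrived_at)]

    def _decide(self, account_id, payment_id, amount):
        # Hypothetical rule: authorize while the known balance covers the amount.
        if amount <= self.balances[account_id]:
            self.balances[account_id] -= amount
            return (payment_id, "AUTHORIZED")
        return (payment_id, "DECLINED_INSUFFICIENT_FUNDS")

    def on_payment(self, account_id, payment_id, amount, now):
        """Decide immediately if state is known, otherwise buffer the payment."""
        if account_id in self.balances:
            return [self._decide(account_id, payment_id, amount)]
        self.pending.setdefault(account_id, []).append((payment_id, amount, now))
        return []

    def on_account_state(self, account_id, balance):
        """Late-arriving state: store it and resolve any buffered payments."""
        self.balances[account_id] = balance
        waiting = self.pending.pop(account_id, [])
        return [self._decide(account_id, pid, amt) for pid, amt, _ in waiting]

    def on_tick(self, now):
        """Periodic check: decline payments that waited longer than the timeout."""
        decisions = []
        for account_id in list(self.pending):
            still_waiting = []
            for pid, amt, arrived in self.pending[account_id]:
                if now - arrived >= self.timeout_s:
                    decisions.append((pid, "DECLINED_TIMEOUT"))
                else:
                    still_waiting.append((pid, amt, arrived))
            if still_waiting:
                self.pending[account_id] = still_waiting
            else:
                # Drop empty entries so the buffer does not grow without bound.
                del self.pending[account_id]
        return decisions
```

Removing empty buffers in `on_tick` is the sketch's nod to the state store growth constraint; a real design would also need to age out account balances and handle the override cases listed in the abstract.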

Adam Souquieres

StreamConsulting

May 20, 2026

Breakout Session

Schema Management in Kafka (with GitOps!)

Schema Management in Kafka is often perceived as introducing significant overhead. Even though using Schemas provides transparency and predictability to your Topics, some teams don't use them because of this perceived complexity. It doesn't have to be complex. I've introduced a similar approach in two projects, and engineers grasp it really quickly once a proper setup is in place. Once you've seen it, you just get it. I want to share this with you. In this session I'll cover important problems and choices you'll need to understand when using Schemas in Kafka:
- Why use Schemas and when it isn't a good idea
- How Schema Evolution works and how to choose a Compatibility Type
- How and when to publish multiple Event types to the same topic
- How to embrace GitOps in Schema Management
- What tooling is available
- What potential issues you might encounter

Jan Siekierski

Kentra

May 20, 2026

Breakout Session

Your Model Is Fine. Your Context Is Broken.

AI systems don’t fail because models are weak. They fail because context is wrong. As agentic AI moves from experimentation to production, teams are discovering a new class of data problems. Context is scattered across operational databases, event streams, APIs, and vector stores. It’s stale by the time it reaches the model, inconsistent across tools, and expensive to recompute. Most architectures were never designed to continuously assemble and serve context at runtime. This talk introduces context engineering as a practical, systems-level discipline focused on solving these data challenges. Rather than treating context as static input, context engineering treats it as a continuously computed product, derived from live business signals, enriched in real time, governed, and served with low latency to AI systems. We’ll focus on the role of streaming and event-driven architectures as the foundation for this approach. You’ll see why batch pipelines and warehouse-centric designs struggle with agent workloads, and how stream processing enables data enrichment and reprocessing of context as data evolves. In the second half, we’ll build this live. Using Kafka and Flink, we’ll construct a real-time context pipeline that ingests multiple data sources, enriches and materializes them into low-latency tables, and exposes them to AI agents through MCP. This session is for engineers who want to move AI systems out of POCs and into production by fixing the data foundation first.

Sean Falconer

Confluent

May 20, 2026

Breakout Session

Diskless but with disks, Leaderless but with leaders: A KIP-1163 Deep Dive

KIP-1150: Diskless Topics promises to make Apache Kafka more cost effective and flexible than ever before, but how does it work? Where do the cost savings come from? Is it really Diskless? What about Leaderless? Why is the latency worse? This talk will walk through the design for the preferred implementation in KIP-1163: Diskless Core, and answer all of these questions. A basic understanding of Apache Kafka is enough to attend this talk: we’ll review the architecture used for classic and tiered topics, and how data is produced and fetched. We'll discuss the limitations of this architecture in the context of modern hyperscaler cloud deployments, and where the costs become excessive. Then we’ll show how the basic components of Kafka are taken apart and reassembled to build the Diskless architecture. We’ll also discuss the major rejected alternatives, and compare KIP-1163 to similar KIPs working to solve the same problem. At the end of this session, you should feel confident talking to stakeholders and community members about this amazing upcoming feature! By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Greg Harris

Aiven

May 19, 2026

Breakout Session

Real-Time Feature Engineering at Scale: Chaining Features and Inference with Chronon

Modern machine learning applications demand features computed in near real-time while maintaining low-latency serving — a challenge that becomes exponentially harder at scale. This talk explores Chronon, an open-source feature platform battle-tested in production at Stripe, Airbnb, Netflix, and OpenAI, and how it bridges the gap between streaming data infrastructure and production ML systems. Traditional feature engineering pipelines force teams to choose between freshness and latency, leading to complex dual pipeline architectures that are expensive to maintain and prone to training-serving skew. Chronon solves this by providing a unified abstraction over batch and streaming computation, enabling teams to define features once and serve them with sub-100ms latencies while keeping them updated in near real-time. We'll demonstrate how Chronon can be used in a wide variety of ML applications such as real-time fraud prevention as well as more complex use-cases that require chaining feature computation with model inference / embedding pipelines such as two-tower search recommendation systems. Additionally, we'll explore how Chronon minimizes computation in the serving hot-path for these use-cases, reducing infrastructure costs by orders of magnitude compared to naive streaming implementations. 
Audience Takeaways:
- How Chronon unifies batch and streaming feature computation
- Chronon's pluggable architecture with respect to table formats, streaming buses, KV stores and model platforms
- Chronon's approach to minimize serving latency while maximizing feature freshness in production ML systems
- How one can build ML pipelines that chain feature computation with model inference / embedding for applications such as two-tower recommender systems
- Real-world lessons from companies serving billions of predictions daily
This talk sits at the intersection of data streaming and AI in production, making it ideal for ML engineers, data platform teams, and anyone building real-time intelligent applications.

Piyush Narang

Zipline AI

May 20, 2026

Breakout Session

Lambda Architecture in 2025: Kafka, Views, and the Evolving Data Platform

Lambda architecture is not dead. At Fresha, we serve ~1M daily bookings through a streaming platform that has evolved for over two years, and we are just getting started. This talk shares our journey of building a cost-effective, production-ready data platform on Kafka, Snowflake, and now Iceberg and StarRocks.
Pillar 1: Ingestion - Simple but Solid
From PostgreSQL to Debezium to Kafka to Snowpipe. Data lands in Snowflake in under 2 seconds. This layer has remained untouched since day one, and that stability enabled everything else.
Pillar 2: Consolidation - Cost Effective
Here is where Lambda architecture shines. We materialize tables every 20 minutes, then merge live CDC events at query time through views. This provides deduplication, schema evolution handling, and near-real-time freshness without running expensive compute 24/7. The pattern is old. It works.
Pillar 3: Consumption - The Clever Bit
Here is what we are proud of: we use Snowflake as an API to support production load, which Snowflake is not designed for. Through smart architecture (connection pooling, query optimisation, view-based routing), we achieve Enterprise-tier capabilities on a non-Enterprise Snowflake plan. When we needed more, we extended with StarRocks and Iceberg - not replacing Snowflake, but complementing it.
What you will learn:
- Implementing query-time deduplication in Snowflake with dbt and views
- Lambda architecture patterns that handle schema evolution gracefully
- How to push Snowflake beyond its intended use case without breaking the bank
- Extending your platform with Iceberg and StarRocks while keeping Snowflake in the mix
The takeaway: You do not need the most expensive tier to build a production-grade streaming platform. Smart architecture beats premium licensing. 2+ years in production. Real patterns. Real cost savings.
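
The consolidation step in the abstract, a periodically materialized table merged with live CDC events at query time, can be sketched as follows. This is a plain-Python illustration of the general pattern, not Fresha's Snowflake or dbt views: `merged_view`, the event shape, and the per-key `version` field (assumed to increase with every change, like a log sequence number) are all hypothetical.

```python
def merged_view(snapshot, cdc_events):
    """Combine a materialized snapshot with newer CDC events at read time.

    snapshot:   {key: (version, row)} from the last batch materialization
    cdc_events: iterable of dicts with 'key', 'version',
                'op' ('upsert' or 'delete'), and 'row'
    """
    # Start from the batch layer; every snapshot row counts as an upsert.
    latest = {k: (v, "upsert", row) for k, (v, row) in snapshot.items()}

    for event in cdc_events:
        current = latest.get(event["key"])
        # Deduplicate: only a strictly newer version replaces what we have,
        # so replayed and stale events are ignored.
        if current is None or event["version"] > current[0]:
            latest[event["key"]] = (event["version"], event["op"], event.get("row"))

    # Deleted keys stay out of the result.
    return {k: row for k, (_, op, row) in latest.items() if op != "delete"}
```

Because the merge happens on read, the batch snapshot can be refreshed on a slow schedule while readers still see recent changes, which is the trade-off the abstract describes.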

Emiliano Mancuso

Fresha

May 20, 2026

Breakout Session

From Weeks to Seconds: Real-Time ML Quality Control for Medical Device Manufacturing

Medical device manufacturers face a critical challenge: how to scale production 5x while decoupling quality control costs from volume growth. Traditional sampling-based quality control - taking samples every four hours, with results arriving days to weeks later - cannot support this ambition. This talk shares our journey building a real-time ML quality control system that analyses every injection moulding shot in under one second, predicting part dimensions to within 10 μm.
Core Theme: This session demonstrates how streaming data platforms transform traditional manufacturing quality control from reactive sampling to proactive, real-time decision-making. You'll see how we built a production-grade system that processes sensor data from injection moulding machines across global manufacturing sites, enabling immediate quality insights without touching the machines themselves.
Technical Implementation: We built a hybrid on-premise and cloud architecture handling real-time sensor data streams. The system captures sensor data directly from the machines and sends it to on-premise Kafka deployments, from which the ML models (deployed using Apache Flink) deliver predictions to shop floor operators in under one second end-to-end. I'll share our architectural decisions, the challenges of maintaining sub-second latency at scale, and how we validated ML model accuracy against precision measurement equipment.
The Journey & Key Learnings: Rather than presenting a polished success story, I'll walk through our iterative hypothesis-testing approach: what worked, what failed, and why. I'll discuss the human factors: building trust through transparency, involving shop floor workers in the design process, and navigating medical device manufacturing regulations.
Audience Takeaways: Attendees will learn practical patterns for implementing real-time ML in industrial environments, strategies for iterative validation of streaming system assumptions, and how to bridge the gap between data science prototypes and production-grade systems that non-technical users trust and adopt.

Samuel von Baußnern

D ONE – Data Driven Value Creation

May 19, 2026

Keynote

Building Intelligent Systems on Real Time Data

Confluent CEO Jay Kreps takes the stage alongside industry leaders at data streaming’s biggest event. Together, they’ll show why free-flowing, real-time data has become the key to unleashing the full potential of intelligent systems across every business. From live demos to real-world use cases to industry-changing product announcements, this year’s keynote is essential viewing for anyone looking to maximize the potential of their AI. Which is pretty much everyone. Don’t miss it.

Jay Kreps / Shaun Clowes / Sean Falconer

Jay Kreps

Confluent

Shaun Clowes

Confluent

Sean Falconer

Confluent

May 20, 2026

Breakout Session

Handling Surges in Petabyte-Scale Streaming Systems by Doing Nothing

When streaming data at petabyte scale, one of the most painful on-call scenarios is handling sudden traffic surges that overload servers, trigger cascading failures, and wipe out service availability across a large blast radius. At modern throughput levels, scaling operations are simply not fast enough to prevent unexpected 10–20x spikes from taking down dozens of streaming pipelines and their neighbors. The classical mitigation is to overprovision replicas and headroom, add proactive alerting, and hope to “react quickly.” In this talk, we present a TCP-based congestion control approach that tackles the problem at its root and eliminates the need for manual on-call intervention. At Pinterest, we have productionized this TCP-based flow control solution in a 50 GB/s streaming system that powers machine learning across the company. By setting up the appropriate end-to-end flow control mechanisms, we guard against sudden surges of any magnitude by propagating backpressure gracefully, predictably, and fully autonomously. We will walk through the key concepts in networking, memory management, and backpressure that matter in large-scale streaming systems, and then unpack the exact mechanism we built to solve this problem. The audience will leave with a set of production-ready ideas and patterns that can be replicated in their own streaming environments in ways that are far more cost-efficient and operationally lightweight than classical solutions. Beyond eliminating the catastrophic risk of sudden traffic surges, we will share concrete and replicable takeaways from running these concepts in production at scale, including:
- Designing streaming topologies that rely on backpressure instead of excess capacity
- Safely transforming scaling and load balancing into reactive operations, reducing unnecessary early alerting and interventions
- Simplifying capacity planning for organic growth
- Lowering infrastructure cost by running denser workloads with minimal buffer headroom

Jeff Xiang

May 19, 2026

Breakout Session

Dynamic Kafka, Static Sleep: Taming Multi-Cluster Streams with Flink at OpenAI

At OpenAI, Kafka streams don’t sit still: a single logical “stream” can span multiple clusters and sometimes multiple regions, and the underlying topology changes as we migrate, scale, or fail over. That’s great for availability, but it’s a sharp edge for stream processors that assume “one cluster, stable topics, one offset story.” (Spoiler: that assumption dies first.) This talk shares our journey at OpenAI to make Apache Flink’s DynamicKafkaSource fit that reality, using our Kafka-to-warehouse ingestion system “StreamLink”, built on Flink, as a case study. We’ll walk through the mental model shift from “topics on a cluster” to “a stream over an ever-changing infra topology,” what worked, and where we ran into the most interesting edge cases around offsets, state, and operational safety when Kafka topology evolves underneath a running Flink job. Rather than presenting a polished fairy tale where every checkpoint is happy and every offset is deterministic, we’ll focus on the decisions and tradeoffs: the approaches we considered, the guardrails we’re putting in place, what we’re validating, and the questions we think the community should care about as dynamic consumption of Kafka becomes more popular. We’ll also cover what we’re contributing back to OSS across core implementations and APIs (Java/Python/Table/SQL), and a practical roadmap. You’ll leave with patterns you can apply to multi-cluster Kafka + Flink deployments, a checklist of “gotchas” to watch for, and a few ideas you can steal, because if Kafka is going to be dynamic, your consumption strategy should be too (preferably without becoming dynamically on-call).

Bowen Li / Xin Gao

Bowen Li

OpenAI

Xin Gao

OpenAI

May 19, 2026

Breakout Session

Turning the database inside out again: What if everything was Iceberg?

Over a decade ago, Martin Kleppmann's Turning the Database Inside Out reshaped how we think about data systems, putting the event stream at the heart of storage and computation. That vision inspired a generation of systems built atop Kafka, Flink, and event-driven materializations. But what if we never finished what Martin started? This talk takes the next leap, reimagining not just the transaction log, but the entire database through the lens of streaming. We'll keep Kafka as our canonical source of truth, but enrich it with the missing primitives: long-term storage, indexes, and projections. To achieve this, we'll move beyond Kafka's simple produce/consume model and embrace Apache Iceberg as the new foundation for durable, queryable event data. This architecture collapses the fragile ETL sprawl and unifies real-time and historical data into a single, coherent system that answers questions from "what's happening right now?" all the way back to "what happened at the beginning of time?" You'll leave seeing the database (and the stream) in a whole new light.
By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Tom Scott

Streambased

May 20, 2026

Breakout Session

From Blind Spots to Full Visibility: Kafka Observability with OpenTelemetry

In the world of modern finance, “We'll check those logs later” just doesn’t fly. At Fidelity, every Kafka event supports a regulatory audit, an operational workflow, or a customer’s investment. Here, observability isn’t a nice-to-have; it's a compliance requirement. But keeping a watchful eye on streaming data at enterprise scale isn’t for the faint of heart, especially when you want reliability, transparency, agility, and a good night's sleep for your SREs. In this session, we’ll walk through how Fidelity has engineered an enterprise-grade observability platform for Kafka that brings together real-time monitoring, unified metrics, and interactive dashboards. By leveraging OpenTelemetry for vendor-neutral data collection, Grafana for dynamic visualization, and OpenSearch for comprehensive log analysis, Fidelity has built a robust observability stack that keeps a vigilant eye over every Kafka stream. Attendees will walk away with insights on conquering compliance hurdles, supporting rapid incident response, and designing observability with enterprise reliability in mind. If your goal is to modernize legacy monitoring or embrace open-source culture like a fintech pro, this session offers a blueprint for building scalable Kafka observability.

Evan Kelly / Manish Dusad

Evan Kelly

Fidelity Investments

Manish Dusad

Fidelity Investments

May 20, 2026

Breakout Session

Streamiz: Bringing Native Kafka Streams to the .NET Ecosystem

Building real-time streaming applications in .NET? You’ve probably hit the wall of limited options and wondered why the JVM ecosystem gets all the love with Kafka Streams. Enter Streamiz - a powerful .NET library that brings stream processing capabilities directly to your C# applications. But does it live up to the hype?
In this session, we’ll dive into:
- Live coding a real-time data pipeline with Streamiz
- Comparison with Kafka Streams
- When to choose Streamiz vs. other streaming solutions
Through hands-on demos and honest technical analysis, you’ll walk away knowing exactly whether Streamiz deserves a place in your streaming architecture. Perfect for .NET developers tired of being second-class citizens in the streaming world!

Wllem Surreyus

Cymo

May 20, 2026

Breakout Session

Who Let the Agent In? Securing MCP Servers in Production

The Model Context Protocol (MCP) is reshaping how agents interact with tools and APIs, but building MCP servers that are secure, governed, and production-ready is still a challenge. Many teams want to expose powerful capabilities through MCP, yet struggle to implement authentication and authorization that follow the MCP specification while staying flexible for real-world use cases. This talk focuses on how to implement MCP-spec-compliant authentication and rich authorization models for your MCP servers without unnecessary complexity. We will start with a clear overview of how MCP handles identity and access. After that, we will walk through a minimal MCP server implementation. Once the basics are in place, we will add standards-aligned authentication and explore techniques for fine-grained and contextual authorization using OpenFGA. The session will also connect these patterns to real-world data streaming and API governance scenarios, where multiple services, tools, and agents require controlled access to event streams, schemas, or domain-specific operations. As enterprises adopt agent-driven architectures, securing access to streaming systems becomes increasingly important. To wrap up, we will look at solutions that can provide the same authentication and authorization capabilities, including FGA-style access control, through a fully managed and no-code approach. This lets you focus on building MCP servers instead of maintaining multiple security layers.
Audience Takeaways:
- A practical understanding of MCP authentication and how to implement it correctly
- A reference design for fine-grained authorization for MCP using OpenFGA
- Patterns for governing access to streaming systems and APIs exposed through MCP
- How to offload the entire security layer to Gravitee without writing any additional code in your MCP server
- Actionable guidance you can apply immediately when building your own MCP servers
By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Prachi Jamadade

Gravitee

May 20, 2026

Breakout Session

The Missing Piece in the Kafka Stack: Durable Functions for Event-Driven Apps and AI Agents

Kafka solved the hard part of event-driven architecture: a scalable, durable, replayable log with strong delivery guarantees. Stream processing (Kafka Streams, Flink) then made analytics and continuous computation first-class. Yet when teams use Kafka to build event-driven applications - async or long-running application logic - they still end up rebuilding the same reliability and correctness mechanisms: idempotency, retries, timers, state management, sagas/compensation, and exactly-once interactions across services. The result is a complex glue layer of infrastructure that’s difficult to reason about and hard to operate. This talk shows how Restate complements the Kafka stack and provides the missing runtime layer for application and agent workloads. Restate takes the event-log idea, but flips the unit of abstraction from events to durable function invocations: a handler call becomes a persistent, resumable process with exactly-once semantics for execution, state, and service-to-service communication. Instead of stitching together consumers, databases, outboxes, schedulers, and workflow engines, developers write ordinary code, while Restate transparently persists progress, deduplicates, retries safely, and supports durable RPC, callbacks, and long waits.
We’ll walk through two concrete patterns:
- From Kafka to a Durable Function handler and multi-step orchestration (e.g., payments or order fulfillment style flows)
- Durable AI loops: tool-using agents that pause for human input, recover from partial failures, and remain observable and controllable
Finally, we’ll cover operational advantages: fine-grained introspection into each invocation and the ability to pause/resume/cancel/retry individual executions, thus turning “black box” event flows into debuggable, operable application processes.

Stephan Ewen

Restate

May 20, 2026

Breakout Session

Breaking Kafka at Scale: Lessons from Running 70K Topics on a Single Cluster

Breaking Kafka isn’t that hard: deploying 70K topics on a single cluster will certainly do the trick. High availability quickly triples the blast radius, pushing past the 200K partition stability threshold. At this scale, stability becomes fragile, and keeping production alive feels more like firefighting than engineering. In this session, we’ll share our real-world Kafka journey: a technical migration from an aging, single-tenancy architecture to a massively scaled, multi-tenant platform. We'll detail how we engineered this platform to handle billions of events per day, power a super-fast UI, and maintain real-time replication underneath. We will dive into the internals of our overwhelmed Kafka cluster, showcasing how we used Kafka Connect and Debezium running on Kubernetes to replicate customer data from MySQL to SingleStore in under 10 seconds. Finally, we’ll share the concrete, quantifiable outcomes: an 80% reduction in Kafka infrastructure costs and the elimination of entire classes of stability issues. This talk is packed with practical lessons, architectural trade-offs, and hard-earned insights. It is ideal for intermediate to senior data engineers, architects, and teams operating Kafka at scale (on-prem or cloud) facing cost, performance, or stability challenges.

Ziv Fridfertig

Skai
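
The abstract's remark that high availability "triples the blast radius" past the 200K partition mark can be checked with quick arithmetic. The sketch below is only a back-of-the-envelope illustration, not the speaker's calculation: it assumes one partition per topic and a replication factor of 3, neither of which is stated in the abstract, and the helper name `partition_replicas` is hypothetical.

```python
# Back-of-the-envelope replica count for a single Kafka cluster.
# ASSUMPTIONS (not from the abstract): 1 partition per topic, replication factor 3.

def partition_replicas(topics: int, partitions_per_topic: int, replication_factor: int) -> int:
    """Total partition replicas the cluster has to host."""
    return topics * partitions_per_topic * replication_factor


TOPICS = 70_000                 # figure quoted in the abstract
STABILITY_THRESHOLD = 200_000   # figure quoted in the abstract

total = partition_replicas(TOPICS, partitions_per_topic=1, replication_factor=3)
print(f"{total:,} replicas; over threshold: {total > STABILITY_THRESHOLD}")
```

Under these assumptions the cluster hosts 210,000 partition replicas, which is already above the quoted threshold before any topic uses more than one partition.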

May 20, 2026

Breakout Session

Defending the Perimeter: Patterns for Secure External Event Exchange

In the era of the "Connected Enterprise," data doesn't just stay inside your private network. You need to share real-time logistics with partners, stream live telemetry to mobile apps, and ingest events from third-party vendors. However, exposing your Kafka brokers directly to the internet is a major security risk. Traditional firewalls and REST-based API Gateways are ill-equipped to handle the persistent, bi-directional, and high-throughput nature of event streams. This session introduces the concept of the "Event Perimeter": a dedicated architectural layer designed to facilitate secure event exchange. We will analyze the Event Gateway as a "Smart DMZ" that provides an air gap between your internal event mesh and the outside world. We will dive deep into technical patterns for Zero Trust Streaming, including how to move authentication and authorization logic from the broker level to the edge. A significant portion of the talk will focus on Policy Enforcement. We will demonstrate how to integrate an Event Gateway with ecosystem solutions to perform fine-grained "Content-Based Access Control." This allows you to dynamically redact PII fields or filter specific events based on the consumer's identity before the data crosses the perimeter. Whether you are dealing with GDPR compliance or simply protecting your brokers from accidental DDoS, this session provides a vendor-neutral framework for secure streaming.

Key Takeaways:
- The Air-Gap Pattern: Architecting a "Smart Proxy" to isolate your internal Kafka clusters.
- Fine-Grained Security: Using ecosystem solutions and the Gateway to redact sensitive data in real time.
- Operational Safety: Implementing rate limiting, quotas, and circuit breakers specifically designed for event-driven traffic.

Hugo Guerrero

Kong

May 20, 2026

Breakout Session

Chaos to Golden Path - How FanDuel's Eventing Strategy Transformed Enterprise Event Streaming

When 30% of your engineering time is spent on non-value-added data tasks, it's time for radical change. This talk chronicles FanDuel's ambitious Streamlined Event Acquisition Strategy (SEAS) - a company-wide migration from fragmented, team-specific solutions to a unified Kafka/Flink-based event streaming platform that now processes billions of events daily. We'll explore how our Data Team led the transformation from a world where every team reinvented the event publishing wheel to a standardized "golden path" that reduced time-to-insight by 60% while cutting infrastructure costs significantly. The journey wasn't just technical - it required cultural change, cross-team collaboration, and careful change management across dozens of product teams.

Key topics covered:
- The business case that drove SEAS: quantifying the hidden costs of data fragmentation
- Designing the golden path: standardized event formats, schema evolution, and strong data contracts
- Migration strategies that kept production systems running during the transition
- Building language-agnostic SDKs and tooling that made adoption effortless
- Measuring success: from engineering velocity to data quality improvements
- Building the dream team that can drive this transformation forward

Real-world examples include migrating our high-volume betting systems during football season, handling schema evolution for legacy integrations, and the monitoring strategies that prevented data quality disasters. Attendees will learn practical frameworks for driving enterprise-wide streaming standardization in complex, multi-team environments.

Tony Cui

FanDuel

Alexandru Barbu

FanDuel

May 20, 2026

Breakout Session

Bridging Stream and Queue: Protocol Enhancements For Kafka's Share Groups

The introduction of Kafka Share Groups fundamentally re-architects Kafka to decouple consumer scaling from partition count, enabling queue-like consumption over standard topics. This talk focuses on the essential protocol enhancements introduced in Apache Kafka 4.2 that bridge the gap to queue semantics. Specifically, we will detail KIP-1206: Record Limit and Batch Optimized Behaviour, which improves the record delivery mechanism to enable work-queue-like workloads; KIP-1222: Records Lease Renew, which provides a mechanism for applications to handle records that take longer to process; and KIP-1226: Share Lag Computation, which enables an autoscaler to monitor the share lag for the group to manage horizontal scaling. Attendees will gain a clear, actionable understanding of resiliency for long-running tasks, workload optimization, and operational scaling with share groups.

Apoorv Mittal

Confluent

Andrew Schofield

Confluent

May 19, 2026

Breakout Session

Enterprise ready with the Flink HTTP Connector

Have you ever wished you could handle problematic events in Flink SQL as easily as with DataStream side outputs? Imagine routing unprocessable records, such as those failing serialization, straight to a dead-letter Kafka queue without stopping your job. The new Apache Flink HTTP connector makes this possible while unlocking even more capabilities. It allows you to treat API endpoints as dynamic Flink tables, enabling seamless integration with any technology that exposes APIs, without writing custom code. For example, you can connect to your favorite AI endpoint simply by declaring a Flink SQL table. In this session, you’ll learn how to leverage the HTTP connector to keep your Flink jobs running even after exceptions, HTTP error codes, or deserialization failures. We’ll explore how its new metadata columns provide powerful tools for error handling and observability. You’ll also discover best practices for tuning the connector for enterprise scenarios, including caching strategies, security configurations, and retry mechanisms.

Key Takeaways:
- How to integrate APIs into Flink SQL with zero custom code
- Techniques for handling errors gracefully and improving resilience
- Using metadata columns for better monitoring and debugging
- Enterprise tuning tips: caching, security, and retries

Get ready to make your Flink pipelines more resilient, scalable, and enterprise-ready with the HTTP connector!

David Radley

IBM

May 19, 2026

Breakout Session

How Datadog Runs Its Streaming Platform

Operating Kafka in production is hard. Operating thousands of Kafka clusters globally, without customer-visible incidents, is an entirely different problem. At Datadog, Kafka is the backbone of our real-time data ingestion and streaming platform, processing petabytes of data every day across thousands of clusters and tens of thousands of brokers. At this scale, failures are rarely loud or localized. Instead, they surface as subtle latency shifts, uneven consumer lag, stalled rebalances, or slow partitions that quietly degrade customer experience if not caught early. The hardest part is not detecting symptoms; it’s identifying root causes fast enough to prevent impact. Standard Kafka monitoring (JMX metrics, broker health, consumer lag) breaks down when incidents span multiple clusters, teams, and regions. This talk explores how Datadog runs a massive Kafka fleet in production while minimizing incidents and customer impact, and the observability practices that make this possible. Through real production scenarios, we’ll show how we correlate signals across brokers, consumers, storage layers, and infrastructure to understand why something is wrong, not just that it is.

We’ll dive into the technical foundation behind this approach:
- Partition-level throughput and latency analysis to detect emerging hot spots
- Continuous profiling to identify GC and allocation issues before they affect tail latency
- Distributed tracing to follow slow produce and fetch paths across services and clusters
- Dynamic instrumentation to debug live Kafka services safely, without redeployments
- Fleet-wide dashboards, anomaly detection, and SLOs to prioritize issues that matter

Beyond tooling, we’ll share the operational patterns we rely on to keep Kafka stable at scale: detecting configuration drift across thousands of clusters, preventing cascading failures, and shifting from reactive firefighting to predictive capacity and risk management.

This session is for engineers running Kafka in serious production environments who want to understand what it takes to operate streaming systems at global scale, and how modern observability enables reliability when failure is the default state.

Nandini Singhal

Datadog

May 19, 2026

Breakout Session

Getting Started with Apache Flink: Essential Patterns and Best Practices

This session provides a comprehensive introduction to Apache Flink for developers and architects who seek to build streaming solutions that are resilient, efficient, and maintainable. I will move through three critical layers of Flink development:

1. Establish a solid foundation based on well-engineered data products
You will learn best practices for:
- Managing formats and schemas for the long term.
- Ensuring data integrity and implementing error handling.
- Working with streams of immutable records vs. streams with updates.
- Handling the nuances of watermarking and late-data strategies.

2. Compose solutions from event streaming patterns
Rather than writing monolithic scripts, I will show you how to decompose complex problems using reusable components based on these design patterns:
- Deduplication: removing duplicate events
- Correlation: linking related events across streams (e.g., orders and their shipments)
- Aggregation: computing real-time analytics
- Enrichment: adding context to events from reference data
- Pattern matching: detecting sequences or anomalies in event streams

3. Insist on operational excellence
Finally, I ground the technical theory in operational reality, and discuss the fundamentals that will help ensure that your application scales without breaking the bank or the cluster. You will learn how to:
- Manage state mindfully and prevent indefinite state growth.
- Navigate hidden costs by understanding the trade-offs and limitations inherent in some common situations.
- Guarantee quality by creating solutions that can be tested, maintained, and evolved.

Key takeaway: Whether you are a newcomer to Flink or looking to improve your existing streaming platform, you will walk away with a practical checklist and a library of patterns to build data products that are as resilient as they are performant.

David Anderson

Confluent

May 19, 2026

Breakout Session

Keeping data private in real-time pipelines

We all love real-time data (clicks, payments, rides, messages), but most of it comes with a catch: it contains personal information we’re not supposed to leak, such as names, emails, locations, or even small clues that can identify someone. The challenge: how do we keep streaming data useful and safe at the same time? In this talk, we’ll explore practical ways to protect privacy in streaming systems using Apache Kafka, Apache Flink, and Apache Iceberg. We’ll cover:
- simple tricks like masking and tokenizing PII;
- why “anonymous” data often isn’t anonymous (the re-identification problem);
- techniques like bucketing, k-anonymity, and adding noise;
- how to balance privacy with data utility (too much hiding makes data useless).

Along the way, we’ll look at real-world stories, from public data leaks to surprising deanonymization attacks, and show live demos of pipelines that anonymize data before it’s written to storage. If you’ve ever wondered how to build privacy-aware pipelines, this talk will give you practical patterns you can use right away.

x

Confluent
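
To make the techniques named in this abstract (masking, tokenizing, bucketing, and k-anonymity) concrete, here is a minimal standard-library Python sketch. It is an illustration under stated assumptions rather than anything shown in the talk: the function names, the sample records, and the `SECRET_KEY` value are all hypothetical, tokenizing is approximated with a keyed hash rather than a token vault, and a real pipeline would apply such transforms inside Kafka or Flink operators.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical demo key; a real system would load it from a secret manager.
SECRET_KEY = b"demo-key-rotate-me"


def mask_email(email: str) -> str:
    """Masking: keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def tokenize(value: str) -> str:
    """Tokenizing (approximated): a keyed hash yields a stable pseudonym
    that still lets records for the same person be joined."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def bucket_age(age: int, width: int = 10) -> str:
    """Bucketing: replace an exact value with a range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """k-anonymity check: every combination of quasi-identifier values
    must occur in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Made-up sample events for illustration only.
events = [
    {"email": "ana@example.com", "age": 34, "city": "London"},
    {"email": "bob@example.com", "age": 37, "city": "London"},
    {"email": "cy@example.com", "age": 52, "city": "Leeds"},
]

safe = [
    {
        "user": tokenize(e["email"]),
        "email": mask_email(e["email"]),
        "age": bucket_age(e["age"]),
        "city": e["city"],
    }
    for e in events
]

print(is_k_anonymous(safe, ["age", "city"], k=2))
```

In this sample the single Leeds record forms a group of one, so the check fails for k=2. That is the re-identification risk the abstract describes: masking the direct identifier does not help when the remaining fields are rare enough to single someone out.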

May 19, 2026

Breakout Session

Streaming CDC to Apache Iceberg at Scale with Apache Kafka: Best Practices for Enterprise Lakehouse Architectures

In today's data-driven enterprises, the ability to efficiently stream change data capture (CDC) events from operational databases into analytical platforms has become a critical capability. This session explores the architectural patterns and operational best practices for building robust, scalable CDC pipelines that deliver data to Apache Iceberg using Apache Kafka as the streaming backbone. As organizations increasingly adopt lakehouse architectures for their analytical workloads, the challenge shifts from simply moving data to doing so optimally and at scale. This session provides practical guidance on setting up end-to-end CDC streaming pipelines, covering key considerations such as schema evolution handling, partition strategy optimization, compaction policies, and write performance tuning specific to Iceberg tables. Attendees will learn proven techniques for managing high-volume CDC streams, including strategies for handling late-arriving data, managing the small-file problem, optimizing merge operations, and implementing effective monitoring and alerting. We'll also discuss critical operational measures needed when scaling these pipelines to handle enterprise workloads, including resource allocation, backpressure management, and ensuring data consistency across distributed systems. If you attend this session, your contact information may be shared with the sponsor for relevant follow-up for this event only.

Vinayaka Gangadhar

AWS

Yashika Jain

AWS

May 20, 2026

Breakout Session

A peek under the hood of Confluent for VS Code

An ever-growing number of developers are discovering the capabilities of data streaming platforms and applying them in their software projects. How can we make them more successful working with technologies such as Apache Kafka or Apache Flink? Take a deep dive into Confluent's VS Code extension, which provides a delightful developer experience for data streaming projects. We’ll cover the main components of the extension’s architecture and discuss how they provide a seamless integration with Kafka and Flink. For instance, attendees will gain an insider's perspective on why we chose to run a GraalVM-powered sidecar executable and how it enhances our extension’s performance and capabilities. We will also share some of the notable challenges our development team encountered, and how we overcame them to deliver a robust user experience. Finally, we will highlight some of our newest features and improvements, and show how they support your data streaming projects from development through production.

Stefan Sprenger

Confluent

May 19, 2026

Lightning Talk

Header-aware state stores for Kafka Streams

Kafka record headers are increasingly used to carry critical metadata such as schema identifiers, correlation IDs, tracing information, and feature flags. In modern event-driven architectures, these headers are also the primary vehicle for propagating distributed tracing context (trace IDs, span IDs, and causality metadata) across asynchronous boundaries. Today, Kafka Streams state stores ignore this metadata and only persist key and value bytes, which means any header-based semantics and tracing context are lost as soon as a record passes through a stateful operator and gets materialized. This makes it hard to use header-aware serdes consistently, breaks end-to-end traces at stateful boundaries, limits downstream processors that rely on headers to drive behavior, and prevents Interactive Queries from exposing header-level observability when headers are part of an application’s protocol. This talk introduces KIP‑1271, which proposes header-aware state stores for Kafka Streams as a building block for robust end-to-end tracing in Streams applications. We’ll look at the new header-preserving store types and how they embed serialized headers alongside values while keeping the existing key/value abstraction intact. You’ll see how applications can opt into these stores via the new *WithHeaders factories, how header-aware serdes plug in so that tracing and other metadata survive stateful operations and replays, and how upgrade is handled through a single rolling bounce with lazy migration of existing state. Attendees will leave with a clear understanding of when and how to adopt header-aware state stores to keep tracing context and other critical metadata intact end to end.

Alieh Saeedi

Confluent

May 19, 2026

Lightning Talk

The GitHub for Streaming Data: Unlocking Open Data Streams

I am a developer. My day job is at the IoT company Device Insight (we are known in the Kafka community for our open source tool kafkactl). On nights and weekends, I work on my passion project, the Grand Central Message Broker (gcmb.io), a platform based on one idea: provide and consume streaming data in a collaborative manner.

I made the following observation: in software development, source code repositories used to be silos, only used in the context of a single company or project. With the advent of GitHub, things changed: there was now a space where collaboration could happen and where code could be made available, searched for, re-used, and developed together across individuals and organizations. You could argue that the streaming world is where software development used to be. Data is created, handled, and processed in silos. That is fine: a lot of data is private to organizations and should not be shared. There is, however, streaming data that can be useful for a wider audience, primarily in the realm of Open Data. There is a lot of Open Data around the world, but most of it sits in static datasets.

My vision is to make Open Data available in a streaming manner. For this reason, I am building gcmb.io, a platform where you can easily share streams of Open Data and consume those provided by others. This makes it easy to combine different types of information and use them for data science or in applications.

Examples of data streams freely available on gcmb.io:
17 million airplane positions per day (ADS-B) from around the world
A stream of Wikipedia edits (400k per day)
Current energy data from various countries (energy production, consumption)
Medium blog posts as they are published

If you want to check out the project, it's live at https://gcmb.io. There you can find a list of featured projects (including the ones mentioned above). If given the chance to present, I would like to explain the general concept and how the data can be ingested into Kafka and Confluent Cloud (did I mention that gcmb.io has native Kafka integration?).

Stefan Hudelmaier

Device Insight GmbH

May 19, 2026

Breakout Session

The "Plug & Play" Lie: Why Your Oracle CDC Pipeline Will Fail

Surviving unbounded numerics, missing SMTs, and XStream configuration hell.

We are often promised that Change Data Capture (CDC) is a solved problem: "Just install the Debezium connector, point it at Oracle, and stream." In reality, connecting Debezium to Oracle XStream is not the end of the journey; it is merely the start of a complex engineering challenge. We will share why, without a rigorous platform around it, a raw XStream implementation often leads to production outages, data corruption via type mismatches, and operational gaps. In this session, we will expose the missing pieces required to turn a raw Debezium connector into a resilient data pipeline. We will move beyond "Hello World" examples and dissect the painful realities of Oracle CDC. We will frame this discussion around our own architectural evolution: sharing the scars from our v1, the decisions that defined our current production v2, and the architectural features of a hypothetical v3 that we are still chasing.

We’ll go deep on:
The Type System Minefield: How to handle Oracle’s "Unbounded Numerics" and complex timestamps without crashing your consumers or losing precision (and why default SMTs aren't enough).
Declarative Pipeline Generation: Why handwriting connector configs is a recipe for disaster. We will demonstrate using SpecMesh and pipeline definitions to auto-generate complex Debezium and SMT configurations. We use these definitions as collaborative contracts with domain teams, agreeing on schemas and intent upstream.
Closing the Trust Gap: CDC without verification is just a best guess. We will share an overview of our continuous reconciliation process, which proves that the data matches the source of truth.
Full-Stack Local Testing: Running Kubernetes, Oracle, Kafka, and your full pipeline on a developer machine. We’ll show how to test schema evolution and SMT logic locally, long before deployment.
Survival Mechanics: Deep dives into XStream position recovery, implementing heartbeats to prevent quiet tables from holding on to redo logs for longer than expected, and handling Confluent Cloud region failover without data loss.

This is a deeply technical, practitioner-focused session aimed at engineers and architects who are interested in migrating data from Oracle databases. You’ll come away with a mental model of how Oracle XStream works, design patterns for building resilient pipelines, concrete tips for observability and performance tuning, and a set of “day two” operational checklists.
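The precision risk behind unbounded numerics can be shown with a small, self-contained Python sketch. It does not reproduce the speakers' pipeline or any Debezium setting; `SOURCE_VALUE` is a made-up number standing in for an Oracle NUMBER column declared without precision or scale, and the helper names are hypothetical.

```python
from decimal import Decimal

# Hypothetical source value: more significant digits than a 64-bit float can hold.
SOURCE_VALUE = "12345678901234567.89"

def to_float_lossy(number_text):
    """Map the number to a binary double, as a naive consumer might."""
    return float(number_text)

def to_decimal_exact(number_text):
    """Map the number to an arbitrary-precision decimal."""
    return Decimal(number_text)

def survives_round_trip(converted, original_text):
    """Check whether the converted value still equals the source value exactly."""
    return Decimal(str(converted)) == Decimal(original_text)

if __name__ == "__main__":
    print("float keeps value:  ", survives_round_trip(to_float_lossy(SOURCE_VALUE), SOURCE_VALUE))
    print("decimal keeps value:", survives_round_trip(to_decimal_exact(SOURCE_VALUE), SOURCE_VALUE))
```

A double carries roughly 15 to 17 significant decimal digits, so the 19-digit value above cannot survive the float path, while the decimal path keeps it intact. A check of this kind is also the simplest form of a reconciliation test between source and sink values.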

Kiril Piskunov / Declan Curran

Kiril Piskunov

MarketAxess

Declan Curran

MarketAxess

May 20, 2026

Breakout Session

Flink Beyond Streaming: Building a Production-Ready Batch Platform at LinkedIn

Apache Flink is widely known for streaming, but running Flink Batch as a reliable, repeatable “default” engine for critical offline workloads requires platform work that does not show up in typical examples. In this session, we will share how we productionized Flink Batch and Flink SQL for large batch pipelines, covering the engineering choices, operational guardrails, and lessons learned when scaling adoption at LinkedIn.

We will start with the platform foundations needed to make batch SQL dependable in production: packaging and deployment patterns for batch SQL jobs, reducing configuration drift between job logic and orchestration, and the minimum observability you need to debug regressions quickly.

Then we will go deep on concrete performance and scalability work:
SQL query optimizations such as nested projection and filter pushdown to reduce compute and I/O.
Remote shuffle with Celeborn to overcome shuffle bottlenecks and improve throughput predictability for the largest batch workloads.
Workflow orchestration with Apache Airflow to schedule, monitor, and recover batch pipelines with minimal operator toil.
Operational observability using Flink HistoryServer for post-job diagnostics and faster root-cause analysis.

To make it real, the talk is anchored by two production “tales from the trenches”:
Scalability and Reliability. Training Data at Scale: Lessons from LinkedIn Ads. We will walk through how we optimized a large machine learning model training data pipeline running on Flink Batch, including changes in SQL planning, execution, and shuffle architecture, and how these improvements enhanced runtime performance and operational stability. We will also share before-and-after numbers to showcase the significant scaling improvements.
Developer Experience and Maintainability. Scaling Central Interaction Logging (CIL) ingestion (online + offline). CIL is a central platform that provides a unified view of users' interactions across online and offline environments so that downstream systems, including AI models, can rely on a single consistent source. We will share the bottlenecks encountered when scaling onboarding to many near-identical SQL ingestion jobs: manual job/DAG scaffolding, fragile configuration wiring, schema-only testing, and recurring Avro/schema maintenance.

Audience takeaways: a practical checklist for running Flink Batch at scale (query tuning, shuffle choices, orchestration, and observability), and patterns for onboarding many SQL jobs with less duplication, better testability, and safer schema/dependency evolution.

Archit Goyal

May 19, 2026

Breakout Session

The High-Performance Backbone: Benchmarking Your Enterprise for Streaming Success

For decades, enterprise strategy has been throttled by the "Data Mess": a brittle web of point-to-point integrations and slow batch processes that create systemic friction and stall innovation. For the modern tech executive, the challenge is no longer just managing data at rest, but enabling the High-Velocity Enterprise: an organisation capable of making intelligent, data-driven decisions in the moments that matter.

This session introduces the Data Streaming Readiness Framework, a diagnostic, socio-technical methodology designed to move organisations beyond the "Data Mess". Based on the results of many sessions with technology leaders, we move past technical hype to address the core strategic pillars of streaming readiness: Value from Data, through a Unified Platform, guided by Purposeful Adoption.

In this talk, we dive into the architectural shifts required to dissolve the wall between operational and analytical planes. We will discuss:
Data as a Product: Transforming "just data" into intentionally shared, discoverable, and governed assets with clear ownership.
The Internal Developer Platform (IDP): Providing a "golden path" for engineers that abstracts technical complexity and automates global guardrails.
Decentralized Accountability: Shifting from over-burdened central teams to a Platform Team, while Business Domains take full ownership of their data.

We share benchmarks and "Tales from the Trenches" from global leaders, including:
How a Global Retailer consolidated 4 disparate estates to remove legacy integration debt and save six-figure sums annually.
How an Automotive Manufacturer institutionalised value governance, linking streaming outcomes directly to enterprise KPIs and P&L impact.

Audience Takeaways:
The Readiness Scorecard: A 9-category framework to baseline your enterprise’s data streaming maturity.
Executive Metrics: Ways to link streaming data to the P&L, including annual cost savings and a reduction in implementation costs.
The "Day Zero" Blueprint: A step-by-step execution roadmap to navigate the transition from organic adoption to a unified enterprise backbone.

Jon McCullagh-Vines

Confluent

May 20, 2026

Breakout Session

Batch Is Just a Slow Stream: Designing Event-First Pipelines Without Going All-In on Real-Time

Most data platforms still think in batches. Daily jobs, hourly micro-batches, and carefully tuned schedules dominate, even in organizations already running Kafka. The result is familiar: complex backfills, fragile dependencies, hard-to-debug pipelines, and endless debates about whether “real-time” is worth the cost and complexity. This talk argues that the real problem isn’t batch versus streaming; it’s the mindset behind them. Batch and streaming are not fundamentally different architectures. They are the same model operating at different speeds. When teams design pipelines around intervals instead of events, they lock themselves into unnecessary complexity and make future change expensive. By contrast, designing workloads as streams first allows processing speed to become a configuration choice rather than an architectural constraint.

In this session, we explore how to transition batch workloads by shifting to stream-first, event-based thinking without committing to always-on, low-latency systems on day one. We’ll show how to model data as a sequence of events, reason about state and correctness over time, and decouple business logic from scheduling. From there, the same pipeline can safely run daily, hourly, or continuously, depending on cost, operational maturity, and business value. We’ll also discuss when event-driven architecture naturally emerges as the simplest solution, not because “real-time” is a goal, but because making change explicit removes the need for artificial intervals. Backfills become replays, late data becomes a first-class concern, and debugging shifts from job-centric to time-centric reasoning.

Attendees will leave with a practical mental model for evolving batch pipelines using Kafka, guidance on choosing processing speed deliberately, and a clear path toward event-based systems that scale with their organization, not against it.
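The claim that batch and streaming are the same model at different speeds can be sketched in a few lines of Python. This is an illustration under assumptions, not the speaker's design: the account-balance events and the names `apply_event`, `run_batch`, and `run_stream` are hypothetical. The business logic is a pure function over one event, and the scheduling choice only decides how often that function is driven.

```python
from functools import reduce

def apply_event(balances, event):
    """Pure business logic: fold one (account, amount) event into the state."""
    account, amount = event
    updated = dict(balances)  # copy so earlier states stay untouched
    updated[account] = updated.get(account, 0) + amount
    return updated

def run_batch(events, initial=None):
    """Scheduled run: process a whole interval of events in one go."""
    return reduce(apply_event, events, initial or {})

def run_stream(events, initial=None):
    """Continuous run: process events one at a time as they arrive."""
    state = initial or {}
    for event in events:
        state = apply_event(state, event)
    return state

# Hypothetical event log; a backfill is simply a replay of this list.
EVENTS = [("alice", 10), ("bob", 5), ("alice", -3)]

if __name__ == "__main__":
    daily = run_batch(EVENTS)
    # Two smaller intervals, carrying state forward between runs.
    hourly = run_batch(EVENTS[2:], run_batch(EVENTS[:2]))
    continuous = run_stream(EVENTS)
    print(daily == hourly == continuous)
```

Because `apply_event` knows nothing about intervals, running it daily, hourly, or per event yields the same state, and replaying the log from the start reproduces it. Late or out-of-order events are not handled here and would need explicit treatment.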

Ramzi Alashabi

ABN AMRO Bank N.V.

Current New London 2026

Session Archive

Building Reliable CDC and Kafka Mirroring Pipelines at Trillion-Message Scale

From Batch to Real Time: Operating Cassandra CDC with Debezium at Datadog Scale

Testing Flink SQL Scripts Made Simple for Non-Developers

Deep dive into writing Queues for Kafka applications

Back to the Boring: GenAI That Ships

Towards Interoperable Intelligence: Streaming Foundations for Multi‑Agent Systems

🤖 Building AI systems? Context - and Flink - is all you need!

From Data Pipelines to Context Streams: Building Infrastructure for the Agent Era

Kafka Head-of-Line Blocking: Increase Throughput, Reduce Latency

Stop Answering Today's Questions with Yesterday's Data: Low-Latency RAG with Kafka and Flink

What If We've Been Scaling Stream Processing Wrong All Along?

Beyond Watermarks: Custom Flink Operators for Feature-Trigger Synchronization

Debezium, Apache Kafka, and an Acyclic Synchronization Algorithm

Transactional Change Stream Processing With Apache Flink

Thinking in Streams: Building Stateful, Serverless Agentic Loops

Python Streaming Analytics Leveraging the Composable Data Stack

Embedding Tiny Language Models in Flink SQL functions

Joining Streams Through Time: As-Of Joins with Spark 4's new transformWithState API

Migrating a Large-Scale Kafka Streams Platform to the KIP-1071 Rebalance Protocol

Scaling Real-Time AI Actions with Amazon Bedrock AgentCore and Confluent Streaming Agents

AI Needs Context. Why Flink is Made for Context Engineering

Supersonic Streams: When Quarkus Met Kafka

How to write your own partition assignor in Kafka’s KIP-848 Era

Life as a Kafka Admin: Lessons from Running 30+ Clusters in Production

Design with me : a Kafka Streams Payment Authorization Collaborative Design Session

Schema Management in Kafka (with GitOps!)

Your Model Is Fine. Your Context Is Broken.

Diskless but with disks, Leaderless but with leaders: A KIP-1163 Deep Dive

Real-Time Feature Engineering at Scale: Chaining Features and Inference with Chronon

Lambda Architecture in 2025: Kafka, Views, and the Evolving Data Platform

From Weeks to Seconds: Real-Time ML Quality Control for Medical Device Manufacturing

Building Intelligent Systems on Real Time Data

Handling Surges in Petabyte-Scale Streaming Systems by Doing Nothing

Dynamic Kafka, Static Sleep: Taming Multi-Cluster Streams with Flink at OpenAI

Turning the database inside out again: What if everything was Iceberg?

From Blind Spots to Full Visibility: Kafka Observability with OpenTelemetry

Streamiz: Bringing Native Kafka Streams to the .NET Ecosystem

Who Let the Agent In? Securing MCP Servers in Production

The Missing Piece in the Kafka Stack: Durable Functions for Event-Driven Apps and AI Agents

Breaking Kafka at Scale: Lessons from Running 70K Topics on a Single Cluster

Defending the Perimeter: Patterns for Secure External Event Exchange

A Hitchhiker’s Guide to Apache Kafka Data Migrations

Distilling Kafka’s Binary Protocol into Elixir

Sizing, Benchmarking and Performance Tuning Apache Flink Clusters

Streaming AI/ML with Apache Kafka: Real-Time Patterns for Modern Intelligence

Life in the Slow Lane: Cost-Efficient Streaming Through Latency Tiering

Chaos to Golden Path - How FanDuel's Eventing Strategy Transformed Enterprise Event Streaming

Bridging Stream and Queue: Protocol Enhancements For Kafka's Share Groups

Enterprise ready with the Flink HTTP Connector

How Datadog Runs Its Streaming Platform

Getting Started with Apache Flink: Essential Patterns and Best Practices

Keeping data private in real-time pipelines

Streaming CDC to Apache Iceberg at Scale with Apache Kafka: Best Practices for Enterprise Lakehouse Architectures

A peek under the hood of Confluent for VS Code

Header-aware state stores for Kafka Streams

The GitHub for Streaming Data: Unlocking Open Data Streams

The "Plug & Play" Lie: Why Your Oracle CDC Pipeline Will Fail

Flink Beyond Streaming: Building a Production-Ready Batch Platform at LinkedIn

The High-Performance Backbone: Benchmarking Your Enterprise for Streaming Success

Batch Is Just a Slow Stream: Designing Event-First Pipelines Without Going All-In on Real-Time