Current New Orleans 2025
Session Archive
Check out our session archive to catch up on anything you missed or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.
Keynote: Building Intelligent Systems on Real-time Data
Join Jay Kreps, Confluent leadership, our customers, and industry thought leaders to learn how you can build intelligent systems with real-time data. We’ll show you why streaming is becoming ubiquitous across the business—and how that unlocks a shift-left approach: process and govern at the source, then reuse everywhere. Expect live demos and candid customer stories that make it concrete. Whether you’re a data leader, architect, or builder, you’ll leave with practical playbooks for bringing real-time AI to production. The future is here. Let's ignite it together!
Jay Kreps / Shaun Clowes / Sean Falconer / Rajesh Kandasamy / Cosmo Wolfe / Rachel Lo / Gunther Hagleitner
Unlocking Inter-Agent Collaboration: Confluent Powers Scalable AI with Google Cloud's A2A
The next frontier in AI is intelligent agentic systems, where agents collaborate to achieve complex goals. Google Cloud's Agent2Agent (A2A) protocol offers a crucial open standard for this inter-agent communication, enabling discovery, coordination, and task execution. However, scaling these multi-agent systems in real-world enterprise environments demands a robust, asynchronous, and resilient communication backbone. This is precisely where Confluent, powered by Apache Kafka and Apache Flink, becomes indispensable. This session will explore the powerful synergy between Confluent Cloud and Google Cloud's A2A protocol. We'll delve into architectural patterns leveraging Kafka topics as the shared, real-time central nervous system for A2A message exchange, ensuring unparalleled scalability and decoupling. Attendees will learn how Confluent's fully managed services, including Apache Flink and comprehensive connectors, facilitate seamless data flow, real-time processing, and contextual enrichment of agent communications, enabling consistency and integrity. This makes agent interactions inherently trackable, shareable, and composable across your systems. Discover through practical use cases how Confluent's platform capabilities empower AI agents for intelligent automation and dynamic orchestration within the Google Cloud environment. This session will demonstrate why Confluent is the foundational platform for building truly scalable, integrated, and reliable AI ecosystems with Google Cloud Agent2Agent.
Dustin Shammo / Merlin Yamssi / Pascal Vantrepote
Change Data Capture at Scale: Insights from Slack’s Streaming Pipeline
Slack was burning cash on batch data replication, with full-table restores causing multi-day latency. To slash both costs and lag, we overhauled our data infrastructure—replacing batch jobs with Change Data Capture (CDC) streams powered by Debezium, Vitess, and Kafka Connect. We scaled to thousands of shards and streamed petabytes of data. This talk focuses on the open source contributions we made to build scalable, maintainable, and reliable CDC infrastructure at Slack.
We'll show how we cut snapshotting time—from weeks to hours—for our largest table, half a petabyte in size and spread across hundreds of shards. You’ll learn how to apply our optimizations in Debezium, tune Kafka Connect configs, and maximize throughput. We’ll also cover how we tackled one of streaming’s most elusive challenges: detecting accurate time windows. By contributing a binlog event watermarking system to Vitess and Debezium, we made it possible to ensure correctness in a distributed system with variable lag. Finally, we’ll show you how to detect & prevent data loss in your own pipelines by applying the fixes we contributed to Kafka Connect and Debezium, which addressed subtle edge cases we uncovered in these systems.
Attendees will leave with practical techniques for deploying, scaling, and maintaining reliable CDC pipelines using open source tools—and a deeper understanding of how to avoid the common (and costly) pitfalls that can hinder the success of streaming data pipelines.
Tom Thornton
Unifying Kafka and Relational Databases for Event Streaming Applications
Kafka and relational databases have long been part of event-driven architectures and streaming applications. However, Kafka topics and database tables have historically been separate abstractions with independent storage and transaction mechanisms. Making them work together seamlessly can be challenging, especially because queuing has been viewed as an anti-pattern in a stock database.
This talk will describe how to close this gap by providing a customized queuing abstraction inside the database that can be accessed via both SQL and Kafka’s Java APIs. Since topics are directly supported by the database engine, applications can easily leverage ACID properties of local database transactions allowing exactly-once event processing. Patterns such as Transactional Outbox (writing a data value and sending an event) or any atomicity required across many discrete database and streaming operations can be supported out of the box. In addition, the full power of SQL queries can be used to view records in topics and also to join records in topics with rows in database tables.
In this talk we cover the synergy between Kafka's Java APIs, SQL, and the transactional capabilities of the Oracle Database. We describe the implementation, which uses a transactional event queue (TxEventQ) to implement a Kafka topic and a modified Kafka client that provides a single, unified JDBC connection to the database for event processing and traditional database access.
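For readers who want a feel for the Transactional Outbox pattern mentioned above, here is a minimal JDBC-only sketch: the business row and the outgoing event are written in one local ACID transaction, with a separate relay later publishing outbox rows to Kafka. The `DataSource`, table names, and topic name are illustrative assumptions, not the session's TxEventQ approach, which instead makes the topic itself part of the database transaction.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class OutboxWriter {
    private final DataSource ds; // hypothetical: connection pool to the application database

    public OutboxWriter(DataSource ds) { this.ds = ds; }

    // Write the business row and the event record in ONE local ACID transaction.
    public void placeOrder(String orderId, String payloadJson) throws Exception {
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement order = conn.prepareStatement(
                     "INSERT INTO orders(id, payload) VALUES (?, ?)");
                 PreparedStatement outbox = conn.prepareStatement(
                     "INSERT INTO outbox(topic, msg_key, msg_value) VALUES ('order-events', ?, ?)")) {
                order.setString(1, orderId);
                order.setString(2, payloadJson);
                order.executeUpdate();
                outbox.setString(1, orderId);
                outbox.setString(2, payloadJson);
                outbox.executeUpdate();
                conn.commit();   // both rows become visible atomically
            } catch (Exception e) {
                conn.rollback(); // neither the order nor the event is persisted
                throw e;
            }
        }
    }
}
```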
Nithin Thekkupadam Narayanan
What the Spec?!: New Features in Apache Iceberg™ Table Format V3
Apache Iceberg™ made great advancements going from Table Format V1 to Table Format V2, introducing features like position deletes, advanced metrics, and cleaner metadata abstractions. But with Table Format V3 on the horizon, Iceberg users have even more to look forward to.
In this session, we’ll explore some of the exciting new user-facing features that V3 Iceberg is about to introduce and see how they’ll make working with Open Data Formats easier than ever! We’ll go through the high-level details of the new functionality that will be available in V3. Then we’ll dive deep into some of the most impactful features. You’ll learn what Variant types have to offer your semi-structured data, how Row Lineage can enhance CDC capabilities, and more. The community has come together to build yet another great release of the Iceberg spec, so attend and learn about all of the changes coming and how you can take advantage of them in your teams.
Russell Spitzer
Agentic AI Meets Kafka + Flink: Event-Driven Orchestration for Multi-Agent Systems
The rise of agentic AI systems is reshaping how intelligent applications are architected—introducing new levels of autonomy, collaboration, and complexity. As protocols like Agent-to-Agent (A2A) and Model Context Protocol (MCP) become foundational for multi-agent orchestration, the need for robust, scalable, and event-driven infrastructure becomes mission-critical. In this session, we’ll explore how Apache Kafka and Apache Flink serve as the backbone for enabling multi-agent systems to operate reliably and responsively in high-throughput environments. From maintaining contextual memory to routing asynchronous and synchronous requests, we’ll break down the architectural patterns that support real-time, protocol-compliant agent communication at scale. You’ll learn how to stream and process multimodal data - from gRPC to REST to JSON payloads - across distributed agent workflows while enforcing data integrity, managing quotas, and maintaining full observability. We’ll also cover how to apply stateful stream processing and fine-grained filtering to ensure agents always act on timely, relevant, and high-quality information.
Key Takeaways:
- How to integrate Kafka and Flink with agentic AI frameworks using A2A and MCP
- Designing asynchronous/synchronous agent workflows with low-latency pipelines
- Techniques for streaming multimodal data between AI agents and services
- Enabling quota enforcement, usage tracking, and cost visibility with Kafka/Flink
- Real-world lessons from building distributed, multi-agent AI systems in production
If you're working on next-gen AI systems that require context awareness, memory, coordination, and streaming intelligence - this session will show you how to make it real.
Israel Ekpo / Devanshi Thakar (GPS)
An ounce of prevention is worth a pound of cure - Fix data clustering in streaming write to Iceberg
Apache Flink is commonly used to ingest continuous streams of data to Apache Iceberg tables. But it lacks the ability to organize data at write time, which can lead to small files and poor data clustering for many use cases. Regular table maintenance, such as compaction and sorting, can help to remediate the problems. But prevention is usually cheaper than remediation. In this talk, we will present a solution that can prevent those problems during streaming ingestion. Range distribution (sorting) is a common technique for data clustering, and many batch engines support it when writing to Iceberg. We will describe the range partitioner that was contributed to the Flink Iceberg sink (released in Iceberg 1.7 in late 2024). We will deep dive into how to handle the challenges of unbounded streams, organically evolving traffic patterns, low-cardinality and high-cardinality sort columns, and rescaling writer parallelism. By the end of the session, you will understand the design choices and tradeoffs, and why the approach is applicable to a broad range of streaming ingestion use cases.
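As a rough illustration of what enabling this looks like on the user side, the sketch below wires a `RowData` stream into an Iceberg table with range distribution. It assumes an Iceberg 1.7+ Flink sink that accepts `DistributionMode.RANGE`, a table whose sort order defines the clustering columns, and a placeholder Hadoop table path; treat it as a sketch rather than a complete job.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.DistributionMode;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class RangeDistributedIcebergSink {
    // Wires a RowData stream into an Iceberg table with range distribution,
    // so rows are clustered by the table's sort order at write time instead of
    // being fixed up later by compaction jobs.
    public static void append(DataStream<RowData> rows, String tablePath) {
        TableLoader tableLoader = TableLoader.fromHadoopTable(tablePath); // placeholder table location
        FlinkSink.forRowData(rows)
            .tableLoader(tableLoader)
            .distributionMode(DistributionMode.RANGE) // supported by the sink since Iceberg 1.7
            .writeParallelism(8)
            .append();
    }
}
```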
Steven Wu
Agents Running A Data Mesh
The exponential growth of data demands a paradigm shift in how we discover, create, and evolve data products. This presentation introduces a novel Agentic Data Mesh architecture where AI agents take on proactive roles in the data product development lifecycle. Unlike traditional approaches, our agents don't just process data; they think about it.
We envision a system where specialized AI agents, empowered by real-time streaming data from Confluent, proactively identify opportunities for new data products based on business needs, data patterns, and existing data assets. These "Discovery Agents" will propose data product definitions to human subject matter experts for approval, acting as intelligent co-creators.
Upon approval, "Creation Agents" will leverage Confluent Tableflow to seamlessly transform Kafka topics into managed Apache Iceberg tables, ensuring schema evolution, ACID compliance, and time-travel capabilities. This automated creation extends to higher-order data products, where agents autonomously combine, refine, and process existing topics with Apache Flink, continually enriching the data landscape.
Furthermore, "Analysis Agents" will exploit the robust capabilities of Iceberg tables, performing complex analytical queries and identifying new insights that trigger the creation of even more refined data products. This iterative, agent-driven feedback loop creates an invaluable, self-optimizing data product development lifecycle, minimizing manual intervention and accelerating time-to-insight.
Attendees will learn:
- The architecture of an agentic data mesh, integrating AI with Confluent's streaming platform and Apache Iceberg.
- How AI agents can autonomously propose, create, and refine data products.
- Best practices for leveraging Confluent Tableflow for seamless Kafka-to-Iceberg integration in an agentic system.
- Strategies for establishing a continuous, self-improving data product development lifecycle.
- Real-world implications and potential impact on data governance, data quality, and business agility.
Blake Shaw
Bite Size Topologies: Learning Kafka Streams Concepts One Topology at a Time
Event streaming with Kafka Streams is powerful, but can feel overwhelming to understand and implement. Breaking down advanced concepts into smaller, single-purpose topologies makes learning more approachable. Kafka Streams concepts will be introduced with an interactive web application that allows you to visualize input topics, output topics, changelog topics, state stores, and more. What happens when state store caching is disabled? What if topology optimization is enabled? Or what if stream time isn't advanced? These questions will be easily explored by visualizing the topology and Kafka Streams configurations.
This interactive tutorial's real-time events are generated by actual data on your laptop, including running processes, thread details, windows, services, and user sessions. Moving a window on your laptop can trigger many examples, allowing you to see how the topology handles them.
The audience will select which topologies to cover in categories of: flow, joins, windowing, advanced state storage usage, and more.
Join me on this journey of learning Kafka Streams. You'll deepen your understanding of Kafka Streams concepts and gain access to tools that let you explore advanced concepts independently. All examples and visualization will be available in an open-source project.
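The two configuration knobs called out above can be toggled in a few lines. The sketch below is one way to experiment with them; the application id, broker address, and topic names are placeholders, and the `statestore.cache.max.bytes` key assumes a Kafka Streams 3.4+ client (older clients use `cache.max.bytes.buffering`).

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class TopologyPlayground {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "bite-size-topologies");  // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Disable state store caching so every update is forwarded downstream immediately.
        props.put("statestore.cache.max.bytes", 0);
        // Let Kafka Streams rewrite the topology, e.g. reusing a source topic as a changelog.
        props.put("topology.optimization", "all");

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("process-events");       // placeholder topic
        events.groupByKey()
              .count()
              .toStream()
              .to("process-counts", Produced.with(Serdes.String(), Serdes.Long()));

        Topology topology = builder.build(props);   // pass props so optimizations are applied
        System.out.println(topology.describe());    // compare this output as you flip the configs
        new KafkaStreams(topology, props).start();
    }
}
```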
Neil Buesing
Deep Dive into Apache Flink 2.1: The Key Features in SQL & AI Integration
Flink 2.1 is the first release following the major Flink 2.0 release, introducing significant advancements in SQL and AI integration, feature enhancements, and performance optimization. This session will highlight the following features:
1. Seamless integration of Flink SQL and AI models, exploring how to accomplish real-time AI analysis with Flink.
2. Flink SQL support for the Variant type, improving the efficiency of real-time analysis of semi-structured data in the Lakehouse.
3. How Flink SQL addresses the performance bottleneck of multi-stream join cases, including the introduction of various streaming optimization algorithms, such as delta join and multi-way join, ensuring efficient and scalable stream processing for modern data pipelines.
4. Flink's integration with the Lance AI format, which gives Flink the ability to handle multimodal data and opens a new chapter for AI workloads.
We hope attendees will take away something useful from this session and let Apache Flink help grow their business!
Ron Liu
Sizing, Benchmarking and Performance Tuning Apache Flink Clusters
A common question when adopting Apache Flink is about sizing the workload: How many CPUs and how much memory will Flink require for a particular use case? What throughput and latency can you expect given your hardware?
We’ll kick off this talk by discussing why these questions are extremely difficult to answer for a generic stream processing framework like Flink. But we won’t stop there. The best approach to answering sizing questions is to benchmark your Flink workload. We will present how we’ve set up a Flink SQL-based benchmarking environment, along with benchmarking results that attendees can correlate with their own workloads to approximate their resource requirements.
Naturally, when benchmarking, the topic of performance tuning comes up: Are you optimally using the allocated resources? How do you identify performance bottlenecks? What are the most common performance issues, and how do you resolve them? In our case, a few configuration changes improved the throughput from 230 MB/s to over 3,200 MB/s. How many CPU cores are needed for that in Flink? Attend the talk to find out; it's less than you would expect.
This talk is for both Flink beginners wanting to get an idea about Flink’s performance and operational behavior, as well as for advanced users looking for best practices to improve performance and efficiency.
Robert Metzger
More than query: Morel, SQL and the evolution of data languages
What is the difference between a query language and a general-purpose programming language? Can SQL be extended to support streaming, incremental computation, data engineering, and general-purpose programming? How well does SQL fit into a modern software engineering workflow, with Git-based version control, CI, refactoring, and AI-assisted coding?These are all questions that drove the creation of Morel. Morel is a new functional programming language that embeds relational algebra, so it is as powerful as SQL. Morel's compiler, like that of any good SQL planner, generates scalable distributed programs, including federated SQL. But unlike SQL, Morel is Turing-complete, which means that you can solve the whole problem without leaving Morel.This session will discuss the challenges and opportunities of query languages, especially for streaming and data engineering tasks, and provide a gentle introduction to the Morel language. It is presented by Morel's creator, Julian Hyde, who created Apache Calcite and also pioneered streaming SQL.
Julian Hyde
The Future of Agentic AI is Event-Driven: How to Build Streaming Agents on Apache Flink
At their core, AI agents are microservices with a brain. They're powered by large language models (LLMs) and are independent, specialized, and designed to operate autonomously. But agents need more than LLMs to scale -- they need real-time data access, context, and the ability to collaborate across tools, services, and even other agents. As timely data becomes crucial for modern AI systems, agents must operate within distributed, event-driven environments. This session focuses on how to bridge the gap between streaming infrastructure and agentic architectures, by enabling developers to build, test, and operate agents natively on Flink. Through architecture diagrams, use cases and a demo, we'll show practical steps for getting started with streaming agents to power new automation workflows.
Sean Falconer / Mayank Juneja
Quiet Failures, Loud Consequences: Streaming ML Drift Detection in Practice
A machine learning model in production is like a ship sailing blind: everything looks fine until it slams into a reef. And by then, it's too late. This phenomenon, known as concept and model drift, is especially dangerous in real-time systems where decisions happen in milliseconds and rollback is usually not an option. If not detected early, drift doesn’t just break your models — it misprices loans, misses fraud, and even risks lives.
This talk distills cutting-edge research and real production lessons into practical tools that can be applied today, even if the models are already in the wild. Based on ongoing PhD research and real-world implementations, we’ll walk through the following real-life questions:
- How drift manifests in event-driven ML systems — and why traditional batch monitoring fails.
- Common algorithms for drift detection (e.g., DDM, EDDM, ADWIN, Page-Hinkley) — and how to benchmark them in streaming environments (a minimal detector sketch follows below).
- An architecture for integrating drift-aware intelligence into Flink pipelines, with hooks for alerting, model retraining, or failover strategies.
- Lessons from production use cases, including trade-offs in detection latency, false positives, and system overhead.
Whether you're deploying ML models into dynamic data streams or just planning your streaming AI strategy, you'll leave with a blueprint for building drift-resilient ML pipelines — plus hands-on knowledge to detect, benchmark, and respond to drift before it becomes failure.
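To make one of the named detectors concrete, here is a compact, self-contained Page-Hinkley sketch written purely for illustration; the `delta` and `lambda` thresholds and the simulated error stream are arbitrary, and the Flink integration hooks discussed in the session are intentionally left out.

```java
/** Minimal Page-Hinkley drift detector: raises an alarm when the running mean
 *  of a monitored metric (e.g. prediction error) shifts by more than lambda. */
public class PageHinkley {
    private final double delta;   // tolerated magnitude of change
    private final double lambda;  // alarm threshold
    private double mean = 0.0;
    private double cumulative = 0.0;
    private double minCumulative = Double.MAX_VALUE;
    private long n = 0;

    public PageHinkley(double delta, double lambda) {
        this.delta = delta;
        this.lambda = lambda;
    }

    /** Feed one observation; returns true if drift is detected. */
    public boolean update(double x) {
        n++;
        mean += (x - mean) / n;                 // incremental mean
        cumulative += x - mean - delta;         // cumulative deviation from the mean
        minCumulative = Math.min(minCumulative, cumulative);
        return (cumulative - minCumulative) > lambda;
    }

    public static void main(String[] args) {
        PageHinkley ph = new PageHinkley(0.005, 50);
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < 2000; i++) {
            // Simulated prediction error that drifts upward after 1000 events.
            double error = (i < 1000 ? 0.1 : 0.4) + rnd.nextGaussian() * 0.05;
            if (ph.update(error)) {
                System.out.println("Drift detected at event " + i);
                break;
            }
        }
    }
}
```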
Dominique Ronde
The evolution of Notion’s event logging stack
Notion's client and server applications generate billions of events daily. We track these events to understand how our customers are using the product and what kind of performance they experience. Some events also contribute to customer-facing product features. This talk covers the Event Trail platform that enables us to process and route these events in a scalable manner.
Event logging at Notion was initially built on third-party services with connectors to our Snowflake data lake. This lacked the scalability and flexibility that we required as our product grew, and so we built Event Trail.
Event Trail receives events from the application, augments their content, and then directs them to one or more destinations based on their type. Routing is defined in code with dynamic overrides and honoring of cookie permissions. The most common destinations are Apache Kafka topics powered by Confluent.
The data warehouse ingestion pipelines read events from Kafka and write them to Snowflake. They were originally based on Apache Flink and S3 ingestion but have evolved to use Snowpipe Streaming connectors for easier maintenance and scalability. The real-time analytics pipelines use events to power user-facing features like page counters and enterprise workspace analytics. These features have also evolved, from batch results served via DynamoDB and Redis to online calculations via Apache Pinot.
Adam Hudson
Breaking Boundaries: Confluent Migration for Every Stack
In this session, we will cover the pain points our clients have experienced and the need for migration to Confluent Cloud. We will highlight Infosys' experience in migrating clients from open-source platforms and other Kafka distributions, as well as message brokers, to Confluent Cloud.
The migration approach will include:
- Setting up Kafka clusters in Confluent Cloud using automation
- Replicating topics and data using Confluent’s recommended methods
- Seamlessly migrating clients from existing clusters to Confluent Cloud
Prakash Rajbhoj
Evolving the Data Supply Chain: Powering Real-Time Analytics & GenAI with Flink, Iceberg, and Trino
In the rapidly evolving landscape of data-driven enterprises, the ability to harness and process vast amounts of information in real-time is paramount. This talk will revisit the concept of the Data Supply Chain, a framework that enables AI at an enterprise scale, and explore cutting-edge technologies that are transforming data streaming and processing capabilities.
Building on insights from last year's presentation (https://www.youtube.com/watch?v=Zp86b_eaW8g), we will delve into the use of Apache Flink for stream processing, providing a foundation for real-time decision-making and AI applications.
We will introduce Tableflow, a tool for Kafka-to-Iceberg materialization. This integration enhances data accessibility, ensuring that data is readily available for analytics and AI workloads.
The talk will also highlight the role of Starburst Trino in enabling real-time analytics and agentic workloads over Iceberg tables. By leveraging Trino's powerful uniform data access layer (query engine), enterprises can perform complex analytics on large datasets with unprecedented speed and efficiency. This capability is crucial for organizations aiming to derive actionable insights and drive innovation through AI.
Join us as we explore these transformative technologies and their impact on the Data Supply Chain. Attendees will gain valuable insights into optimizing their data infrastructure to support AI initiatives and achieve enterprise-scale success. This session is ideal for data & AI strategists, data engineers, architects, and decision-makers looking to enhance their data streaming capabilities and unlock the full potential of AI in their organizations.
Dylan Gunther / Craig Albritton / Thomas Mahaffey
Diskless but with disks, Leaderless but with leaders: A KIP-1163 Deep Dive
KIP-1150: Diskless Topics promises to make Apache Kafka more cost-effective and flexible than ever before, but how does it work? Where do the cost savings come from? Is it really Diskless? What about Leaderless? Why is the latency worse? This talk will walk through the design for the preferred implementation in KIP-1163: Diskless Core, and answer all of these questions.
A basic understanding of Apache Kafka is enough to attend this talk: we’ll review the architecture used for classic and tiered topics, and how data is produced and fetched. We'll discuss the limitations of this architecture in the context of modern hyperscaler cloud deployments, and where the costs become excessive. Then we’ll show how the basic components of Kafka are taken apart and reassembled to build the Diskless architecture. We’ll also discuss the major rejected alternatives, and compare KIP-1163 to similar KIPs working to solve the same problem. At the end of this session, you should feel confident talking to stakeholders and community members about this amazing upcoming feature!
Greg Harris
Unpacking Serialization in Apache Kafka: Down the Rabbit Hole
Picture this: your Kafka application is humming along perfectly in development, but in production, throughput tanks and latency spikes. The culprit? That "simple" serialization choice you made without much thought. What seemed like a minor technical detail just became your biggest bottleneck.
Every Kafka record—whether flowing through KafkaProducer, KafkaConsumer, Streams, or Connect—must be converted to bytes over TCP connections. This serialization step occupies a tiny footprint in your code but wields outsized influence over your application's performance. For Kafka Streams stateful operations, this impact multiplies as records serialize and deserialize on every state store access.
You could grab a serializer that ships with Kafka and call it done. But depending on your data structure and use patterns, the wrong choice can cost you critical performance. The right choice can transform your application from sluggish to lightning-fast.
This talk dives deep into serialization performance comparisons across different scenarios. We'll explore critical trade-offs: the governance and evolution benefits of Schema Registry versus the raw speed of high-performance serializers. You'll see real benchmarks, understand format internals, and learn exactly when to apply each approach.
Whether you're building low-latency trading systems or high-throughput data pipelines, you'll leave with concrete knowledge to optimize one of Kafka's most impactful—yet overlooked—components. Don't let serialization be your silent performance killer.
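As a reminder of how small that boundary looks in code, here is a hedged sketch of the `Serializer` interface in action: a hand-rolled fixed-width serializer for long values plugged into a producer. The broker address and topic name are placeholders, and this illustrates the mechanics rather than any benchmark result.

```java
import java.nio.ByteBuffer;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.common.serialization.StringSerializer;

/** A compact binary serializer: 8 bytes per value instead of a variable-length string. */
class LongEventSerializer implements Serializer<Long> {
    @Override
    public byte[] serialize(String topic, Long value) {
        if (value == null) return null;
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }
}

public class SerializationDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", LongEventSerializer.class.getName());

        try (KafkaProducer<String, Long> producer = new KafkaProducer<>(props)) {
            // Every record crosses this boundary: object -> bytes -> TCP.
            producer.send(new ProducerRecord<>("latency-samples", "sensor-1", 42L));
        }
    }
}
```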
Bill Bejeck
From Tower of Babel to Babel Fish: Evolving Your Kafka Architecture With Schema Registry
You’ve conquered the basics – Kafka clusters are running, producers are producing, and consumers are consuming. Life is good...until your Python team needs to talk to your Spring Boot services, and suddenly, everyone’s speaking different languages. Like the biblical Tower of Babel, your elegant event-driven architecture crumbles under the weight of miscommunication.
What if there was a Babel Fish for your distributed systems? A way to let each service speak its native tongue while ensuring perfect understanding across your entire ecosystem?
This talk will explore how Schema Registry transforms from “that optional component you skipped” into the essential backbone of resilient, polyglot Kafka architectures. You’ll discover practical strategies for implementing data contracts that evolve without breaking, patterns for seamlessly integrating Schema Registry into your CI/CD pipelines, and real-world approaches for managing schema evolution without derailing your development velocity.
Whether scaling beyond your first language, preparing for a multi-team Kafka implementation, or recovering from your first production schema disaster, you’ll leave with concrete techniques to make your Kafka systems more resilient, flexible, and ready for Day 2 challenges.
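For orientation, this is roughly what the producer side of such a contract can look like: a hedged sketch assuming Confluent's kafka-avro-serializer is on the classpath. The broker and registry URLs, topic, and Avro schema are placeholders, and disabling auto-registration is one common CI/CD-friendly choice rather than a requirement.

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                               // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");                      // placeholder
        props.put("auto.register.schemas", false); // register schemas via CI/CD instead of at runtime

        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":"
          + "[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"}]}");
        GenericRecord payment = new GenericData.Record(schema);
        payment.put("id", "p-1001");
        payment.put("amount", 19.99);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // The serializer checks the schema against the registry's compatibility rules before sending.
            producer.send(new ProducerRecord<>("payments", payment.get("id").toString(), payment));
        }
    }
}
```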
Viktor Gamov
Tuning the Iceberg: Practical Strategies for Optimizing Tables and Queries
Apache Iceberg unlocks scalable, open table formats for the modern data lake—but performance doesn’t come by default. In this talk, we’ll dive into the hands-on techniques and architectural patterns that ensure your Iceberg tables and queries stay lean and lightning-fast. From data compaction and clustering to compression strategies and caching layers, we’ll explore how each lever impacts performance, cost, and query latency. You’ll also learn how modern engines like Dremio optimize queries behind the scenes and how to align your table design with those optimizations. Whether you’re running Iceberg in the cloud or on-prem, this session will give you a practical performance toolkit to get the most from your lakehouse architecture.
Alex Merced
Press Play on Data: Netflix's Journey from Streams to Gaming Insights
Netflix's Data Mesh platform serves as our foundation for stream processing, but recent innovations have dramatically expanded its capabilities and accessibility. This presentation explores how these advancements in Data Mesh enabled the successful development of our Games Analytics Platform, as Netflix's games portfolio expanded to 100+ games across TV, mobile, and web platforms.
We'll first trace Data Mesh's evolution from a simple data movement platform to a comprehensive real-time processing ecosystem. Attendees will learn how the platform powers business-critical applications while maintaining security and scalability. A key advancement we'll highlight is the introduction of Streaming SQL, which replaced complex low-level programming with an intuitive, declarative approach. This evolution, alongside robust infrastructure-as-code practices, has democratized streaming data access across Netflix, enabling domain experts to build sophisticated data products without specialized stream processing knowledge.
The second part of our presentation showcases these innovations in action through the Games Analytics Platform case study. As Netflix ventured into games, our Games Data team leveraged Data Mesh to build a robust data processing layer that helps scale their data teams to meet the diverse data needs of game stakeholders. We’ll demonstrate how the SQL Processor’s user-friendly features coupled with Infrastructure as Code capabilities within Data Mesh enabled Netflix Games to scale their data and analytics ecosystem with minimal technical overhead. Join us to discover how established data infrastructure can evolve to meet new business challenges, the architectural decisions that facilitated this evolution, and how the synergy between platform innovation and practical application resulted in a scalable data ecosystem supporting Netflix's growing gaming portfolio.
Sujay Jain / Michael Cuthbert
Event Driven Views With Ingest-Materialize-Index Stream Topology
At Indeed, we help people get jobs.
Our team supports this mission by ingesting data about employers’ hiring needs, enabling them to manage this data, and transforming it into searchable job advertisements—commonly known as job posts. Creating job posts is a complex and I/O-bound process, requiring enrichment from multiple bounded contexts. Adding to this complexity, many downstream systems must be notified in real-time when any part of a job post’s data changes.
To meet these demands, we implemented a system that produces and maintains an Event Driven View (EDV)—a materialized, denormalized representation of job posts that stays up-to-date as changes occur across the business. This view is powered by a novel Ingest–Materialize–Index (IMI) stream topology, which enables us to scale processing while preserving strong observability and reliability guarantees.
This talk will explore how we build and operate EDVs using the IMI architecture:
- We’ll break down each IMI stage and show how clean separation of concerns enables performance and observability.
- We’ll show how we use micro-batching, structured concurrency, and I/O pipelining to manage I/O-bound enrichment at scale.
- We’ll share strategies for recycling failed view materializations through rate-limited retry streams.
- We’ll cover how we incorporate Change Data Capture (CDC) to reliably notify our downstream clients about changes in our persistent EDVs.
Sage Pierce
Empowering the Disconnected Edge: Shifting Far Left with Predictive Analytics for Naval Ships
A Navy ship is essentially a large edge node with unique complexities…let me explain. While you may not think of a ship as an edge node due to its size, it does share similar use cases that are seen on typical edge-based deployments. Sensor data is collected and needs to be aggregated and disseminated to multiple environments including shore and cloud sites. Sharing data in a denied, disrupted, intermittent, and limited (DDIL) environment presents a significant challenge. A Navy ship, when deployed, can also spend 6+ months out at sea before returning to port. For predictive analytics at the disconnected edge, a key consideration is how to manage software updates, including updates to the analytical models themselves.
In this session, we will explore how Confluent (Kafka) and Databricks are solving the problems with predictive analytics at the edge and bridging the operational and analytical domains. We will demonstrate how Cluster Linking can be leveraged with DDIL and smart edge processing by prioritizing topics when bandwidth is restricted. We will use logistics data to develop analytics using Delta Live Tables and mlflow that can be used for predicting failures in equipment on the ship. And finally, how the analytics can be deployed to the ship, while at sea, for real-time reporting using Apache Flink.
Attendees will leave with an understanding of the complexities of edge-based analytics and a blueprint for setting up a pipeline to overcome those challenges in real-world applications.
Michael Peacock / Andrew Hahn
Future of Streaming: Emerging trends for event driven architectures
JPMC is undertaking a significant data transformation by implementing a next-generation data streaming platform, moving beyond traditional mainframe dependencies. This initiative addresses several challenges, including the expense of mainframe queries, excessive data duplication, silos, high data gravity within the mainframe, and a lack of real-time capabilities that have prevented effective data leverage for critical initiatives like Agentic AI.
The strategy involves establishing Kafka as the authoritative copy of data, which facilitates the creation of a centralized source of truth. This approach enables the development of real-time data products that aim for high quality, availability, and global accessibility. By embedding best practices from the outset, such as schema management, Role-Based Access Control (RBAC), and robust metadata, JPMC seeks to ensure that its data is of high quality, secure, and easily discoverable across the enterprise.
This foundation is crucial for modernization, supporting Agentic AI and stream operations by providing a reliable and high-quality data backbone. The ability to effectively deliver high-quality, discoverable data with contracts and SLAs is seen as the pivot around which future modernization will occur, moving towards a more automated, non-manual operating environment. This strategic investment allows JPMC to enhance its capabilities and prepare for new innovations.
Matthew Walker
The Curious Case of Streaming Windows
There is basically no stream processing without windowing, and Kafka Streams provides a rich set of built-in windows for aggregations and joins. However, it is often unclear to developers how the different window types work, and, even more important, which use cases a specific window is a good fit for. In particular, sliding windows are often a mystery and are easily confused with hopping windows.
In this talk, we will explain the different window types of Kafka Streams, give guidance on when to use which window, and unriddle the curious case of the sliding window. Furthermore, we give a sneak preview of the new "BatchWindow" type, recently proposed via KIP-1127, which unlocks new use cases that were hard to cover in the past. Join this session to become a windowing expert and set yourself up for success with Kafka Streams.
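To see the contrast in code, here is a hedged sketch defining a hopping window next to a sliding window in the Kafka Streams DSL; the topic name and serdes are placeholders, and it only builds the topology so the two descriptions can be compared.

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.SlidingWindows;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowComparison {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks =
            builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.String())); // placeholder topic

        // Hopping window: fixed 5-minute windows that start every minute (overlapping, aligned to the epoch).
        clicks.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
              .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)).advanceBy(Duration.ofMinutes(1)))
              .count();

        // Sliding window: windows are defined by record timestamps that are at most 5 minutes apart
        // (aligned to the data, not the epoch), which is where the confusion usually starts.
        clicks.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
              .windowedBy(SlidingWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)))
              .count();

        System.out.println(builder.build().describe());
    }
}
```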
Matthias J Sax
Kroxylicious: Taking a bite out of the Kafka Protocol
As Apache Kafka usage continues to grow, it gets deployed in increasingly sensitive and regulated environments. At the same time, data engineering teams have more and more requirements to satisfy the needs of businesses to support AIs and provide real time business intelligence. Unfortunately, for historical or design reasons, Apache Kafka is not able to provide all the features everybody needs.
One solution gaining traction over the last couple of years is to proxy Apache Kafka. This session introduces Kroxylicious, an open source Kafka-protocol-aware transparent proxy (part of the Commonhaus foundation). Kroxylicious offers developers a standardised Filter API to allow them to customize the messages passing through the proxy as well as a plug-in based extension mechanism to allow them to interact with remote resources, such as Key Management Systems or Schema registries. All this is completely invisible to clients and clusters and does not require updating them.
Out of the box, Kroxylicious provides customizable record encryption to ensure data at rest is safe even if you use a cloud provider. It also integrates with a schema registry so you can ensure that records sent to specific topics match the configured schemas. As it fully understands the Kafka protocol, it opens the possibility for building a wide range of features such as automatic cluster failover, offloading authentication, multitenancy, etc.
At the end of this talk, attendees will understand the core principles and out-of-the-box functionalities of Kroxylicious. They will also know how to run and operate it, as well as how to incorporate custom business logic.
Sam Barker
Escape the Micro-Maze: Build Fast, Scalable Streaming Services with Apache Flink
So, you're building microservices, and if you're like me, you've probably found yourself wrestling with Kubernetes, trying to manage state, handle failures, and figure out scaling for each service. Someone inevitably says, "Just build it stateless!" and I always think, "I'd love to see that work seamlessly in the real world." I believe there's a more straightforward way to build fast, resilient user experiences.
In this talk, I want to share a somewhat radical idea for those of us tired of the traditional microservice shuffle: building our operational logic, and even entire microservices, directly in Apache Flink. I'm not just talking about data pipelines; I'm proposing we start "going operational with Flink," moving beyond its traditional analytical domain.
I'll dig into why I think Flink offers a distinct advantage for application development. First, Flink was born for state, and I'll show you how its robust state backends can simplify what's often a major headache in microservice architectures. Then, we'll look at how Flink's inherent fault tolerance and scaling mechanisms can apply to our application logic, not just data processing – meaning less ops and more dev for us. Finally, I'll discuss practical approaches for handling point-to-point calls, comprehensive state management, and general application development patterns within Flink. I've come to think of Flink as an application server, supercharged for streams and state.
Join me to see how Apache Flink can simplify our architectures, make our user experiences faster, and potentially let us bid farewell to some of those microservice complexities. And with a bit of help from Kafka Streams, we'll see it in action.
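As a flavor of the "born for state" argument, here is a minimal sketch of per-key operational state in a KeyedProcessFunction, using the Flink 1.x-style open() signature; the event type (a customer id paired with an amount) and field names are made up for illustration.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/** Keeps a running balance per customer in Flink managed state instead of an external store.
 *  Input records are (customerId, amount) pairs; output is a human-readable balance update. */
public class BalanceFunction extends KeyedProcessFunction<String, Tuple2<String, Double>, String> {

    private transient ValueState<Double> balance;

    @Override
    public void open(Configuration parameters) {
        // Flink checkpoints this state and restores it on failure or rescaling.
        balance = getRuntimeContext().getState(new ValueStateDescriptor<>("balance", Double.class));
    }

    @Override
    public void processElement(Tuple2<String, Double> txn, Context ctx, Collector<String> out) throws Exception {
        double current = balance.value() == null ? 0.0 : balance.value();
        double updated = current + txn.f1;
        balance.update(updated);
        out.collect(ctx.getCurrentKey() + " -> " + updated);
    }
}
```

Wiring it into a job is a one-liner on a keyed stream, e.g. `transactions.keyBy(t -> t.f0).process(new BalanceFunction())`, with the balance surviving restarts via checkpoints rather than an external cache.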
Ben Gamble
Scaling Streaming Computation at LinkedIn: A Multi-Year Journey with Apache Flink
At LinkedIn, stream processing is the foundation for delivering real-time features, metrics, and member experiences across products like Ads AI, Search, Notifications, and Premium. Over the past four years, we’ve built and evolved a fully managed stream processing platform based on Apache Flink to meet increasing demands for scale, state, and reliability.
This talk shares our journey from building a self-serve, Kubernetes-native Flink platform to supporting high-throughput, stateful applications with managed Flink SQL. Today, our platform powers thousands of mission-critical pipelines and enables developers to author and deploy jobs declaratively, while abstracting away operational complexity.
As workloads grew in complexity and state size, we tackled state management challenges head-on: optimizing checkpointing and recovery, evaluating state storage options, and navigating trade-offs in scalability, cost, and performance. We’ll walk through how we scaled stateful joins, onboarded high-QPS applications, and migrated from Samza and Couchbase to Flink SQL - achieving over 80% hardware cost savings.
Key highlights include:
- Building a self-serve Flink platform on Kubernetes with split deployment, monitoring, alerting, auto-scaling, and failure recovery
- Scaling Flink SQL: challenges and lessons from supporting large stateful jobs, including state storage choices, state garbage collection (GC) failures, and inefficient job sizing
- Diagnosing performance bottlenecks and building a resource estimation model for join-intensive Flink SQL pipelines
- Developing tooling for safe migrations, automating reconciliation and backfill workflows, and enabling end-to-end validation
We’ll share the lessons learned and platform investments that helped us scale Apache Flink from early experimentation to a robust, production-grade streaming engine. Whether you're building a Flink-based platform or migrating stateful pipelines at scale, this talk offers actionable insights from operating Flink in production.
Weiqing Yang
Beyond Documentation: AI Agents as Flink Debugging Partners
Operating over 1,000 Apache Flink applications at Stripe has taught us that even the most comprehensive documentation can't eliminate the cognitive load of debugging complex distributed systems. Less experienced Flink developers routinely juggle multiple tools—Flink UI, Prometheus metrics, Splunk logs—while cross-referencing extensive runbooks to diagnose failures. This operational overhead inspired us to explore an unconventional solution: integrating AI coding agents directly into our Flink platform.
In this talk, we'll share how we transformed Flink debugging from a multi-tool treasure hunt into an intelligent, conversational experience. Our integration enables AI agents to:
- Automatically fetch and correlate metrics
- Parse logs for relevant error patterns
- Navigate our extensive Flink documentation and runbooks
- Generate contextual debugging suggestions
This talk shares our implementation journey, quantitative improvements (x% faster diagnosis), and the critical human-in-the-loop patterns that ensure safety. You'll see real debugging sessions, learn how we chose the right model, and understand where it fails. We'll conclude with actionable insights for teams considering AI-assisted operations.
Pratyush Sharma / Seth Saperstein
Ursa: Augment Your Lakehouse With Kafka-Compatible Data Streaming Capabilities
As data architectures evolve to meet the demands of real-time GenAI applications, organizations increasingly need systems that unify streaming and batch processing while maintaining compatibility with existing tools. The Ursa Engine offers a Kafka-API-compatible data streaming engine built on Lakehouse formats (Iceberg and Delta Lake). Designed to seamlessly integrate with data lakehouse architectures, Ursa extends your lakehouse capabilities by enabling streaming ingestion, transformation and processing — using a Kafka-compatible interface.
In this session, we will explore how Ursa Engine augments your existing lakehouses with Kafka-compatible capabilities. Attendees will gain insights into the Ursa Engine architecture and real-world use cases. Whether you're modernizing legacy systems or building cutting-edge AI-driven applications, discover how Ursa can help you unlock the full potential of your data.
Gaurav Saxena / David Kjerrumgaard
From “Where’s My Money?” to “Here’s Your Bill”: Demystifying Kafka Chargebacks and Showbacks
Have you ever wondered how the money you spent on those Kafka clusters is being utilized? Or how much end users should be paying for those awesome use cases that they run in production at scale without worrying about downtime and resiliency? How do you charge that one person who requested those 1000-partition topics, or the one who has 3 out of 1200 topics but is using about 70% of the available network throughput for your cluster? If you have ever wondered about any of these questions, this talk is for you.
In this talk, we will deep dive into ways to dissect your Kafka bills and attribute them to your end users, your business teams, and your application teams that depend on these Kafka clusters. I will help you understand the fundamentals of how to approach chargebacks/showbacks for Kafka and show you how deep the rabbit hole goes. Using open source tooling and an example, we’ll discuss:
* Techniques to define a core identity – an mTLS certificate, a SASL user, or a business unit? Which one is it and which one should it be?
* How to envision the cost split – Should it be spread evenly, or should there be usage-based differentiation for things like network over-utilization? Noisy neighbour, anyone? (A toy split calculation follows below.)
* Chargeback – What should be the final output product of your process and how should it be delivered? Is an Excel sheet enough, or do you want a dashboard that keeps updating itself automagically?
By the end of this talk, you will be able to understand the fundamentals to help you either build out your own cost analysis for Kafka or use the tool to just say - “Here’s your Bill”.
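To make the cost-split idea tangible, here is a toy sketch that splits an invented monthly bill into an even fixed share plus a usage-weighted share based on per-principal byte counts; the numbers, principal names, and the 40/60 split are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChargebackSketch {
    public static void main(String[] args) {
        double monthlyBill = 12_000.0;      // invented cluster bill
        double fixedShare = 0.4;            // portion split evenly (brokers, base infra)
        double usageShare = 1.0 - fixedShare;

        // Bytes in + out per principal (e.g. per mTLS cert or SASL user), taken from cluster metrics.
        Map<String, Long> bytesByPrincipal = new LinkedHashMap<>();
        bytesByPrincipal.put("payments-app", 700_000_000_000L);
        bytesByPrincipal.put("analytics-etl", 250_000_000_000L);
        bytesByPrincipal.put("audit-logger", 50_000_000_000L);

        long totalBytes = bytesByPrincipal.values().stream().mapToLong(Long::longValue).sum();
        double evenSlice = monthlyBill * fixedShare / bytesByPrincipal.size();

        bytesByPrincipal.forEach((principal, bytes) -> {
            // Usage-weighted slice penalizes the noisy neighbour proportionally.
            double usageSlice = monthlyBill * usageShare * bytes / totalBytes;
            System.out.printf("%s -> $%.2f%n", principal, evenSlice + usageSlice);
        });
    }
}
```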
Abhishek Walia
Robinhood’s Use of WarpStream for Logging
As applications scale, so do the cost and complexity of logging. Robinhood has historically used Apache Kafka extensively for its logging needs, but a new technology has emerged. In this session, we'll show developers how to build high-performance, cost-efficient logging pipelines using WarpStream, Confluent's serverless, Kafka-compatible streaming platform. The talk will largely focus on:
- A quick introduction of WarpStream as a technology and its important features.
- Why Robinhood decided to invest in WarpStream for logging workloads.
- Advantages and the tradeoffs made moving from Kafka to WarpStream, focusing on areas of performance, reliability, and cost.
- The Humio migration process from Kafka to WarpStream to move critical logging workloads while minimizing logging disruptions.
If your organization currently runs Kafka to power logging workloads and is interested in exploring WarpStream as a solution, attend this talk to see how Robinhood has done it and whether there are lessons you can apply to your own organization.
Ethan Chen / Renan Rueda
A Deep Dive into Kafka Consumer Rebalance Protocols: Mechanisms and Migration Process Insights
KIP-848 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol) introduces a new consumer rebalancing protocol to Kafka that differs significantly from its existing one. This session guides attendees through a detailed comparison of the existing and new consumer rebalance protocols and a thorough examination of the migration mechanisms involved in transitioning between them.
We offer a comprehensive overview of the fundamentals underlying the classic rebalancing protocols with different assignment strategies, as well as the newly introduced incremental rebalancing protocol. The overview covers the intricacies of group coordination and the partition assignment strategies of each protocol.
Following the overview, we delve deeply into the mechanisms that drive the migration process between the old and new rebalance protocols. This includes exploration of stop-the-world offline migration methodologies and the more sophisticated online migration techniques that support non-empty group conversions. Detailed case studies illustrate the steps of upgrading and downgrading consumer groups, including handling the intermediate states where the group coordinator manages membership statuses across different protocols.
Through these insights, attendees will gain an understanding of the intricacies involved in seamlessly transitioning consumer groups and the improvements brought by the new rebalance protocol. This talk will also be of particular interest to distributed systems professionals who want to know more about the internals of Apache Kafka.
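For context on what adoption looks like from the client side, here is a hedged sketch of a consumer opting into the KIP-848 protocol via the `group.protocol` setting, assuming a Kafka 4.x client and brokers with the new group coordinator enabled; the broker address, group id, and topic are placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NextGenRebalanceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");     // placeholder
        props.put("group.id", "orders-processor");            // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Opt into the KIP-848 protocol; the default "classic" keeps the old client-side rebalances.
        props.put("group.protocol", "consumer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("%s-%d@%d: %s%n", r.topic(), r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```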
Dongnuo Lyu / David Jacot
Streaming Meets Governance: Building AI-Ready Tables With Confluent Tableflow and Unity Catalog
Learn how Databricks and Confluent are simplifying the path from real-time data to governed, analytics- and AI-ready tables. This session will cover how Confluent Tableflow automatically materializes Kafka topics into Delta tables and registers them with Unity Catalog — eliminating the need for custom streaming pipelines. We’ll walk through how this integration helps data engineers reduce ingestion complexity, enforce data governance and make real-time data immediately usable for analytics and AI.
Jason Pohl / Kasun Indrasiri
GC, JIT and Warmup: The JVM’s Role in Flink at Scale
The JVM plays a critical but often overlooked role in the performance of Apache Flink applications. In this talk, we’ll examine how core JVM mechanisms - garbage collection (GC), Just-In-Time (JIT) compilation, and warmup behavior - can introduce latency, affect throughput, and lead to unpredictable performance in long-running Flink jobs.
We’ll break down the impact of GC algorithms on streaming workloads, explore how JIT optimizations can cause performance shifts during job execution, and explain why the warmup phase matters and what can be done about it. We'll correlate performance charts with GC and compilation logs, leaving attendees with a deeper understanding of how the JVM interacts with Flink's runtime.
Jiří Holuša
StreamLink: Real-Time Data Ingestion at OpenAI Scale
In the modern data lakehouse, real-time ingestion isn’t just a nice-to-have – it’s a foundational capability. Model training and evaluation, human analysts, and autonomous AI agents all demand fresh, trustworthy data from diverse sources at massive scale. These expectations are a challenge for platform teams – but they’re also an opportunity to unlock massive business value.
At OpenAI, we built StreamLink, a real-time streaming ingestion platform for the data lakehouse, powered by Apache Flink. StreamLink ingests 100+ GiB/s of data from Kafka into Delta Lake and Iceberg tables, supporting 2000 datasets across 20+ partner teams.
In this session, we’ll dive deep into the design and implementation of StreamLink. We’ll explore our Kubernetes‑native deployment model (Flink K8s Operator), adaptive autoscaling heuristics, and self‑service onboarding model – all of which keep platform operations lean. Attendees will take away concrete patterns for building scalable, manageable real-time ingestion systems in their own data lakehouse.
Adam Richardson
Powering Real-Time Vehicle Intelligence at Rivian with Apache Flink
At Rivian, our mission is to design and build vehicles that inspire and enable sustainable exploration while delivering a seamless, intelligent user experience. Our connected fleet streams real-time telemetry, including sensor data such as location and battery state of charge (SOC). To turn this firehose of raw information into instant driver alerts and years of searchable insight, we rely on Apache Flink and Kafka. In this talk, we’ll show how we built a scalable, cloud-native stack that powers real-time features and long-term intelligence.
Our vehicles generate a continuous stream of telemetry data. To handle this firehose of information, we’ve built a robust stream processing architecture centered around Flink ingestion pipelines. These pipelines process and enrich the data in real time, powering both internal analytics and external customer experiences.
One of the standout components of our platform is Event Watch, a Flink-powered feature that allows teams and customers to define streaming jobs that detect key events like abnormal battery drain, collisions, or vehicle movement in or out of a geofence. These events trigger mobile push notifications instantly, enabling proactive maintenance, safety features, and personalized alerts.
Beyond real-time event detection, we’ve designed our system for both low-latency responsiveness and long-term analytical depth. Processed telemetry is stored in Databricks Delta tables for scalable historical analysis, while a time series database supports fast, live queries for dashboards and monitoring systems.
We’ll walk through how we’ve architected this dual-purpose system, balancing high-throughput stream processing with the flexibility to drill down into historical trends. We’ll also cover how Flink’s stateful processing model enables complex event patterns and reliable delivery, even at scale. Join us to learn how Rivian is building the future of connected vehicles, one event stream at a time.
Rupesh More / Guruguha Marur Sreenivasa
Stream All the Things — Patterns of Effective Data Stream Processing
Data streaming is a really difficult problem. Despite 10+ years of attempting to simplify it, teams building real-time data pipelines can spend up to 80% of their time optimizing it or fixing downstream output by handling bad data at the lake. All we want is a service that will be reliable, handle all kinds of data, connect with all kinds of systems, be easy to manage, and scale up and down as our systems change. Oh, and it should also have super low latency and result in good data. Is that too much to ask?
In this presentation, we’ll discuss the basic challenges of data streaming and introduce a few design and architecture patterns, such as dead-letter queues (DLQs), used to tackle these challenges. We will then explore how to implement these patterns using Apache Flink and discuss the challenges that real-time AI applications bring to our infrastructure. Difficult problems are difficult, and we offer no silver bullets. Still, we will share pragmatic solutions that have helped many organizations build fast, scalable, and manageable data streaming pipelines.
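As one concrete rendering of the DLQ pattern mentioned above, the sketch below routes unparseable records to a Flink side output instead of failing the job; the record type (raw strings parsed into longs) and the tag name are assumptions, and the dead-letter stream would typically be sunk to its own Kafka topic.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class DlqRouting {
    // Anonymous subclass so Flink can capture the generic type of the tag.
    static final OutputTag<String> DLQ = new OutputTag<String>("dead-letter") {};

    /** Valid records continue downstream; anything that fails parsing goes to the DLQ stream. */
    public static SingleOutputStreamOperator<Long> parse(DataStream<String> raw) {
        return raw.process(new ProcessFunction<String, Long>() {
            @Override
            public void processElement(String value, Context ctx, Collector<Long> out) {
                try {
                    out.collect(Long.parseLong(value.trim()));   // stand-in for real deserialization
                } catch (NumberFormatException bad) {
                    ctx.output(DLQ, value);                      // route the poison record, keep the pipeline alive
                }
            }
        });
    }

    public static DataStream<String> deadLetters(SingleOutputStreamOperator<Long> parsed) {
        return parsed.getSideOutput(DLQ); // typically written to a dedicated topic for inspection and replay
    }
}
```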
Adi Polak
Unlocking the Mysteries of Apache Flink
Apache Flink has grown to be a large, complex piece of software that does one thing extremely well: it supports a wide range of stream processing applications with difficult-to-satisfy demands for scalability, high performance, and fault tolerance, all while managing large amounts of application state.
Flink owes its success to its adherence to some well-chosen design principles. But many software developers have never worked with a framework organized this way, and struggle to adapt their application ideas to the constraints imposed by Flink's architecture.
After helping thousands of developers get started with Flink, I've seen that once you learn to appreciate why Flink's APIs are organized the way they are, it becomes easier to relax and accept what its developers have intended, and to organize your applications accordingly. The key to demystifying Apache Flink is to understand how the combination of stream processing plus application state has influenced its design and APIs. A framework that cares only about batch processing would be much simpler than Flink, and the same would be true for a stream processing framework without support for state.
In this talk I will explain how Flink's managed state is organized in its state backends, and how this relates to the programming model exposed by its APIs. We'll look at checkpointing: how it works, the correctness guarantees that Flink offers, and what happens during recovery and rescaling.
We'll also look at watermarking, which is a major source of complexity and confusion for new Flink developers. Watermarking epitomizes the requirement Flink has to manage application state in a way that doesn't explode as those applications run continuously on unbounded streams.
This talk will give you a mental model for understanding Apache Flink. Along the way we'll walk through several examples, and examine how the Flink runtime supports their requirements.
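For readers new to the watermarking piece, this is a small sketch of the standard bounded-out-of-orderness strategy with an explicit timestamp assigner; the `Event` class, its timestamp field, and the 10-second bound and idleness timeout are assumptions chosen for illustration.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;

public class WatermarkExample {
    /** Minimal event type; in a real job this comes from your deserialized records. */
    public static class Event {
        public long timestampMillis;
        public String payload;
    }

    public static DataStream<Event> withEventTime(DataStream<Event> events) {
        return events.assignTimestampsAndWatermarks(
            WatermarkStrategy
                // Watermarks lag the highest timestamp seen by 10s: anything later than that is "late".
                .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                // Tell Flink where event time lives inside each record.
                .withTimestampAssigner((event, recordTimestamp) -> event.timestampMillis)
                // Don't hold back watermarks forever if one source partition goes quiet.
                .withIdleness(Duration.ofMinutes(1)));
    }
}
```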
David Anderson
Smart Action in Real-time: Building Agentic AI Systems Powered by AWS and Confluent Streaming
Agentic AI systems thrive on the combination of real-time data intelligence and autonomous action capabilities. This session demonstrates how to integrate Confluent's scalable data streaming platform with Amazon Bedrock and SageMaker to build responsive, intelligent systems that can both reason and act. We'll explore architectural patterns for ingesting, processing, and serving data streams at scale with end-to-end governance, highlighting how Confluent's pre-built connectors, in-stream processing, and low-latency inference capabilities effectively contextualize foundation models. We'll examine Agentic AI through the lens of event-driven architecture with well-orchestrated AI microservices for maximum effectiveness. Attendees will learn practical approaches to create GenAI applications with enriched data streams, ensuring accurate and responsive model performance. We'll demonstrate how to optimize agentic workflows by leveraging Bedrock Agents, SageMaker, and MCP Servers. Leave with an architectural blueprint and implementation strategies to help your organization reduce AI infrastructure costs and latency while enabling real-time context awareness, system flexibility, and exceptional customer experience.
Weifan Liang / Braeden Quirante
The Kafka Protocol Deconstructed: A Live-Coded Deep Dive
Kafka powers the real-time data infrastructure of countless organizations, but how many of us really understand the magic behind its speed and reliability? What makes a Kafka broker capable of handling millions of events per second while ensuring durability, ordering, and scalability? And why do features like idempotent producers, log compaction, and consumer group rebalance work the way they do?
In this deep-dive live-coding session, we’ll dissect Kafka down to its essence and rebuild a minimal, but fully functional, broker from scratch. Starting with a raw TCP socket, we’ll implement:
- Kafka’s Binary Wire Protocol: decode Fetch and Produce requests, frame by frame (a first taste follows below)
- Log-Structured Storage: the secret behind Kafka’s append-only performance
- Batching & Compression: How Kafka turns thousands of messages into one efficient disk write
- Consumer Coordination: Group rebalances, offset tracking, and the challenges of "who reads what?"
- Replication & Fault Tolerance: why ISR (In-Sync Replicas) is needed for high availability
- Idempotence & Exactly-Once Semantics: the hidden complexity behind "no duplicates"
Along the way, we’ll expose Kafka’s design superpowers and its tradeoffs, while contrasting our minimal implementation with the real Kafka’s added layers (KRaft, SASL, quotas, etc.). By the end, you won’t just use Kafka, you’ll understand it. Whether you’re debugging a production issue, tuning performance, or just curious about distributed systems, this session will change how you see Kafka.
Key Takeaways:
- How Kafka’s protocol works
- The role of log-structured storage in real-time systems
- Why replication and consumer coordination are harder than they look
- Where the real Kafka adds complexity
No prior Kafka internals knowledge needed, just a love for distributed systems and live coding.
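As a first taste of that wire-protocol work, here is a hedged sketch that accepts one connection and decodes the length prefix and the common request header fields (api_key, api_version, correlation_id, client_id); it covers only the pre-flexible header layout, sends no response, and is a starting point rather than a working broker.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MiniBrokerHeaderReader {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9092);
             Socket client = server.accept();
             DataInputStream in = new DataInputStream(client.getInputStream())) {

            // Every Kafka request is length-prefixed: a 4-byte big-endian size, then the payload.
            int frameSize = in.readInt();
            byte[] frame = new byte[frameSize];
            in.readFully(frame);

            // Request header (v1): api_key INT16, api_version INT16, correlation_id INT32, client_id STRING.
            ByteBuffer buf = ByteBuffer.wrap(frame);
            short apiKey = buf.getShort();
            short apiVersion = buf.getShort();
            int correlationId = buf.getInt();
            short clientIdLen = buf.getShort();            // -1 means a null client id
            String clientId = "";
            if (clientIdLen >= 0) {
                byte[] idBytes = new byte[clientIdLen];
                buf.get(idBytes);
                clientId = new String(idBytes, StandardCharsets.UTF_8);
            }
            System.out.printf("apiKey=%d version=%d correlationId=%d clientId=%s%n",
                    apiKey, apiVersion, correlationId, clientId);
            // A real broker would now dispatch on apiKey (0 = Produce, 1 = Fetch, ...) and write a response.
        }
    }
}
```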
Mateo Rojas
From cockpit to Kafka: Streaming design lessons from aviation
In aviation, real-time data keeps flights safe, aircraft moving, and operations running smoothly. In this talk, we’ll explore how Kafka-based streaming powers aviation - from orchestrating fast aircraft turnarounds on the ground, to monitoring flight performance in the air, and enabling instant decisions by crew and ground teams through connected operational systems.
Drawing on real-world experience building an airline-scale streaming platform, I’ll share practical lessons for platform engineers, including:
- Designing for failure, not perfection - making failures predictable, contained, and recoverable through idempotence, DLQs, and retry strategies (see the producer sketch below)
- Managing transformations at scale - ksqlDB patterns and lessons learned handling complex XML payloads
- Isolating workloads through tenant separation - providing streaming corridors for data, compute, and fault containment
- Enforcing data contracts - managing schema evolution across disparate aviation operational systems
- Keeping it simple in complex environments - building boring, understandable, and debuggable pipelines
You’ll leave with practical patterns and mental models for building Kafka-based streaming platforms that are resilient, trusted, and can operate at scale in a safety-critical industry.
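To ground the first lesson, here is a hedged sketch of an idempotent producer with a bounded delivery timeout and a callback that flags non-retriable failures for dead-lettering; the broker address, topic names, and payload are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.serialization.StringSerializer;

public class TurnaroundEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("enable.idempotence", "true");    // retries cannot create duplicates or reorder a partition
        props.put("acks", "all");                   // required for idempotence
        props.put("delivery.timeout.ms", "120000"); // bound how long retries may take before we give up

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> event =
                new ProducerRecord<>("turnaround-events", "AC123", "{\"status\":\"boarding\"}"); // placeholders
            producer.send(event, (metadata, exception) -> {
                if (exception != null && !(exception instanceof RetriableException)) {
                    // Retries exhausted or fatal error: in a real pipeline this record would be
                    // written to a dead-letter topic for inspection and later replay.
                    System.err.println("Dead-lettering " + event.key() + ": " + exception.getMessage());
                }
            });
            producer.flush();
        }
    }
}
```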
Simon Aubury
Bringing Stories to Life With AI, Data Streaming and Generative Agents
Storytelling has always been a way to connect and imagine new worlds. Now, with Generative Agents - AI-powered characters that can think, act, and adapt - we can take storytelling to a whole new level. But what if these agents could change and grow in real time, driven by live data streams?
Inspired by the Stanford paper "Generative Agents: Interactive Simulacra of Human Behavior", this session explores how to build dynamic, AI-driven worlds using Apache Kafka, Apache Flink, and Apache Iceberg. We'll use a Large Language Model to power conversation and agent decision-making, integrate Retrieval-Augmented Generation (RAG) for memory storage and retrieval, and use JavaScript to tie it all together. Along the way, we’ll examine different approaches for data processing, storage, and analysis.
By the end, you’ll see how data streaming and AI can work together to create lively, evolving virtual communities. Whether you’re into gaming, simulations, research or just exploring what’s possible, this session will give you ideas for building something amazing.
Olena Kutsenko
Beyond Message Key Parallelism: Increasing Dropbox Dash AI Ingestion Throughput by 100x
The core theme of this talk is to build upon existing parallel consumer work that allows for message-key-level parallelism while retaining ordering guarantees without provisioning additional partitions. We extend the principles by decomposing messages into smaller sub-messages, allowing messages with the same key to be processed simultaneously while still retaining ordering guarantees. The sub-message parallel consumer allows for faster time to market, at lower latency and cost versus existing methods presented in the literature.
This talk walks through a very real scenario we experienced when scaling up Dropbox Dash (AI assistant): the deadline to onboard a customer is tomorrow morning, but the backlog needs 2 weeks to finish processing due to a poor key choice in early stages of development, leading to every message ending up on the same partition.
I will recap existing topics to set the context:
1. Conventional Kafka parallelism (partition level)
2. Message-key-level parallelism using techniques discussed for the Confluent Parallel Consumer
I will also present the additional constraints that we face in our own system:
1. It is not feasible to change the producer quickly due to other consumers depending on the event stream
2. Long chains of messages with the same key render key-based message-level parallelism ineffective
3. The extra latency and monetary cost to consume, break down messages, produce, and consume again is not desirable
I will present the novel method we adopted to clear the backlog with ~100x throughput gain and onboard the customer on time: the sub-message parallel consumer, the constraints it functions under, and the intuition for the proof of why it works. I will provide some benchmarks around the performance, and close out the talk with Q&A.
Key takeaways:
1. Kafka messages can be parallelized beyond whole messages
2. Clever processing on the consumer side can result in lower latency and costs vs breaking down messages and re-producing them, while not affecting other consumers on the same topic the way a producer-side change would
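As a toy model of the idea (not the Dropbox implementation), the sketch below decomposes a record into sub-messages and hashes each sub-key onto a single-threaded lane, so different sub-keys run in parallel while each sub-key stays ordered; offset management, which must wait for all sub-messages of earlier records, is deliberately omitted.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SubMessageParallelism {
    private final ExecutorService[] lanes;

    public SubMessageParallelism(int parallelism) {
        lanes = new ExecutorService[parallelism];
        for (int i = 0; i < parallelism; i++) {
            // One single-threaded lane per slot preserves ordering within a sub-key.
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Decompose one Kafka record into sub-messages and fan them out by sub-key. */
    public void handle(String recordKey, List<String> subMessages) {
        for (int i = 0; i < subMessages.size(); i++) {
            String subKey = recordKey + "#" + i;   // e.g. per-document or per-chunk identity (hypothetical)
            String payload = subMessages.get(i);
            int lane = Math.floorMod(subKey.hashCode(), lanes.length);
            lanes[lane].submit(() -> process(subKey, payload));
        }
    }

    private void process(String subKey, String payload) {
        // Stand-in for the real work (embedding, indexing, enrichment, ...).
        System.out.println(Thread.currentThread().getName() + " processed " + subKey);
    }
}
```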
David Yun