Current Bengaluru 2025
Session Archive
Check out our session archive to catch up on anything you missed, or rewatch your favorites to make sure you hear all of the industry-changing insights from the best minds in data streaming.


Queues for Kafka
Event streaming is great but sometimes it’s easier to use a queue, especially when parallel consumption is more important than ordering. Wouldn't it be great if you had the option of consuming your data in Apache Kafka just like a message queue? For workloads where each message is an independent work item, you’d really like to be able to run as many consumers as you need, cooperating to handle the load, and to acknowledge messages one at a time as the work is completed. You might even want to be able to retry specific messages. This is much easier to achieve using a queue rather than a topic with a consumer group. KIP-932 brings queuing semantics to Apache Kafka. It introduces the concept of share groups. Share groups let your applications consume data off regular Kafka topics with per-message acknowledgement and without worrying about balancing the number of partitions and consumers. With this KIP, you can bring your queuing workloads to Apache Kafka. Come and hear about this innovative new feature being added to Apache Kafka 4.0.
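A minimal sketch of the share-group consumption pattern the talk describes, assuming the early-access KIP-932 Java API (a KafkaShareConsumer client with per-record acknowledgement) roughly as proposed in the KIP; exact class names, method signatures, and configuration may differ in the released version, and the broker, topic, and group names are illustrative:

```java
import org.apache.kafka.clients.consumer.AcknowledgeType;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaShareConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ShareGroupWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // A share group: any number of consumers can join and cooperate on the load,
        // without balancing consumer count against partition count.
        props.put("group.id", "order-processors");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        process(record);                                    // independent work item
                        consumer.acknowledge(record, AcknowledgeType.ACCEPT);
                    } catch (Exception e) {
                        // RELEASE returns the record for redelivery, possibly to another consumer.
                        consumer.acknowledge(record, AcknowledgeType.RELEASE);
                    }
                }
                consumer.commitSync(); // commit the acknowledgements as a batch
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("processed %s%n", record.value());
    }
}
```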
Andrew Schofield, Apoorv Mittal


Unifying Kafka and Relational Databases for Event Streaming Applications
Kafka and relational databases have long been part of event-driven architectures and streaming applications. However, Kafka topics and database tables have historically been separate abstractions with independent storage and transaction mechanisms. Making them work together seamlessly can be challenging, especially because queuing has been viewed as an anti-pattern in a stock database. This talk will describe how to close this gap by providing a customized queuing abstraction inside the database that can be accessed via both SQL and Kafka’s Java APIs. Since topics are directly supported by the database engine, applications can easily leverage the ACID properties of local database transactions, allowing exactly-once event processing. Patterns such as the Transactional Outbox (writing a data value and sending an event atomically), and more generally any atomicity required across many discrete database and streaming operations, are supported out of the box. In addition, the full power of SQL queries can be used to view records in topics and to join records in topics with rows in database tables. In this talk, we cover the synergy between Kafka's Java APIs, SQL, and the transactional capabilities of the Oracle Database. We describe the implementation, which uses a transactional event queue (TxEventQ) to implement a Kafka topic and a modified Kafka client that provides a single, unified JDBC connection to the database for event processing and traditional database access.
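The exactly-once pattern the talk describes can be pictured with a short sketch. This is illustrative only, not Oracle's documented API: it assumes a Kafka-compatible producer (such as a TxEventQ-backed client) whose records flow over the same JDBC connection passed in here, so the SQL update and the event publish commit or roll back together; the table, topic, and method names are made up for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OutboxStyleTransfer {

    // The talk describes a modified Kafka client backed by a single JDBC connection,
    // so the produced event and the SQL update commit (or roll back) atomically.
    public static void debitAndPublish(Producer<String, String> producer,
                                       Connection dbConnection, // assumed: same connection the producer uses
                                       String accountId,
                                       double amount) throws Exception {
        dbConnection.setAutoCommit(false);
        try (PreparedStatement stmt = dbConnection.prepareStatement(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?")) {
            stmt.setDouble(1, amount);
            stmt.setString(2, accountId);
            stmt.executeUpdate();

            // The topic is a TxEventQ inside the database, so this send is part of
            // the same local transaction as the UPDATE above.
            producer.send(new ProducerRecord<>("account-events", accountId, "debited:" + amount));

            dbConnection.commit();   // the row change and the event become visible together
        } catch (Exception e) {
            dbConnection.rollback(); // neither the row change nor the event is published
            throw e;
        }
    }
}
```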
Mahesh Girkar


The IDE That Speaks Kafka: Mastering Streams in VS Code
Apache Kafka has become an essential technology for modern data streaming applications. However, its learning curve can be steep for developers. This presentation will help you overcome the everyday challenges of Kafka development and streamline your development experience using the Confluent VS Code Extension. We begin by exploring the hurdles developers face when starting with Kafka: grappling with complex concepts, bootstrapping initial code, managing data, and achieving meaningful interaction with their applications. Then, we introduce the game-changing Confluent VS Code Extension - a free, open-source tool designed to transform the Kafka development experience. Through a practical, live demonstration, we'll follow a day in the life of a new application developer and showcase how the extension simplifies everything from environment setup to schema management. You'll see how to rapidly generate and deploy producer and consumer applications, handle schema evolution, debug message validation issues, and manage your development environment effectively without leaving your IDE. The presentation concludes with real-world implementation strategies, including GitOps integration and multi-environment management. Join the growing community of developers revolutionizing their Kafka development workflow. Start building faster, smarter, and more reliably with the Confluent VS Code Extension today.
Viktor Gamov


Towards Manufacturing 5.0 - Edge IoT Platform with Sparkplug and Apache Kafka
John Deere manufacturing factories are equipped with thousands of state-of-the-art smart industrial robots and other machines. These next-gen factories are on a path to Industry 5.0, which requires equally advanced and well-integrated OT and IT systems to make OT data available and processable in real time, enabling faster decision making near the source of data in the factory and improving overall operational efficiency across the organization. We present our Manufacturing IoT Edge platform, designed to fulfil the vision of Manufacturing 5.0, using open-source tools and standard protocols such as MQTT, Sparkplug, Apache Kafka, Kafka Connect, and Kafka Streams for the collection, contextualization, stream processing, historization, and analysis of manufacturing OT data in real time. We cover technical details such as the core concepts of the MQTT protocol with the Sparkplug specification, how it is optimized for SCADA/IIoT solutions, and how Sparkplug data is processed using open-source Apache Kafka and its ecosystem, including custom-built Kafka Connectors for ingestion and stateful Kafka Streams processors (a simplified sketch follows this abstract). Everything we plan to present is equally relevant to building IoT Edge platforms in other industrial domains. If you want to learn about any of the following, come join us!
* Classic challenges of industrial edge IoT platforms
* Solution architecture and design trade-offs
* Technical details of MQTT, Sparkplug, Kafka Connect, and Kafka Streams
* Specific complexities of stream processing Sparkplug data with Kafka, and ways to handle them
* Overall, how an industrial IoT Edge use case is implemented
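A simplified Kafka Streams sketch of the kind of stateful Sparkplug processing described above. The topic names are illustrative, and decodeSparkplugMetrics() is a hypothetical stand-in for a real Sparkplug B protobuf decoder (for example, one built on the Eclipse Tahu library):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.List;
import java.util.Properties;

public class SparkplugMetricsTopology {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sparkplug-metrics");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Raw Sparkplug B payloads arrive on this (illustrative) topic via an MQTT source connector.
        builder.stream("sparkplug-raw", Consumed.with(Serdes.String(), Serdes.ByteArray()))
               // One Sparkplug payload carries many metrics; fan them out, re-keyed by device id.
               .flatMap((mqttTopic, payload) -> decodeSparkplugMetrics(payload))
               .groupByKey()
               // Stateful step: keep the latest reading per device (stands in for richer contextualization).
               .reduce((previous, latest) -> latest, Materialized.as("latest-metric-per-device"))
               .toStream()
               .to("device-metrics-latest", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }

    // Hypothetical decoder: a real implementation would parse the Sparkplug B protobuf payload
    // and emit one KeyValue<deviceId, "metric=value"> per metric in the payload.
    private static List<KeyValue<String, String>> decodeSparkplugMetrics(byte[] payload) {
        return List.of();
    }
}
```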
Arti Pande, Ganesh Jadhav


Re-engineering Kafka Consumers for 'India Scale': Dream11’s Journey to Processing Millions of Messages
In this session, the Dream11 engineering team will share the secret sauce behind its re-engineered Kafka consumer library, which processes tens of millions of events. Dream11 is one of the largest fantasy sports platforms in the world, handling peak user concurrency of over 15 million during IPL 2024, with edge RPM surpassing 300 million. The business operates under highly time-sensitive conditions, experiencing hockey-stick traffic surges just before the start of matches. To ensure real-time updates for game users, the Dream11 platform relies heavily on Apache Kafka in the critical pipelines of end-user services. As the scale grew, the legacy Kafka consumers (simple and high-level) began facing challenges such as delays and data loss, severely impacting user trust. To address these issues, the Dream11 engineering team developed a low-level Kafka consumer in which polling is decoupled from processing and the two run in parallel, which eliminated our frequent rebalancing problem. Messages are handled by a dedicated worker pool, which significantly improved throughput. Auto-commit is disabled and offsets are committed in batches, guaranteeing at-least-once processing and no data loss. As the microservices ecosystem grew, Kafka pipelines became integral to many services. Building on the success of the low-level consumer, the Dream11 engineering team turned it into a platform-wide Kafka consumer library that abstracts away the complexities of Kafka integration. The library provides simple interfaces for developers to implement business logic seamlessly. Over time, it matured with features like backpressure, enabling developers to process messages locally during incidents or to scale across a consumer pool with varied core counts. Join this session to learn strategies to optimize Kafka consumers for low latency and high reliability at massive scale.
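A minimal sketch of the poll/worker-pool/batched-commit pattern described above, using the standard Kafka consumer API. It is not Dream11's library, and it simplifies the design by waiting for each polled batch to finish before committing (the abstract describes fully decoupled polling plus backpressure, which this omits); topic and group names are illustrative.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkerPoolConsumer {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "score-updates");          // illustrative group
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");        // manual, batched commits
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        ExecutorService workers = Executors.newFixedThreadPool(8);           // dedicated worker pool

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("match-events"));                     // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
                if (records.isEmpty()) {
                    continue;
                }
                // Fan the batch out to the worker pool so records are processed in parallel
                // instead of serially inside the poll loop.
                List<Future<?>> inFlight = new ArrayList<>();
                for (ConsumerRecord<String, String> record : records) {
                    inFlight.add(workers.submit(() -> process(record)));
                }
                for (Future<?> f : inFlight) {
                    f.get();                                                 // wait for the batch to finish
                }
                // One commit per processed batch: offsets only advance after the work is done,
                // giving at-least-once delivery with no data loss on failure.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("processed %s%n", record.value());
    }
}
```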
Bipul Karnani


Cost-Effective Logging at Scale: ShareChat’s Journey to WarpStream
In August 2023, WarpStream introduced itself as a Kafka-compatible, S3-native streaming solution offering powerful features such as a BYOC-native approach, decoupling of storage and compute as well as of data and metadata, offset-preserving replication, and direct-to-S3 writes. It shines in a specific niche (logging, observability, and data lake feeding) where a slight increase in latency is a fair trade-off for substantial cloud cost savings and simplified operations. In this session, we'll look at ShareChat's journey of migrating our logging systems from managed Kafka-compatible solutions to WarpStream. At ShareChat, logging suffered from two issues: highly unpredictable workloads and high inter-zone fees for data replication across brokers. Logging volume could spike to five times the normal rate for brief periods before returning to baseline, so we had to over-provision our Kafka clusters to prevent costly rebalancing and scaling issues, resulting in unnecessary expenses. WarpStream offers a solution with its stateless, autoscaling agents, eliminating the need to manage local disks or rebalance brokers; and by leveraging S3 for replication, WarpStream allows us to eliminate inter-zone fees. We'll discuss setting up WarpStream in your cloud, best practices for agents (brokers) and clients, fine-tuning your cluster's latency, and advice for local testing. You'll see a detailed cost comparison between WarpStream and both multi-zone and single-zone Kafka-compatible solutions. Additionally, we'll demonstrate how to set up comprehensive monitoring for your WarpStream cluster at various levels of granularity, including agent, topic, and zone. Finally, we'll cover the essential alerts you should configure for your agents, share our experience consuming from WarpStream inside Spark jobs, and present the Spark configurations that worked best for us.
Vivek Chandela, Shubham Dhal