Background

Breakout Session

Kafka - No Rocks, Please: Using Kafka Streams With Alternative State Stores

A popular data transformation library in the Kafka world, Kafka Streams, is bundled with RocksDB - a high-performance key/value store well suited to near-real-time, high-throughput, low-latency processing.

But what if RocksDB's key/value store semantics are just not enough? What if the use case requires features of other storage engines - document indexing, graph processing, or simply multiple secondary indexes? Can we employ other embeddable storage engines, such as Lucene or Neo4j, or extend the RocksDB-based state stores?

Luckily, Kafka Streams supports custom state store implementations through a pluggable interface. Even better - we can delegate durability, data sharding, and failure recovery to the Kafka Streams runtime. Of course, there are some associated restrictions as well. That, and more, is what we will explore in this session.
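As a rough illustration of the pluggable-store idea, the sketch below mirrors - in deliberately simplified form - the shape of a state store contract: a lifecycle interface, a key/value interface layered on top, and one concrete engine behind it. All interface and class names here are hypothetical stand-ins, not the real `org.apache.kafka.streams` API; the point is only that the engine (here a `HashMap`) is swappable for something like Lucene or Neo4j behind the same contract.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a state store lifecycle contract (hypothetical names).
interface SimpleStateStore {
    String name();
    void init();
    void close();
    boolean persistent();
}

// Key/value operations layered on the lifecycle contract.
interface SimpleKeyValueStore<K, V> extends SimpleStateStore {
    void put(K key, V value);
    V get(K key);
    V delete(K key);
}

// A minimal in-memory engine; a custom store could wrap another
// embeddable storage engine behind the same interface instead.
class InMemoryKeyValueStore<K, V> implements SimpleKeyValueStore<K, V> {
    private final String name;
    private final Map<K, V> map = new HashMap<>();
    private boolean open;

    InMemoryKeyValueStore(String name) { this.name = name; }

    public String name() { return name; }
    public void init() { open = true; }
    public void close() { open = false; map.clear(); }
    public boolean persistent() { return false; }

    public void put(K key, V value) { map.put(key, value); }
    public V get(K key) { return map.get(key); }
    public V delete(K key) { return map.remove(key); }
}
```

The separation matters: calling code depends only on the interfaces, so the persistence engine can change without touching the topology logic.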

We will cover the following:

- Anatomy of a state store - interface, concepts, and a high-level implementation overview

- Going deeper - changelog and durability, state restoration, sharding, segmentation, and data expiration  

- Swapping engines - implementation of a state store using a different persistence engine  

- Adding new features - extending state store interface with new operations, changelog, and restoration considerations  

- Wiring it all together - accessing state stores from Processor API, writing transformers, and wiring new operations into DSL  
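To make the changelog-and-restoration point above concrete, here is a toy sketch (all names hypothetical, not the Kafka Streams API): every write is appended to a changelog, and a fresh store instance rebuilds its state by replaying that log with last-write-wins semantics - conceptually what the runtime does for you with a compacted changelog topic in Kafka.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy changelog-backed store: each put is recorded so state can be rebuilt.
class ChangeloggedStore {
    private final Map<String, String> state = new HashMap<>();
    // Stand-in for a compacted Kafka changelog topic.
    private final List<Map.Entry<String, String>> changelog;

    ChangeloggedStore(List<Map.Entry<String, String>> changelog) {
        this.changelog = changelog;
    }

    void put(String key, String value) {
        state.put(key, value);
        // "Produce" the update to the changelog before acknowledging it.
        changelog.add(new AbstractMap.SimpleEntry<>(key, value));
    }

    String get(String key) { return state.get(key); }

    // Restoration: replay the changelog into an empty store; last write wins.
    void restore() {
        state.clear();
        for (Map.Entry<String, String> record : changelog) {
            state.put(record.getKey(), record.getValue());
        }
    }
}
```

A second instance constructed over the same changelog and calling `restore()` ends up with the same state as the writer - the essence of failure recovery being delegated to the log rather than the storage engine.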

In this session, you will learn about different aspects of the Kafka Streams state store architecture and gain a good starting point for extending existing stateful operations and implementing new ones in Kafka Streams applications.

Roman Kolesnev

Confluent