
Breakout Session

Kafka Streams as a Data Store for a Workflow Engine

At LittleHorse, we are building a high-performance workflow engine for microservice orchestration. We have a long list of performance-related requirements, including low latency, horizontal scalability for throughput, transactional semantics, and high availability. Long story short: we decided to build our own data store on top of Kafka Streams.

You might raise your eyebrows when someone suggests using Kafka as a database, and for very good reason. Apache Kafka is a distributed and replicated commit log, which in itself does not make it a "database". However, under the hood almost all databases share three core components (among others): a write-ahead log (WAL), an indexing layer, and a query layer. Stream processing systems can provide the scaffolding for that indexing layer (Kafka Streams and Flink, for example, both use RocksDB to "index" the data in your Kafka "WAL"), but considerable work remains to build a full data store.
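
To make the WAL-plus-index analogy concrete, here is a minimal Kafka Streams sketch; the "entities" topic, the String serdes, and the store name are illustrative assumptions, not LittleHorse's actual schema. The input topic plays the role of the WAL, table() replays it into a RocksDB-backed state store (the indexing layer), and an interactive query serves as a bare-bones query layer.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class WalIndexSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wal-index-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // The "entities" topic plays the role of the WAL; table() replays it
        // into a RocksDB-backed state store that acts as the indexing layer.
        builder.table("entities",
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.as("entity-store"));

        try (KafkaStreams streams = new KafkaStreams(builder.build(), props)) {
            streams.start();
            // In a real application, wait for the RUNNING state before
            // querying; a sleep keeps this sketch short.
            Thread.sleep(5_000);

            // A bare-bones query layer: point lookups against the local index.
            ReadOnlyKeyValueStore<String, String> store = streams.store(
                    StoreQueryParameters.fromNameAndType(
                            "entity-store", QueryableStoreTypes.keyValueStore()));
            System.out.println(store.get("some-key"));
        }
    }
}
```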

We will explain the reasoning behind our core architectural decision to build a homegrown data store on top of Kafka Streams. We will then discuss how we enable synchronous responses to POST requests, how we implement secondary indexes, and why this architecture forces us to adopt the "Aggregate" pattern for our data structures.
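
As a flavor of the secondary-index idea, here is one way it can be expressed in the Kafka Streams DSL; this is a sketch of the general pattern rather than LittleHorse's implementation, and the "orders" topic, the "customerId|..." value encoding, and the store name are assumptions for illustration. The stream is re-keyed by the secondary attribute and aggregated into its own RocksDB-backed store.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class SecondaryIndexSketch {
    // Builds a topology that maintains "orders-by-customer" as a secondary
    // index. Assumes record keys are order IDs and values begin with
    // "customerId|..." (an illustrative encoding, not a real schema).
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
                // Re-key by the secondary attribute; keep the order ID as the value.
                .map((orderId, value) -> KeyValue.pair(value.split("\\|")[0], orderId))
                // map() changed the key, so Streams repartitions before aggregating.
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                // Index entry: comma-separated order IDs per customer.
                // (A production index would also handle updates and deletes.)
                .aggregate(
                        () -> "",
                        (customerId, orderId, ids) ->
                                ids.isEmpty() ? orderId : ids + "," + orderId,
                        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(
                                        "orders-by-customer")
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.String()));
        return builder.build();
    }
}
```

Because the re-keyed records must be co-partitioned by customer ID, Streams inserts a repartition topic before the aggregation; that extra hop is part of the cost of secondary indexes in this kind of architecture.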

Join us to learn:

- What data consistency guarantees Kafka Streams provides.
- How an "inside-out database" impacts the available query patterns.
- How Kafka Streams provides fault tolerance and high availability.

Colt McNealy

LittleHorse Enterprises LLC