Beyond Message Key Parallelism: Increasing Dropbox Dash AI Ingestion Throughput by 100x
Breakout Session
This talk builds on existing parallel consumer work, which achieves message-key-level parallelism while retaining ordering guarantees and without provisioning additional partitions. We extend those principles by decomposing messages into smaller sub-messages, so that messages with the same key can be processed simultaneously while ordering guarantees are still retained. The sub-message parallel consumer offers faster time to market, with lower latency and cost, than existing methods presented in the literature.
This talk walks through a real scenario we faced when scaling up Dropbox Dash (AI assistant): the deadline to onboard a customer is tomorrow morning, but the backlog needs 2 weeks to finish processing. The cause is a poor key choice in the early stages of development, which sends every message to the same partition.
I will recap existing techniques to set the context:
1. Conventional Kafka parallelism (partition level)
2. Message-key-level parallelism using techniques discussed for the Confluent Parallel Consumer
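To make the second item concrete, here is a minimal sketch of key-level parallelism within a single partition. It is an illustration only, not the Confluent Parallel Consumer's implementation: `process_by_key` and `handler` are hypothetical names, and messages are modeled as in-memory `(key, value)` pairs rather than records read from Kafka.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_by_key(messages, handler, max_workers=8):
    """Process (key, value) pairs from one partition, one ordered lane per key."""
    lanes = defaultdict(list)
    for key, value in messages:
        lanes[key].append(value)  # arrival order is kept within each key

    def drain(key):
        # One task per key: values for that key run sequentially, in order.
        return key, [handler(key, value) for value in lanes[key]]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Different keys run concurrently; parallelism is capped by the key count.
        return dict(pool.map(drain, list(lanes)))
```

Because there is one lane per key, a backlog in which every message shares the same key collapses to a single lane and runs sequentially.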
I will also present the additional constraints that we face in our own system:
1. The producer cannot be changed quickly because other consumers depend on the event stream
2. Long chains of messages with the same key render key-based, message-level parallelism ineffective
3. Consuming, breaking down, re-producing, and consuming messages again adds latency and monetary cost, which is undesirable
I will present the novel method we adopted to clear the backlog with a ~100x throughput gain and onboard the customer on time: the sub-message parallel consumer. I will cover the constraints it operates under and the intuition behind the proof of why it works, share some performance benchmarks, and close with Q&A.
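As a rough illustration of the idea, the sketch below splits each message into sub-messages and runs one ordered lane per (key, sub-key) pair. It is not the implementation presented in the talk, which this abstract does not specify. It assumes that ordering only has to hold among sub-messages sharing the same sub-key, and that a hypothetical `split` function can extract `(sub_key, sub_value)` pairs from a message value; `process_sub_messages`, `handler`, and the sample payloads are likewise invented for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_sub_messages(messages, split, handler, max_workers=8):
    """Fan each (key, value) message out into ordered lanes per (key, sub_key)."""
    lanes = defaultdict(list)
    for offset, (key, value) in enumerate(messages):
        for sub_key, sub_value in split(value):  # split() is a hypothetical helper
            # Appending in offset order keeps each lane ordered by source offset.
            lanes[(key, sub_key)].append((offset, sub_value))

    def drain(lane):
        # A lane runs sequentially; distinct lanes run concurrently,
        # even when every source message carries the same key.
        return lane, [handler(lane, off, sub) for off, sub in lanes[lane]]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(drain, list(lanes)))


# Two messages with the same key, each touching several (assumed) documents.
messages = [
    ("tenant-1", {"doc-a": "v1", "doc-b": "v1"}),
    ("tenant-1", {"doc-a": "v2", "doc-c": "v1"}),
]
result = process_sub_messages(
    messages,
    split=lambda value: value.items(),
    handler=lambda lane, offset, sub_value: (offset, sub_value),
)
```

Here a single key yields three lanes instead of one, while the two updates to `doc-a` still run in offset order. A real consumer would also need to hold back the commit of an offset until every sub-message from that message and all earlier ones has finished, which this sketch leaves out.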
Key takeaways:
1. Kafka message processing can be parallelized below the level of whole messages
2. Clever processing on the consumer side can yield lower latency and cost than breaking down messages and re-producing them, and, unlike a producer-side change, it does not affect other consumers on the same topic
David Yun
Dropbox