Change Data Capture at Scale: Insights from Slack’s Streaming Pipeline

Breakout Session

Slack was burning cash on batch data replication, with full-table restores causing multi-day latency. To slash both costs and lag, we overhauled our data infrastructure—replacing batch jobs with Change Data Capture (CDC) streams powered by Debezium, Vitess, and Kafka Connect. We scaled to thousands of shards and streamed petabytes of data. This talk focuses on the open source contributions we made to build scalable, maintainable, and reliable CDC infrastructure at Slack.
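For a concrete picture of the moving parts before the session: the sketch below registers a Debezium Vitess source connector against the Kafka Connect REST API with a few throughput-oriented settings of the kind the talk discusses. The endpoint, connector name, and all values are illustrative assumptions, not Slack’s production configuration.

```python
# Sketch: registering a Debezium Vitess connector on a Kafka Connect worker.
# Endpoint, names, and values are placeholders, not Slack's settings.
import json
import urllib.request

CONNECT_URL = "http://localhost:8083/connectors"  # assumed local Connect worker

connector = {
    "name": "cdc-example",  # hypothetical connector name
    "config": {
        # Debezium's Vitess source connector.
        "connector.class": "io.debezium.connector.vitess.VitessConnector",
        # Connection settings (names per the Debezium Vitess connector docs).
        "vitess.vtgate.host": "vtgate.example.internal",
        "vitess.vtgate.port": "15991",
        "vitess.keyspace": "example_keyspace",
        "topic.prefix": "cdc.example",
        # Common Debezium batching knobs: larger batches and queues raise
        # throughput at the cost of memory and per-record latency.
        "max.batch.size": "8192",
        "max.queue.size": "32768",
        # Per-connector producer overrides (the worker must permit these via
        # connector.client.config.override.policy=All).
        "producer.override.batch.size": "1048576",
        "producer.override.linger.ms": "50",
        "producer.override.compression.type": "lz4",
        # Parallelism; how tasks map to shards depends on the connector version.
        "tasks.max": "4",
    },
}

req = urllib.request.Request(
    CONNECT_URL,
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```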

We’ll show how we cut snapshotting time from weeks to hours for our largest table: half a petabyte spread across hundreds of shards. You’ll learn how to apply our optimizations in Debezium, tune Kafka Connect configs, and maximize throughput. We’ll also cover how we tackled one of streaming’s most elusive challenges: computing accurate time windows. By contributing a binlog event watermarking system to Vitess and Debezium, we made it possible to ensure correctness in a distributed system with variable lag, as sketched below. Finally, we’ll show you how to detect and prevent data loss in your own pipelines by applying the fixes we contributed to Kafka Connect and Debezium, which addressed subtle edge cases we uncovered in these systems.
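To make the watermarking idea concrete: if each shard’s stream carries a watermark (the binlog event time that shard has provably reached), a time window is only safe to finalize once the minimum watermark across all shards has passed the window’s end. A minimal sketch of that check follows; the shard names and timestamps are hypothetical, and real watermarks would come from binlog event metadata in the Vitess/Debezium pipeline rather than a plain dict.

```python
# Sketch of watermark-based window completeness across shards.
# Inputs are hypothetical; real watermarks come from binlog event metadata.
from datetime import datetime, timezone

def low_watermark(shard_watermarks: dict[str, datetime]) -> datetime:
    """Every event up to the slowest shard's watermark has provably been
    seen, so the global low watermark is the minimum across shards."""
    return min(shard_watermarks.values())

def window_is_complete(window_end: datetime,
                       shard_watermarks: dict[str, datetime]) -> bool:
    """A window [start, window_end) may be finalized only once every
    shard's stream has advanced past window_end."""
    return low_watermark(shard_watermarks) >= window_end

# Example: shard "-80" lags behind, so the 12:00 window is not yet complete.
watermarks = {
    "-80": datetime(2024, 1, 1, 11, 58, tzinfo=timezone.utc),  # lagging shard
    "80-": datetime(2024, 1, 1, 12, 3, tzinfo=timezone.utc),
}
print(window_is_complete(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
                         watermarks))  # False: must wait for shard "-80"
```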

Attendees will leave with practical techniques for deploying, scaling, and maintaining reliable CDC pipelines using open source tools, and a deeper understanding of how to avoid the common (and costly) pitfalls that can derail streaming data pipelines.


Tom Thornton

Slack