Background

Breakout Session

Streamlining History: ClickHouse & Flink's Fast-Track for Data Backfills

Integrating historical data with real-time streams is a common challenge in data streaming applications. Our approach combines Apache Flink's Hybrid Source with ClickHouse's efficient storage to synchronize only the data that matters. Because ClickHouse offers true random access to historical data, which Kafka alone cannot provide, the backfill needs far less time and far fewer resources.

We devised a solution that uses ClickHouse for rapid access to the subsets of historical data relevant to ongoing processes, then switches smoothly to live Kafka streams. This method cuts the processing time for large-scale data backfills from days to minutes, handling both data at rest and data in motion at a fraction of the cost.
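
To make the handoff concrete, here is a minimal conceptual sketch in plain Python. It is not the implementation presented in the session and it does not use the Flink, ClickHouse, or Kafka APIs. The names `Event`, `selective_backfill`, and `hybrid_read` are hypothetical, the in-memory lists stand in for the historical store and the live stream, and the switch rule (resume after the last backfilled timestamp) assumes the historical store is complete up to that timestamp.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set


@dataclass(frozen=True)
class Event:
    key: str        # entity the event belongs to
    timestamp: int  # event time, e.g. epoch milliseconds
    payload: str


def selective_backfill(history: Iterable[Event], wanted_keys: Set[str]) -> List[Event]:
    """Stand-in for an indexed lookup in the historical store.

    Only rows for the wanted keys are returned, in time order, so the
    rest of the history is never replayed.
    """
    selected = (e for e in history if e.key in wanted_keys)
    return sorted(selected, key=lambda e: e.timestamp)


def hybrid_read(
    history: Iterable[Event],
    live: Iterable[Event],
    wanted_keys: Set[str],
) -> Iterator[Event]:
    """Emit the selected historical events, then continue with the live stream."""
    switch_ts = -1
    # Phase 1: bounded read of the relevant historical subset.
    for event in selective_backfill(history, wanted_keys):
        switch_ts = max(switch_ts, event.timestamp)
        yield event
    # Phase 2: live events, skipping anything the backfill already covered.
    for event in live:
        if event.key in wanted_keys and event.timestamp > switch_ts:
            yield event


history = [Event("a", 1, "h1"), Event("b", 2, "h2"), Event("a", 3, "h3")]
live = [Event("a", 3, "h3"), Event("a", 4, "l4"), Event("b", 5, "l5")]

payloads = [e.payload for e in hybrid_read(history, live, {"a"})]
print(payloads)  # ['h1', 'h3', 'l4']
```

In this sketch the overlap between the two sources (the event at timestamp 3) is emitted once, and events for key "b" are never read from either phase. A real pipeline would need a more careful switch position when several events can share a timestamp.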

The session will cover the technical setup for selective backfilling with Apache Flink and ClickHouse, the challenges we encountered, and the resulting performance gains. It showcases a scalable, resource-efficient approach to real-time data processing and offers insights for streaming application developers looking to improve operational efficiency.

Rafael Aguiar

Goldsky