Streaming with Apache Iceberg

Breakout Session

Streaming data is a critical component of modern data architectures. This talk explores how to determine your streaming needs and design a robust solution using Apache Iceberg, a next-generation table format built for flexibility and scalability. We’ll dive into the foundational tools that enable streaming pipelines, including Apache Flink, Apache Kafka, Debezium, Kafka Connect, and Apache Spark, breaking down their roles and use cases in processing, transporting, and transforming streaming data.

The talk will also highlight Iceberg-specific considerations, such as managing compaction to optimize query performance and handling the delete files that record-level updates and deletes produce. Whether you’re building real-time analytics, powering machine learning models, or streaming raw data into your data lakehouse, this session will provide actionable insights and best practices for building reliable, efficient streaming workflows with Apache Iceberg.

Will Martin

Dremio