Background

Breakout Session

Protobuf Support in Uber's Real-Time Data Stack

The real-time data platform (Apache Kafka, Apache Flink, and Apache Pinot) plays a crucial role in Uber's serving stack and primarily uses Avro as its serialization format. At the same time, Protobuf is the primary data format for Uber's online applications and services. This discrepancy forces engineers to spend additional effort converting the Protobuf data model into the Avro data model in order to tap into the real-time data stack.

In this talk, we will share our work to make Protobuf a first-class citizen in the real-time data stack (Kafka/Flink/Pinot/ingestion, etc.), including:

- how to register and manage Protobuf schemas (a rough sketch follows this list)

- how schema evolution is handled

- how Protobuf support is enabled throughout the stack
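
As a rough illustration of the first two bullets, the sketch below registers a Protobuf schema and then checks a compatible evolution of it, using the open-source Confluent Schema Registry client. Uber's internal schema service and its API are not described in this abstract, so the registry URL, subject name, and Ride message here are purely hypothetical.

```java
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema;

public class ProtobufSchemaRegistrationSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical registry endpoint; Uber's internal schema service differs.
        SchemaRegistryClient client =
                new CachedSchemaRegistryClient("http://schema-registry:8081", 100);

        // Version 1 of a hypothetical Ride message, registered under a Kafka subject.
        ProtobufSchema v1 = new ProtobufSchema(
                "syntax = \"proto3\";\n"
              + "message Ride {\n"
              + "  string ride_id = 1;\n"
              + "  int64 requested_at = 2;\n"
              + "}\n");
        int id = client.register("rides-value", v1);
        System.out.println("Registered schema version 1 with id " + id);

        // Schema evolution: adding a field with a new tag number is a
        // backward-compatible change in proto3, so this check should pass.
        ProtobufSchema v2 = new ProtobufSchema(
                "syntax = \"proto3\";\n"
              + "message Ride {\n"
              + "  string ride_id = 1;\n"
              + "  int64 requested_at = 2;\n"
              + "  string city_id = 3;\n"
              + "}\n");
        boolean compatible = client.testCompatibility("rides-value", v2);
        System.out.println("v2 compatible with registered versions: " + compatible);
    }
}
```

The talk itself covers how this kind of registration and compatibility flow is integrated across Kafka, Flink, Pinot, and ingestion at Uber, not the Confluent client shown above.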

With this support, services and applications can use Protobuf for stream processing, real-time analytics, and data lake ingestion without the overhead of format conversions. Moreover, this support can be leveraged to improve service resilience and cost efficiency.

Yang Yang

Uber