Inside Uber's Large-Scale Real-Time Analytics Platform

Breakout Session

At Uber, the EVA platform drives substantial advancements in our real-time analytics capabilities, empowering business use cases across marketing, engineering, data science, and operations, as well as internal use cases around metrics, logs, and query analytics. The platform features Apache Kafka for real-time data transport, Apache Flink for stream processing, Spark for batch processing, HDFS for deep storage, and Apache Pinot as the core analytics engine. Additionally, it features the internal service Neutrino for Presto-like queries on Pinot and a metadata service for dataset management.

As part of the talk, we cover the mature architecture of the real-time analytics ecosystem powering Uber’s use cases, which serves up to tens of thousands of queries per second, handles several million writes per second, and hosts up to tens of petabytes of Pinot datasets.

We also cover two critical use cases, one business and one observability, and the following topics:

1. Real-time processing and ingestion using AthenaX (SQL-based transformation on Flink), Flink, and Kafka to provide analytics on real-time data.

2. Real-time analytics powered by Apache Pinot to serve queries at high QPS with sub-second latency.

3. Disaster resiliency and disaster recovery strategies for Apache Pinot datasets.

The talk covers two Uber use cases that solve real-time analytics challenges for business and observability.

1. Use case 1: Business use case (rides/eats related)

2. Use case 2: Observability use case (metrics/logs related)

The audience will gain practical insights into designing real-time analytics systems centered around Apache Pinot and effectively leveraging complementary real-time technologies to build robust and high-performing solutions.


Rohit Yadav

Uber

Satish Duggana

Uber