LinkedIn

LinkedIn is the world’s largest professional network, connecting professionals to make them more productive and successful through jobs, learning, and community.

Enterprise-Scale and Contribution
2025

LinkedIn is a two-time Data Streaming Awards winner! See their previous winning case study from 2022.

Data Streaming Technology Used:
  • Apache Flink®
  • Apache Kafka®
  • Kubernetes
  • Ambry (LinkedIn’s source-of-truth distributed blob storage system)

What problem were they looking to solve with Data Streaming Technology?

LinkedIn’s products rely on real-time signals across use cases such as Ads AI, Feed Ranking, Abuse Detection, Notifications, Search, Jobs, and Premium. As these use cases grow, we face sharp increases in throughput, state size, and reliability expectations. Teams needed to deliver low-latency features quickly, without requiring deep distributed-systems expertise or dedicated operational support.

We encountered three core challenges:

  1. Developer productivity: Excessive operational toil to size, tune, backfill, and validate jobs safely.
  2. State management at scale: Rapidly growing job state, long checkpoint latencies, and slow recovery for large stateful pipelines.
  3. Cost and sustainability: A legacy stack and inefficient job sizing patterns drove high hardware spend for large workloads.

Our goal was to build a self-serve, production-grade stream processing platform on Flink that abstracts away complexity, enables declarative Flink SQL, and makes large-state jobs both reliable and cost-effective across thousands of pipelines powering LinkedIn’s member experiences.

How did they solve the problem?

We built a managed stream processing platform on Flink, running on Kubernetes with split deployment, resource provisioning, auto-scaling, monitoring, alerting, and failure recovery. On top of this runtime, we delivered managed Flink SQL, enabling teams to author streaming pipelines declaratively while the platform handles execution details, removing the need for deep distributed-systems expertise.
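To make “declarative” concrete, here is a minimal sketch using the open-source Flink Table API to submit a SQL pipeline. It is not LinkedIn’s internal platform API; the topic name, broker address, print sink, and windowing query are illustrative placeholders, and it assumes the Flink Kafka connector is on the classpath.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ManagedSqlPipelineSketch {
    public static void main(String[] args) {
        // Streaming-mode Table environment; a managed platform would provision this for the user.
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Declarative source over Kafka; connector settings are illustrative placeholders.
        tEnv.executeSql(
                "CREATE TABLE page_views (" +
                "  member_id BIGINT," +
                "  page STRING," +
                "  view_time TIMESTAMP(3)," +
                "  WATERMARK FOR view_time AS view_time - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'page-views'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'json'" +
                ")");

        // Simple sink; the platform, not the user, would handle sizing, checkpointing, and recovery.
        tEnv.executeSql(
                "CREATE TABLE view_counts (" +
                "  page STRING," +
                "  views BIGINT," +
                "  window_end TIMESTAMP(3)" +
                ") WITH ('connector' = 'print')");

        // The user's entire job: a windowed aggregation expressed as SQL.
        tEnv.executeSql(
                "INSERT INTO view_counts " +
                "SELECT page, COUNT(*) AS views, window_end " +
                "FROM TABLE(TUMBLE(TABLE page_views, DESCRIPTOR(view_time), INTERVAL '1' MINUTES)) " +
                "GROUP BY page, window_start, window_end");
    }
}
```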

To address state management at scale, we optimized the full state lifecycle: improving checkpointing and recovery, and modernizing storage by migrating job state from legacy filesystems to Ambry blob storage. This migration unlocked durability, elasticity, and significant cost efficiencies for massive checkpoints and savepoints.
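As a rough sketch of this kind of large-state checkpointing setup, the snippet below uses standard open-source Flink configuration calls. The “ambry://” URI is a placeholder for whatever FileSystem plugin exposes the blob store to Flink, and the intervals are illustrative values, not LinkedIn’s actual settings.

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LargeStateCheckpointConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB keeps large keyed state on local disk; incremental checkpoints
        // upload only changed files instead of the full state on every checkpoint.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Checkpoint every 2 minutes with exactly-once semantics (illustrative interval).
        env.enableCheckpointing(120_000, CheckpointingMode.EXACTLY_ONCE);

        // Durable checkpoint storage. The "ambry://" scheme is a placeholder for a
        // FileSystem plugin backed by a blob store; in vanilla Flink this is typically
        // an HDFS or S3 URI.
        env.getCheckpointConfig().setCheckpointStorage("ambry://flink-checkpoints/my-job");

        // Bound concurrent checkpoints and leave breathing room between large uploads.
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);

        // ... define the job graph here, then call env.execute("large-state-job");
    }
}
```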

We also invested in tooling and intelligent automation to improve developer productivity and enable safe migrations during LinkedIn’s evolution from Samza to Flink. This included ensuring reliability at scale through state compatibility, job tuning, autosizing, runtime stability, and portability through Apache Beam. We diagnosed Join operator I/O bottlenecks and developed a resource consumption formula, leading to 80% hardware cost savings. Additional platform features, such as topic-level startpoint configuration, capacity readiness checks, and automated reconciliation/backfill workflows, further improved reliability and minimized operational toil.
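For a sense of what a topic-level startpoint looks like, here is a minimal sketch using the open-source Flink Kafka connector to rewind a single topic to a timestamp for a backfill or reconciliation run. The broker address, topic name, group ID, and six-hour window are hypothetical, and LinkedIn’s platform presumably exposes this as configuration rather than user code.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StartpointBackfillSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Rewind this topic to a point in time, e.g. for a reconciliation/backfill run.
        long backfillFromMillis = System.currentTimeMillis() - 6 * 60 * 60 * 1000L; // 6 hours ago

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")   // placeholder broker address
                .setTopics("page-views")             // startpoint chosen per topic/source
                .setGroupId("backfill-job")
                .setStartingOffsets(OffsetsInitializer.timestamp(backfillFromMillis))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "page-views-backfill")
           .print();

        env.execute("topic-startpoint-backfill");
    }
}
```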

What was the positive outcome? 

Today, the platform powers thousands of mission-critical pipelines across LinkedIn’s products, with managed Flink SQL enabling rapid iteration by AI, infrastructure, and product teams.

From our migrations and platform upgrades, we achieved material infrastructure savings, including an 80% hardware cost reduction on targeted workloads such as join-heavy pipelines. Operational toil dropped through automated sizing, safer backfills, and end-to-end validation. The modernization of state storage to Ambry blob storage improved durability and elasticity for large checkpoints, while platform-level reliability features reduced incident risk for big-state jobs.

The net effect: faster time-to-value, lower cost, and greater reliability for real-time features that impact LinkedIn’s members and customers at global scale.

Additional links: