Background

Lightning Talk

Robinhood’s Flink Deployment Practices

Robinhood leverages Flink for many of the stream processing applications that power the platform’s features, including shareholder position tracking, fraud detection, and data ingestion. With Kafka involved in almost every mission-critical part of Robinhood’s functionality, Flink is crucial for building scalable, maintainable, and reliable stream processing applications.

Engineers at Robinhood deploy their Flink applications continuously. To achieve continuous deployments, the Streaming Platform team has done the following:

- Tuned the Apache flink-kubernetes-operator configuration so that the operator can reconcile 100+ deployments concurrently without crashing.
- Added a health check mechanism to Robinhood’s deployment system that auto-recovers failed or bad deployments by redeploying the previous healthy version of the application (a sketch of such a check follows this list).
- Built Checkpoint-Fetcher, an in-house tool for manual recovery of a Flink application whose deployment needs to be deleted. It automates the work of locating the latest checkpoint in storage and updating the initialSavepointPath field in the Kubernetes manifest, eliminating the human error that can occur when redeploying the Flink application after its deletion (see the second sketch below).
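
The health check and rollback flow can be illustrated with a minimal sketch. This is not Robinhood’s implementation: it assumes the flink-kubernetes-operator’s FlinkDeployment custom resource (flink.apache.org/v1beta1), a hypothetical previous_spec dict holding the last known-good spec, and a simple “job must be RUNNING within the timeout” definition of healthy.

```python
# Minimal health-check-with-rollback sketch (assumptions noted above).
import time
from kubernetes import client, config

GROUP, VERSION, PLURAL = "flink.apache.org", "v1beta1", "flinkdeployments"


def is_healthy(status: dict) -> bool:
    # Healthy once the JobManager is ready and the Flink job reports RUNNING.
    jm_ready = status.get("jobManagerDeploymentStatus") == "READY"
    job_running = status.get("jobStatus", {}).get("state") == "RUNNING"
    return jm_ready and job_running


def check_and_rollback(namespace: str, name: str, previous_spec: dict,
                       timeout_s: int = 600, poll_s: int = 15) -> bool:
    config.load_kube_config()
    api = client.CustomObjectsApi()
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        cr = api.get_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, name)
        if is_healthy(cr.get("status", {})):
            return True  # new version came up healthy
        time.sleep(poll_s)
    # Unhealthy past the deadline: patch the spec back to the last known-good
    # version so the operator redeploys the previous healthy application.
    api.patch_namespaced_custom_object(
        GROUP, VERSION, namespace, PLURAL, name, {"spec": previous_spec})
    return False
```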

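A simplified version of the checkpoint lookup is sketched below. It is not Robinhood’s Checkpoint-Fetcher: it assumes checkpoints live in S3 under Flink’s default externalized-checkpoint layout (`<prefix>/<job-id>/chk-<n>/_metadata`), uses boto3 and PyYAML, and rewrites spec.job.initialSavepointPath in a local FlinkDeployment manifest. Bucket, prefix, and file names are placeholders.

```python
# Sketch in the spirit of a "Checkpoint-Fetcher": find the newest completed
# checkpoint in object storage and point the manifest's initialSavepointPath
# at it. Paths and names below are illustrative assumptions.
import re
import boto3
import yaml


def latest_checkpoint(bucket: str, job_prefix: str) -> str:
    """Return the s3:// path of the newest chk-<n> dir that has a _metadata file."""
    s3 = boto3.client("s3")
    newest_n, newest_path = -1, None
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=job_prefix):
        for obj in page.get("Contents", []):
            m = re.search(r"/chk-(\d+)/_metadata$", obj["Key"])
            if m and int(m.group(1)) > newest_n:
                newest_n = int(m.group(1))
                newest_path = obj["Key"].rsplit("/_metadata", 1)[0]
    if newest_path is None:
        raise RuntimeError("no completed checkpoint found")
    return f"s3://{bucket}/{newest_path}"


def patch_manifest(manifest_path: str, checkpoint_path: str) -> None:
    """Set spec.job.initialSavepointPath in the FlinkDeployment manifest."""
    with open(manifest_path) as f:
        manifest = yaml.safe_load(f)
    manifest["spec"]["job"]["initialSavepointPath"] = checkpoint_path
    with open(manifest_path, "w") as f:
        yaml.safe_dump(manifest, f)


if __name__ == "__main__":
    ckpt = latest_checkpoint("my-flink-bucket", "checkpoints/my-app/")
    patch_manifest("flink-deployment.yaml", ckpt)
    print(f"initialSavepointPath set to {ckpt}")
```
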
If your organization currently runs Flink on Kubernetes and wants to continuously deploy Flink applications, please attend this talk to see how Robinhood’s deployment practices can be applied to your own Flink deployments.

Tony Chen

Robinhood Markets, Inc.