Breakout Session

Streamlining Entry into Streaming Analytics with JupyterHub and Apache Flink


The entry barrier to streaming analytics is often daunting, particularly for data scientists and machine learning engineers. Orchestrating multiple distributed systems is a significant challenge in itself, and it is compounded by the need to configure tools outside their domain on local machines or Kubernetes clusters before they can even begin developing streaming pipelines.

This session presents a solution that uses JupyterHub as a gateway to a pre-configured environment equipped with all necessary integrations, which substantially lowers the entry barrier for interactive streaming analytics. We will show how data scientists and machine learning engineers can combine JupyterHub, PyFlink, and FlinkSQL. Together they provide intuitive abstractions for building and deploying pipelines directly within JupyterHub, and those pipelines can then be executed on a remote Kubernetes cluster.
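To give a feel for the kind of pipeline a notebook cell might hold, here is a minimal sketch using PyFlink's Table API with Flink SQL. It is not taken from the session: the table name `clicks`, its columns, and the helper `run_pipeline` are hypothetical, and the built-in `datagen` connector stands in for a real source such as a message queue. How the session's environment submits the job to a remote Kubernetes cluster is not described in the abstract and is not shown.

```python
# Hypothetical source table fed by Flink's built-in datagen connector.
# 'number-of-rows' bounds the source so the sketch eventually finishes.
SOURCE_DDL = """
CREATE TEMPORARY TABLE clicks (
    user_id INT,
    ts AS PROCTIME()
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '5',
    'number-of-rows' = '100',
    'fields.user_id.min' = '1',
    'fields.user_id.max' = '10'
)
"""

# Count events per user in 10-second tumbling processing-time windows.
WINDOWED_QUERY = """
SELECT window_start, window_end, user_id, COUNT(*) AS click_count
FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR(ts), INTERVAL '10' SECOND))
GROUP BY window_start, window_end, user_id
"""


def run_pipeline():
    """Register the source and print the windowed counts (requires PyFlink)."""
    # Imported lazily so the SQL above can be inspected without PyFlink installed.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.execute_sql(SOURCE_DDL)
    t_env.execute_sql(WINDOWED_QUERY).print()
```

Calling `run_pipeline()` in a notebook cell with PyFlink installed would run the job locally and print rows as windows close. Keeping the pipeline as SQL strings is one possible design choice; it lets the same statements be reused in a Flink SQL client.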

The results have been compelling: users report marked productivity improvements, building streaming pipelines in significantly less time and at larger scale.

Elkhan Dadashov

Apple
