Background

Breakout Session

From 📎 to 🧠: Building Gen AI Apps & Copilots on Streaming Data with Flink SQL

Gen AI apps such as business-specific Copilots can serve as a vital link between foundation models and data streaming. Letting users ask questions about streaming data in natural language gives business and engineering teams dynamic, real-time access to organizational knowledge, improving employee productivity and workflows across enterprise operations.

For structured data in Kafka topics, we have found that LLMs generate correct Flink SQL far more accurately when strong data contracts are in place and schemas evolve together with annotations and metadata generated with the help of LLMs. Providing the LLM with full context about the latest schema of Kafka topics and Flink tables proved crucial for the robustness of the generated Flink SQL statements.
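As a minimal sketch of what "full context about the latest schema" could look like in practice, the following TypeScript assembles a prompt from annotated table descriptions. The `FieldSpec` and `TableSpec` types, the `buildFlinkSqlPrompt` function, and the `orders` table are hypothetical names chosen for illustration; they are not part of Kafka, Flink, or any schema registry client, and the prompt wording is only one plausible choice.

```typescript
// Hypothetical shapes for illustration; not taken from any Kafka or Flink library.
interface FieldSpec {
  name: string;
  type: string; // Flink SQL type, e.g. "STRING" or "TIMESTAMP(3)"
  description: string; // annotation, e.g. LLM-generated and human-reviewed
}

interface TableSpec {
  table: string;
  description: string;
  fields: FieldSpec[];
}

// Render one table as a compact, commented column list the model can read.
function renderTable(spec: TableSpec): string {
  const columns = spec.fields
    .map((f) => `  ${f.name} ${f.type} -- ${f.description}`)
    .join("\n");
  return `-- ${spec.description}\nTABLE ${spec.table} (\n${columns}\n)`;
}

// Combine instructions, the latest schemas, and the user's question into one prompt.
function buildFlinkSqlPrompt(question: string, tables: TableSpec[]): string {
  return [
    "You translate questions into Flink SQL.",
    "Use only the tables and columns listed below.",
    "",
    tables.map(renderTable).join("\n\n"),
    "",
    `Question: ${question}`,
    "Flink SQL:",
  ].join("\n");
}

// Example input: an assumed "orders" table backed by a Kafka topic.
const exampleTables: TableSpec[] = [
  {
    table: "orders",
    description: "One row per order event from the orders topic",
    fields: [
      { name: "order_id", type: "STRING", description: "Unique order identifier" },
      { name: "amount", type: "DECIMAL(10, 2)", description: "Order total in USD" },
      { name: "created_at", type: "TIMESTAMP(3)", description: "Event time of the order" },
    ],
  },
];

const examplePrompt = buildFlinkSqlPrompt(
  "What is the total order amount per minute?",
  exampleTables,
);
console.log(examplePrompt);
```

The resulting string would be sent to the model of choice. Regenerating it from the current schema on every request is one way to keep the model's view aligned with schema evolution.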

For semi-structured and unstructured data (like text, JSON, and binary files), continuously generating embeddings and storing them in vector databases for retrieval-augmented generation (RAG) can serve as a powerful knowledge resource. This matters because enterprises today analyze and use only 0.5% of unstructured data.
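The embed, store, and retrieve loop behind RAG can be sketched in a few lines of TypeScript. This is an illustrative in-memory stand-in, not a real vector database client: `InMemoryVectorStore`, `Embedder`, and `StoredChunk` are hypothetical names, and the embedding function is injected so that any model provider could be plugged in. A production setup would call an actual embedding model and a dedicated vector store instead.

```typescript
// Hypothetical embedding function type; a real one would call a model API.
type Embedder = (text: string) => Promise<number[]>;

interface StoredChunk {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Minimal in-memory stand-in for a vector database (illustrative only).
class InMemoryVectorStore {
  private readonly embed: Embedder;
  private readonly chunks = new Map<string, StoredChunk>();

  constructor(embed: Embedder) {
    this.embed = embed;
  }

  // Called for each incoming record, e.g. from a stream consumer loop.
  // Re-using an id overwrites the previous version of that chunk.
  async upsert(id: string, text: string): Promise<void> {
    const vector = await this.embed(text);
    this.chunks.set(id, { id, text, vector });
  }

  // Return the k stored chunks most similar to the query.
  async search(query: string, k: number): Promise<StoredChunk[]> {
    const queryVector = await this.embed(query);
    return Array.from(this.chunks.values())
      .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
      .sort((left, right) => right.score - left.score)
      .slice(0, k)
      .map((entry) => entry.chunk);
  }
}
```

In a streaming setting, `upsert` would run continuously as new messages arrive, and the chunks returned by `search` would be appended to the model prompt as retrieved context.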

In a concrete step-by-step example, we will demonstrate how to build and deploy a Copilot in TypeScript/JavaScript with open-source tools, integrated with Apache Kafka for event streaming, Apache Flink for stream processing, OpenAI/Mistral for model inference, and vector stores for retrieval. In addition to custom Copilot UIs, we will also cover the deployment and monitoring of Copilots across various internal tools, such as Slack, Microsoft 365 Copilot, and OpenAI GPTs.

Steffen Hoellinger

Airy, Inc.