Embedding Tiny Language Models in Flink SQL functions

Breakout Session

Learn how to embed a tiny language model inside your Flink SQL pipeline – turn messy and free-form text from your events into structured and actionable fields in real time, all within your cluster.

You’ve likely seen AI applied to event streams by calling cloud-hosted large language models over HTTP. That is the right option for many scenarios, but it isn’t practical for every use case. Maybe your data needs to stay in-cluster, your Flink job is running somewhere without public cloud access, or you just need a predictable per-event cost. For situations like these, you could co-locate a tiny language model in your Flink job.

In this session, I’ll go through some use cases where this approach is most useful (and make it clear where it isn’t sensible!).

I’ll walk through practical steps for making open source and freely available models accessible from Flink SQL as custom functions – including how to choose a CPU-friendly model, how to write prompts that are effective with tiny models, and the observability you need to confirm your solution can keep up with your event stream’s throughput.
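
As a rough illustration of the shape this could take (not the implementation from the session), the sketch below shows the logic a custom function might wrap: a narrow prompt with a fixed output format, defensive parsing of the model's reply into structured fields, and a back-of-envelope throughput check. Everything named here is an assumption made for the example: the prompt wording, the `category` and `urgent` fields, and the `generate` callable that stands in for whichever local model runtime you choose. Registering the function with Flink is deliberately left out.

```python
import json
import math
from typing import Callable, Dict

# Hypothetical prompt: one narrow task and a fixed output shape,
# which is usually easier for a tiny model to follow.
PROMPT = (
    "Classify the customer message.\n"
    'Reply with JSON only: {{"category": "...", "urgent": true/false}}\n'
    "Message: {text}\n"
)

# Returned whenever the model's reply cannot be used.
FALLBACK = {"category": "unknown", "urgent": False}


def extract_fields(text: str, generate: Callable[[str], str]) -> Dict[str, object]:
    """Turn free-form text into structured fields.

    `generate` is a stand-in (an assumption) for a call into a local model.
    """
    raw = generate(PROMPT.format(text=text))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Tiny models sometimes ignore the format; never fail the event.
        return dict(FALLBACK)
    if not isinstance(parsed, dict):
        return dict(FALLBACK)
    return {
        "category": str(parsed.get("category", "unknown")),
        "urgent": parsed.get("urgent") is True,
    }


def required_parallelism(events_per_second: float, seconds_per_event: float) -> int:
    """Estimate how many parallel inference slots are needed to keep up."""
    return max(1, math.ceil(events_per_second * seconds_per_event))


def fake_model(prompt: str) -> str:
    """Canned reply so the sketch runs without a real model."""
    return '{"category": "billing", "urgent": true}'


if __name__ == "__main__":
    print(extract_fields("I was charged twice this month!", fake_model))
    print(required_parallelism(events_per_second=100, seconds_per_event=0.25))
```

The fallback path matters as much as the happy path: a reply that is not valid JSON yields default values rather than an exception, so one bad generation does not stall the stream. The parallelism estimate is simply arrival rate multiplied by measured per-event latency, which is the kind of number the observability work is meant to supply.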

Dale Lane

IBM