The GitHub for Streaming Data: Unlocking Open Data Streams

Lightning Talk

I am a developer. My day job is at the IoT company Device Insight (known in the Kafka community for our open source tool kafkactl). On nights and weekends, I work on my passion project, the Grand Central Message Broker (gcmb.io).

The platform is based on one idea: providing and consuming streaming data collaboratively. It grew out of an observation: source code repositories used to be silos, used only within a single company or project. With the advent of GitHub, that changed: there was now a space where collaboration could happen and where code could be published, searched, reused, and developed jointly across individuals and organizations.

You could argue that the streaming world is where software development used to be: data is created, handled, and processed in silos. That is often fine, since a lot of data is private to organizations and should not be shared.

There is, however, streaming data that can be useful to a wider audience, primarily in the realm of Open Data. A lot of Open Data exists around the world, but most of it is published as static datasets. My vision is to make Open Data available as streams.

For this reason, I am building gcmb.io, a platform where you can easily share streams of Open Data and consume those provided by others. This makes it easy to combine different types of information and use them for data science or in applications.

Examples of such data streams, freely available on gcmb.io:

17 million airplane positions per day (ADS-B) from around the world

A stream of Wikipedia edits (400k per day)

Current energy data from various countries (energy production, consumption)

Medium blog posts as they are published

If you want to check out the project, it's live at https://gcmb.io. There you can find a list of featured projects, including the ones mentioned above.

If given the chance to present, I would like to explain the general concept and how the data can be ingested into Kafka and Confluent Cloud (did I mention that gcmb.io has native Kafka integration?).


Stefan Hudelmaier

Device Insight GmbH