The (surprisingly) long history of streaming real-time data

by Kiersten Thamm
| May 6, 2022

Or, why stream processing is essential to human existence

We often talk about stream processing as a cutting-edge system, but is it? This post offers what is likely the shortest account of the surprisingly long history of stream processing, and explains why that history matters to the future of technology.

 

Stream processing in the twenty-first century

Let’s start with what’s most familiar: stream processing in the twenty-first century. These events are a few of the most significant milestones in recent stream processing.

  • 2022: The Stream Community started. The welcoming, non-commercial group of developers, engineers and scientists began helping each other figure out the technology and implementation of stream processing in contemporary dashboards and applications.
  • 2013: “MillWheel: Fault-Tolerant Stream Processing at Internet Scale” is published. Engineers at Google released a paper describing MillWheel, a framework for building low-latency data processing applications.
  • 2011: Apache Kafka is open sourced. The distributed event store and stream processing platform expanded its users from LinkedIn employees to anyone on the internet. Because it’s a unified, high-throughput, low-latency platform, it’s often the base infrastructure for streaming projects.
  • 2002: “Models and Issues in Data Stream Systems” is published. The paper from researchers at Stanford defined a new model of “data processing that does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams.”

These milestones mark the development and use of stream processing by tech companies and associated research departments. But this isn’t the beginning of stream processing.

To access earlier instances of the technology — a history I argue is more than 52,000 years old — we need to first agree on what we mean by stream processing.

 

What is stream processing?

Companies and individuals have offered definitions of stream processing that range in size from one sentence to one book.

I propose a simple definition. Stream processing is a system that ingests at minimum one high-frequency flow of data, transforms that data in some way as it arrives, and delivers it to a destination that either acts on it immediately or stores it in a warehouse for later use.
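
To make the three parts of that definition concrete, here is a minimal Python sketch. It isn't taken from any particular library: the function names (`source`, `transform`, `destination`, `run_pipeline`) and the temperature-conversion example are assumptions chosen purely for illustration, and a plain list stands in for a real high-frequency feed and a real warehouse.

```python
from typing import Callable, Iterable, Iterator, List


def source(readings: Iterable[float]) -> Iterator[float]:
    """Yield records one at a time; a stand-in for a live, high-frequency feed."""
    for reading in readings:
        yield reading


def transform(stream: Iterable[float], fn: Callable[[float], float]) -> Iterator[float]:
    """Apply fn to each record as it arrives, without waiting for the stream to end."""
    for record in stream:
        yield fn(record)


def destination(stream: Iterable[float], sink: List[float]) -> None:
    """Deliver each transformed record to a sink (here, a list acting as storage)."""
    for record in stream:
        sink.append(record)


def celsius_to_fahrenheit(celsius: float) -> float:
    """Hypothetical transformation used only for this example."""
    return celsius * 9 / 5 + 32


def run_pipeline(readings: Iterable[float]) -> List[float]:
    """Wire source -> transformation -> destination and return what was stored."""
    sink: List[float] = []
    destination(transform(source(readings), celsius_to_fahrenheit), sink)
    return sink


if __name__ == "__main__":
    print(run_pipeline([0.0, 100.0]))
```

Because each stage is a generator, a record flows all the way to the destination before the next one is read, which is the "as it arrives" part of the definition.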

 

 

Let’s see if that definition holds up by looking at a diagram of a non-controversial project that all can agree is an example of stream processing. (Please do let me know if you disagree! 😊)

This non-controversial project is a Python service that monitors and analyzes the sentiment of messages sent to a chat application.

 

 

The process begins when someone sends a message to the chat application. The first transformation, built on a HuggingFace model, checks the message for abusive language. If it includes inappropriate words or phrases, the service sends an alert to the writer’s phone, which lets them know that their message won’t appear in the chat box due to harmful content.

Messages that aren’t abusive go through the sentiment analysis service and appear in the chat box with markers indicating whether they are positive or negative.

Each message is processed with latency low enough to keep the conversation going. (There’s nothing worse than having regular five-minute breaks between texts during a conversation, especially when it’s important or you can’t find the television remote!)
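
The flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the real service uses a HuggingFace model, while the sketch below swaps in tiny hypothetical word lists (`BLOCKLIST`, `POSITIVE`, `NEGATIVE`) so that it runs without any dependencies, and plain lists stand in for the phone alert and the chat box.

```python
from typing import Iterable, List, Tuple

# Hypothetical word lists standing in for the real models (assumptions, not the service's logic).
BLOCKLIST = {"idiot", "stupid"}
POSITIVE = {"great", "love", "thanks"}
NEGATIVE = {"bad", "hate", "awful"}


def tokenize(message: str) -> List[str]:
    """Lowercase the message and strip basic punctuation from each word."""
    return [word.strip(".,!?") for word in message.lower().split()]


def is_abusive(message: str) -> bool:
    """First transformation: flag messages containing any blocklisted word."""
    return any(word in BLOCKLIST for word in tokenize(message))


def sentiment(message: str) -> str:
    """Second transformation: label a message positive or negative.

    Ties count as positive here, which is an arbitrary choice for the sketch.
    """
    words = tokenize(message)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"


def process(
    messages: Iterable[Tuple[str, str]],
    alerts: List[str],
    chat_box: List[Tuple[str, str]],
) -> None:
    """Handle (author, text) pairs one at a time, as they arrive."""
    for author, text in messages:
        if is_abusive(text):
            alerts.append(author)  # destination 1: alert the writer
            continue               # abusive messages never reach the chat box
        chat_box.append((text, sentiment(text)))  # destination 2: the chat box


if __name__ == "__main__":
    alerts: List[str] = []
    chat_box: List[Tuple[str, str]] = []
    incoming = [("ana", "I love this!"), ("bo", "You idiot"), ("cy", "This is awful.")]
    process(incoming, alerts, chat_box)
    print(alerts)
    print(chat_box)
```

Swapping the word lists for real classifiers would change the two transformation functions but not the shape of `process`, which is the point of the source, transformation and destination framing.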

It’s possible to expand this simple architecture into a more complex example of stream processing if we apply the framework of source, transformation and destination.

 

 

If we changed the source from chat messages to tweets containing specific hashtags, we could chart public sentiment toward cryptocurrency or particular stocks. We could change the transformations to add a filter for blue check marks or a specific language. Rather than a chat box, our destination could be a database that powers an automated trading app.

But no matter how complex our example becomes, regardless of how many data streams, nodes, consumers, producers, or clusters are involved, it can always be broken down into source, transformation, and destination.

 

Stream processing before 50,000 BCE

 

Although humans communicate in many ways, such as body language, gestures, and facial expressions, let’s focus on auditory communication and, even more specifically, verbal communication.

Somewhere between two million years ago (the beginning of the human genus) and 52,000 years ago, humans began speaking to one another. Archaeologists and biologists haven’t yet agreed on the specific point within this long period. Still, saying stream processing has been around for more than 52,000 years is quite a claim, so let’s check a spoken conversation against our framework of source, transformation, and destination.

 

 

Source: The person on the left is the source that contains the data in its original state: ideas. Those ideas form a continual source of data that just keeps going and going.

Transformation: The speaker is busy turning ideas into speech so that other humans can understand them. Although we could have a long conversation about how to map data conventions onto speech, I’ll offer one proposal. We encode ideas into a specific language, apply that language’s grammar as the protocol, and deliver the result through the medium of sound waves in the air.

Destination: The destination is the person or people who receive those encoded ideas as they’re produced and decode them back into thoughts. We could also go to a deeper level and discuss networks and synapses.

 

Stream processing is human

 

Most human activities involve real-time stream processing. Hearing, seeing, touching, and moving rely on our understanding of the most up-to-date data, processing it as it comes, and reacting to it — even if our bodies run these systems without our conscious awareness.

Imagine going through an entire day with your eyes closed and taking a photo every five minutes to look over and analyze at the end of the day. How would it go if you relied on historical data to cross a street? Would you drink outdated milk based on how it smelled two days ago?

This history of stream processing boils down to the fact that stream processing is human. It’s not as foreign or confusing as it might seem. Next time you sit down for a delicious holiday dinner and your cousin asks you to explain stream processing between bites of bread, please use my analogy. You don’t even need to credit me. 😉

Even within the tech sector, stream processing has a reputation for overwhelming practitioners. At Quix, we don’t think it needs to be overwhelming. Instead, it’s a human concept that the right stack of integrated tools can address. Microservices that address your sources, transformations and destinations help you and your team navigate the process of building stream processing systems.

The challenges may still be significant, but they’re not so scary.

 

Stream processing enables future technology to operate at the speed of humans

This history of stream processing also explains why our future technology needs stream processing to support human societies adequately. We’re bringing more data-driven applications into our public and private lives, and, for those products to keep up with us, they need to operate at our speed.

Join The Stream Community, where you’ll find developers, engineers and scientists supporting each other while working on streaming projects.


Dr. Kiersten Thamm works as the head of technical content at Quix. She directs the technical content strategy across the company by planning, writing and editing dev docs, tutorials and conference presentations. She also manages projects for the developer relations team and helps grow the Quix technical community.
