Automating analytics with a stream processing platform

by Peter Nagy
| 9 Nov, 2021

Here at Quix, we use … Quix!

Let’s not call it “eating our own dog food,” but rather, “drinking our own champagne.” We are so confident in the productivity and scalability of the Quix platform that we are using it to build the platform — and we are moving fast.

Stream processing enables a wide variety of use cases across industries; any project that benefits from processing live data can use it. So one of the operational systems we’re building with Quix is a real-time analytics environment that wraps user click streams and backend product event data into a single view of each individual user.

The key business goal for this solution is that it must be scalable, extensible and user-centric. Today it produces real-time Slack notifications for team motivation and customer support (like, “You just got a new user — hurrah!”) and daily funnel reports (such as daily aggregated signups, activations and active users); tomorrow it has to support real-time personalization use cases, including real-time interventions when a user is having technical problems, and help us optimize our underlying cloud infrastructure resources.

 

How to build customer-centric, real-time analytics

Stream processing is perfect for this hybrid real-time and batch analytics use case. With Quix, we were able to build a stream processing pipeline for the real-time features (analyzing user activity and sending Slack notifications) while simultaneously cleaning and storing data in the catalogue for daily reporting and ad hoc analysis with Grafana.


This hybrid solution based on stream processing principles makes life much easier later on. First, it’s very easy to extend the real-time pipeline by adding services for specific features, such as integrating with a customer service dashboard. Second, because the data is cleaned, prepped and organized in-flight, it’s much easier to build the daily reports and hook the catalogue up to Grafana.

The catalogue schema now matches the production schema, so our data scientists can easily explore the model in search of deeper insights like behavioral patterns — without wasting time cleaning and sorting data.

A huge part of this capability comes from the Streaming Context feature in the Quix SDK. Let’s take a look.

 

Wrapping data into a customer journey stream

Quix provides a streaming context that lets us wrap data from different sources into a single stream with a defined start and a defined end.

We wanted to use this Streaming Context feature to deliver customer centricity by wrapping all the data from each user journey into an individual stream. In this case, click-stream data from the frontend is combined with event streams and logs from our Kafka backend.
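To make the idea concrete, here is a minimal sketch of the underlying pattern: one journey stream per user, keyed by user ID. It deliberately uses plain Kafka via confluent_kafka rather than the Quix SDK, and the broker address and topic name are assumptions for illustration.

```python
# Illustrative only: this is not the Quix SDK, just plain Kafka showing the
# underlying idea of one journey stream per user. Broker address and topic
# name are assumptions.
import json
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def write_to_journey(user_id: str, source: str, event: dict) -> None:
    """Route an event from any source into that user's single journey stream."""
    record = {"source": source, "ts": time.time_ns(), **event}
    # Keying by user ID keeps all of one user's events together and in order.
    producer.produce("user-journeys", key=user_id, value=json.dumps(record))

# A frontend click and a backend API event land in the same per-user stream:
write_to_journey("user-123", "frontend", {"type": "button_click", "target": "deploy"})
write_to_journey("user-123", "backend", {"type": "api_request", "path": "/deployments"})
producer.flush()
```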


Why? Over time, this builds a deep understanding of how individuals use our product and lets us create real-time personalization features that improve the user experience.

 

Streaming data from different sources

We wrote some custom JavaScript using our Streaming Writer API to stream data from the frontend to a topic in Quix. The schema includes browser information, such as window height and width, and user interactions, including button clicks, the page URL and visibility changes.
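For a feel of what that write looks like, here is a hedged sketch of posting one row of click-stream data to the Streaming Writer API over HTTP. The real client is JavaScript running in the browser; this version is Python for consistency with the rest of this post, and the workspace URL, endpoint path and payload fields are assumptions rather than the exact API contract.

```python
# Hypothetical sketch of writing click-stream data to the Streaming Writer
# API over HTTP. The workspace URL, endpoint path and payload fields are
# assumptions for illustration, not the exact API contract.
import requests

BASE_URL = "https://writer-myworkspace.quix.ai"  # hypothetical workspace URL
TOKEN = "<access-token>"                         # a Quix access token

payload = {
    "timestamps": [1636454400000000000],  # nanosecond epoch timestamp
    "numericValues": {"windowHeight": [900], "windowWidth": [1440]},
    "stringValues": {"pageUrl": ["https://portal.quix.ai/signup"]},
}

resp = requests.post(
    f"{BASE_URL}/topics/click-data/streams/user-123/parameters/data",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
```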

We then wrote a Kafka connector to stream data from our backend service bus (Kafka) to another topic in Quix. This schema contains information from our backend services, such as API requests and responses, error messages and error codes.
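A minimal version of such a bridge might look like the following sketch. The broker addresses and topic names are assumptions; the real connector also reshapes messages into the eventData format on the way through.

```python
# Minimal sketch of a Kafka-to-Quix bridge. Broker addresses and topic names
# are assumptions; the real connector also reshapes messages into the
# eventData format before producing.
import json

from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "backend-kafka:9092",  # assumed backend cluster
    "group.id": "quix-bridge",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "quix-kafka:9092"})  # assumed Quix broker

consumer.subscribe(["backend-events"])
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    # Re-key by user ID so each event joins that user's journey downstream.
    producer.produce("backend-event-data", key=event.get("userId", ""),
                     value=json.dumps(event))
    producer.poll(0)  # serve delivery callbacks
```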

The data is written using our parameterData and eventData formats, which result in a topic schema that is very easy to understand and work with. Additionally, all streamed data is stored in this same schema, so any employee can open our Portal Analytics workspace and review the raw data:

[Image: raw parameter and event data in the Portal Analytics workspace]

As you can see, the raw data is quite difficult to interpret, so we need some processing and automation to create useful information. Let’s do that next.

 

User journey mapping

Now that we have streaming data from the frontend and backend, we need to wrap it into our business context: one customer journey. We wrote a Python service that translates individual parameters and events into a stream, one per user.

Individual requests, or a series of specific requests, might translate into higher-level actions, such as:

[Image: user journey mapping for a newly onboarded user]

Here you can see how user journey mapping lets us trace how a new user onboarded to our product. Timestamps appear in the first column and the user’s time-ordered events (authenticated, project created, deployment started, etc.) in the subsequent columns.
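The translation itself can be as simple as a rule table. The patterns and action names below are hypothetical, but they show the shape of the mapping the service performs.

```python
# Sketch of translating raw backend requests into higher-level journey events.
# The rule patterns and action names are hypothetical.
from typing import Optional

RULES = [
    # (method, path prefix, status code) -> higher-level action
    ("POST", "/auth/token", 200, "authenticated"),
    ("POST", "/projects", 201, "project created"),
    ("POST", "/deployments", 202, "deployment started"),
]

def to_journey_event(request: dict) -> Optional[str]:
    """Map one raw API request onto a higher-level action, if a rule matches."""
    for method, prefix, status, action in RULES:
        if (request["method"] == method
                and request["path"].startswith(prefix)
                and request["status"] == status):
            return action
    return None

assert to_journey_event(
    {"method": "POST", "path": "/projects", "status": 201}) == "project created"
```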

With this in place, we can now build event-driven automations. For example, we can send notifications to Slack when a new user signs up. If we see them having technical problems, we can intervene to give them a better user experience, in real time, while they are still using the product.

 

Building real-time, event-driven alerts

The Slack notification service is very simple: it listens for events in the customer journey stream (like Deployment Started and Deployment Completed) and sends a notification to Slack when its logic detects a problem, such as a longer-than-expected deployment. It also listens for error message events, such as a failed deployment.
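As a sketch, the core of such a service is just a handler over journey events plus a Slack incoming webhook. The event names roughly follow the ones above; the threshold and webhook URL are assumptions.

```python
# Sketch of the deployment-watchdog logic. Event names roughly follow the
# journey events above; the threshold and webhook URL are assumptions.
import time

import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # per-channel webhook
MAX_DEPLOY_SECONDS = 300  # assumed "longer than expected" threshold

started_at = {}  # deployment id -> start time

def notify(text: str) -> None:
    requests.post(SLACK_WEBHOOK, json={"text": text}).raise_for_status()

def on_journey_event(event: dict) -> None:
    """Called for every event arriving on the customer journey stream."""
    kind, dep = event["type"], event.get("deploymentId")
    if kind == "Deployment Started":
        started_at[dep] = time.monotonic()
    elif kind == "Deployment Completed" and dep in started_at:
        elapsed = time.monotonic() - started_at.pop(dep)
        if elapsed > MAX_DEPLOY_SECONDS:
            notify(f"Deployment {dep} took {elapsed:.0f}s, longer than expected.")
    elif kind == "Deployment Failed":
        notify(f"Deployment {dep} failed: {event.get('error', 'unknown error')}")
```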

Our implementation of this is basic at the moment, but my favorite alert is the Signup bot. It really motivates the team to know that a new developer has seen potential in our project:

[Image: the Signup bot announcing a new user in Slack]

We see a lot of potential in real-time alerting to help us build an amazing customer experience (more on that below). For now, though, our stream processing pipeline also serves to automate our daily reports and ad hoc analytics.

 

The advantages of using stream processing for data analytics

The big advantage of taking a stream processing approach to data analytics is that your data is cleaned and processed into a user-friendly business context before it lands in storage.

Read that again.

 

“The big advantage of taking a stream processing approach to data analytics is that your data is cleaned and processed into a user-friendly business context before it lands in storage.”

 

It sounds trivial, but processing data before it sinks into a data lake is essential to using it successfully. The prevailing “dump it and transform it later” approach costs data engineering teams as much as 80% of their time. This has to change.

At McLaren, we would never dump data and transform it later. The race would be lost by the time someone got around to analyzing what the sensors were streaming. Plus, the waste of time in dump-transform-analyze is unthinkable in a high-performance environment.

Instead, we always go upstream to correct data quality issues at their source, working to guarantee quality data-in-flight — before it lands in storage.

 

Automating daily reports

With the data stored neatly, it’s easy to build an automated daily report.

The goal of our first report is to track the aggregated number of new users who have progressed (or not) through the onboarding funnel. We then send that report as a PDF to a Slack channel called #portal-analytics.

The funnel looks like this:

  • # new sign-ups
  • # activated users (users who have completed a meaningful activity in the product)
  • # active users (users who continue to complete meaningful activities)
  • # inactive users (previously active users who have not been active in the past 15 days)
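Classifying a user into one of these stages is a small function once journey events are stored per user. The field names and the exact interpretation of “meaningful activity” in this sketch are assumptions.

```python
# Sketch of classifying one user into a funnel stage. Field names and the
# definition of "meaningful activity" are assumptions.
from datetime import datetime, timedelta

INACTIVITY_WINDOW = timedelta(days=15)

def funnel_stage(activity_times: list, now: datetime) -> str:
    """activity_times: datetimes of this user's meaningful activities."""
    if not activity_times:
        return "signed up"  # no meaningful activity yet
    if now - max(activity_times) > INACTIVITY_WINDOW:
        return "inactive"   # previously active, quiet for 15+ days
    # A first meaningful activity makes a user "activated"; repeated
    # meaningful activity keeps them "active".
    return "active" if len(activity_times) > 1 else "activated"
```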

The immediate advantages of having these daily reports are:

  • Specific action can be taken to address individual customer needs
  • Wider dissemination of this data in a simple format improves understanding across the team. For example, if users are consistently stuck at one part of the process, we can develop better docs or content to support them.

 

Building the automated report

The report was built using a simple Python service deployed in Quix. Unlike the real-time pipeline, this service is not connected to the broker. Instead, it runs as a standalone service deployment to:

  • Query the data catalogue to calculate the metrics for each stage in the funnel
  • Write the query results into a new stream called Users by Activation Stage
  • Query the new stream to generate a PDF report for each stage in the funnel
  • Send a report for each stage in the funnel to Slack

Then it sleeps for 24 hours minus the duration of the run (finish time minus start time) and starts the process again, as sketched below.
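Here is a sketch of that loop with the query, PDF and Slack steps stubbed out under hypothetical names; only the scheduling logic is concrete.

```python
# Sketch of the standalone report loop. The query, PDF and Slack steps are
# stubbed with hypothetical names; only the scheduling is concrete.
import time

REPORT_INTERVAL = 24 * 60 * 60  # one day, in seconds

def query_catalogue_funnel() -> dict:
    """Stub: would query the data catalogue for per-stage user counts."""
    return {"sign-ups": 0, "activated": 0, "active": 0, "inactive": 0}

def run_report() -> None:
    metrics = query_catalogue_funnel()
    # ...write metrics to the "Users by Activation Stage" stream, render a
    # PDF per stage and post each one to #portal-analytics...

while True:
    started = time.monotonic()
    run_report()
    elapsed = time.monotonic() - started
    # Sleep 24 hours minus the run's duration so reports land at a fixed time.
    time.sleep(max(0.0, REPORT_INTERVAL - elapsed))
```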

[Image: daily funnel report messages in the #portal-analytics Slack channel]

You can see in the Slack channel above that each stage in the funnel is represented by its own message, and each message has a thread of replies containing an individual report for each user journey stream.

Before this was in place, there was really no accurate way to gauge how many users were in each state, and whether their onboarding journeys faltered at a certain point.

Now we can see exactly what’s happening every single day and target improvements to our product and onboarding journey to optimize conversion. We can even query all of this data in Grafana, adjusting the range from one day to all-time.


Additionally, these reports are now totally automated, so maintenance is essentially zero. And, with just a small amount of work, we can extend this to more use cases as we grow.

 

Taking advantage of stream processing analytics

Although these daily reports are very useful for us, there are better tools for product analytics (check out PostHog — we love it!). So you might be thinking: why bother building analytics on a stream processing platform?

It’s the future possibilities that really excite us … remember the solution requirement was to be scalable, extensible and user-centric?

Now that we’re collecting data from different sources into a customer journey stream, and have built it on a real-time pipeline, we can start to act on that data proactively and instantly.


Some of the things we’ll build as we scale include:

  • Automating and customizing onboarding emails specifically for users (“Hey, we saw you’re streaming data from a car. Check out our forecasting model.”)
  • Popping up a dialog to assist users if we recognise they are stuck
  • Sending a tech support message if we see an error code
  • Automatically sending emails to customers to remind them of an upcoming trial expiry. (Quix’s free trial currently doesn’t expire, doesn’t require a credit card, and comes with $240 per year in free Quix credit to get your PoC running.)
  • Recommending when users should train ML models based on their deployment patterns
  • Customizing UX based on users’ behavioral patterns (“Looks like you are a data scientist. Have you tried turbo mode?”)
  • Presenting rewards to deeply engaged users
  • Scheduling and automating our infrastructure based on demand for resources

In practice, our stream-processing-first solution has opened up a huge number of potential applications. And it was easy to build. In fact, this solution was developed in just a matter of hours.

The real power behind this use case is the Quix Platform, which helps organizations to analyze and act on data in real time, bring products to market faster, and avoid the massive costs of data infrastructure.

Ready to try it for yourself? Here’s the link to your free trial.

by Peter Nagy

Peter Nagy is Head of Platform at Quix, where he leads the infrastructure team to ensure security and stability while building out Quix's features. He was previously a software engineer at McLaren, implementing improved remote stream processing for Formula One car data, and helped build their internal platform. Peter has been coding since childhood.
