How to build a powerful project free with Quix

by Mike Rosam
| 24 Aug, 2021

We used Quix’s free plan to deliver a POC that processes millions of tweets

Twitter remains one of the best ways to take the pulse of the world on topics from Olympic sports to political activism. But analyzing millions of tweets consumes a lot of technical resources.

We took on the challenge of building a Twitter sentiment analysis tool to demonstrate just how much value you can get out of Quix’s platform with its free offering to developers. In this post, we break down the costs in nitty-gritty detail so you can see exactly what the platform costs at scale.

 

Quix is a complete streaming data platform that helps you build real-time, data-driven products faster and more efficiently than building the infrastructure yourself. Our rapid development environment enables developers and data scientists to jump straight into coding without spending more than a few seconds configuring infrastructure. Quix includes:

  • a serverless and managed Kafka for building scalable and reusable pipelines
  • a serverless and managed compute environment for executing code
  • a managed metadata catalogue for recording data streams in your business context
  • a simple, efficient client library supporting native Python and DataFrames
  • extensive APIs to connect to your data sources and sinks
  • software to support the application development lifecycle

As a developer-first platform, we offer a free account so you can kick the tires without having to input credit card details. The free account includes 200 credits (worth $20) that renew every month. This allows you to build a solution and keep it running.

Less than $240 per year in technical resources doesn’t sound like a lot to build with, but Quix is surprisingly efficient thanks to its tight integration of best-in-class infrastructure. We’re confident you’ll be able to build and run something very cool. This example streams and processes 4 million tweets every month for less than 200 credits.

 

How we built it: Streaming data analytics

We created an automated system that notifies you when the sentiment of a tweet hashtag changes. The solution includes:

  • a service that listens for Dogecoin tweets
  • an ML model that processes each tweet to create a sentiment score
  • a real-time notification service that sends messages to Slack when sentiment changes
  • a web app that prints the rolling average sentiment over the last 24 hours

We chose Dogecoin because it currently has a high volume of tweets. However, you could monitor any hashtag you are interested in and build an app that acts on information in real time.

The project builds a pipeline of raw and processed data. All the data (both raw and processed) are persisted to our data catalogue so you can use them in developing iterations of your ML models.

 

It’s a simple implementation, and we’re sure you can do better. Our goal was to demonstrate how much data you can stream, process and store on Quix at low cost thanks to our tightly integrated infrastructure, not to showcase the quality of the services, ML model, or app.

 

Real-time streaming data architecture

The architecture (figure 1) consists of a stream processing pipeline built using two topics and three deployments, plus a web frontend (the fourth deployment) built using data persisted to the data catalogue.

 

Figure 1. The solution contains 2 topics with persistence enabled, and 4 deployments
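
One way to picture figure 1 is as a small data model. The sketch below is illustrative only: the `Topic` and `Deployment` classes and the `unknown_topics` check are hypothetical helpers, not part of the Quix SDK. It also assumes that the Slack service reads from the output topic and that the dashboard reads from the data catalogue rather than from a topic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Topic:
    name: str
    persisted: bool


@dataclass(frozen=True)
class Deployment:
    name: str
    reads_from: Optional[str]  # topic name, or None if it reads elsewhere
    writes_to: Optional[str]   # topic name, or None if it writes to no topic


INPUT = "sentimentanalysis-twitter-stream"
OUTPUT = "sentimentanalysis-sentimentstats"

TOPICS = [Topic(INPUT, persisted=True), Topic(OUTPUT, persisted=True)]

DEPLOYMENTS = [
    Deployment("TwitterData", reads_from=None, writes_to=INPUT),
    Deployment("SentimentAnalysis", reads_from=INPUT, writes_to=OUTPUT),
    # Assumption: the alerting service consumes the model's output topic.
    Deployment("SlackAlerting", reads_from=OUTPUT, writes_to=None),
    # The dashboard queries the data catalogue, so it touches no topic directly.
    Deployment("dashboard", reads_from=None, writes_to=None),
]


def unknown_topics(topics, deployments):
    """Return topic names that deployments reference but that were never declared."""
    declared = {t.name for t in topics}
    used = {n for d in deployments for n in (d.reads_from, d.writes_to) if n}
    return used - declared
```

Writing the layout down like this makes it easy to check that every deployment points at a topic that actually exists before anything is deployed.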

 

Streaming data cost management

Now that you’ve got a sense of the solution, let’s dive into each element’s configurations and costs. Quix offers a fully transparent, controllable and flexible cost model that ensures you only pay for what you use, and helps you use resources as efficiently as possible.

Perhaps the most important cost comparison is the Quix platform versus a DIY solution, or a collection of services (cloud compute, managed Kafka, storage, etc.). The expensive development time needed to set up infrastructure or configure multiple services to work together quickly runs up a project’s cost.

Our example project would cost 259.54 credits per month to run in production. In figure 2, we’ve broken down each major component into costs in credits and dollars per million messages.

 

Figure 2. Solution breakdown by cost category

 

As you can see, it costs 2.78 credits (just under 28 cents) to stream 1 million messages, 61.65 credits ($6.17) to process 1 million messages across 4 deployments, and 0.42 credits (4 cents) to store 1 million messages.
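
To get a feel for how these per-million rates scale, here is a rough estimator. It is a back-of-the-envelope sketch: the function name and the idea of scaling the rates linearly with message volume are our assumptions for illustration, and actual charges depend on real usage.

```python
CREDIT_USD = 0.10  # 200 credits are worth $20

# Credits per 1 million messages, as quoted above.
RATE_PER_MILLION = {"stream": 2.78, "process": 61.65, "store": 0.42}


def estimate_credits(streamed_m, processed_m, stored_m):
    """Estimate usage credits from message counts expressed in millions."""
    return {
        "stream": RATE_PER_MILLION["stream"] * streamed_m,
        "process": RATE_PER_MILLION["process"] * processed_m,
        "store": RATE_PER_MILLION["store"] * stored_m,
    }


# Example: one million messages streamed, processed and stored.
usage = estimate_credits(streamed_m=1, processed_m=1, stored_m=1)
total = sum(usage.values())
print(f"{total:.2f} credits, about ${total * CREDIT_USD:.2f}")
```

The estimate leaves out the fixed workspace charge, which is covered next.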

Storage is particularly cheap here because the required workspace, which has a fixed cost of 71 credits ($7.10) per month, includes 8GB of HDD storage, so you only pay for the read, write and query costs. Beyond the workspace cost, the free plan leaves you 129 credits per month to stream and process data.

Quix workspaces group all the components of one project (topics, projects, deployments and data) into one environment that is secured with a unique key. The workspace includes essential resources for operating your data infrastructure.

 

Like everything on Quix, workspaces are usage based, so if you only create a workspace for a day, then you’ll only pay for one day (about 24 cents). Quix lets you monitor all costs in real time so you can make the most of your resources and avoid costly surprises.
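
The workspace arithmetic can be checked with a few lines. This sketch assumes a 30-day month and simple linear proration; the function name is ours, not a Quix API.

```python
FREE_CREDITS_PER_MONTH = 200.0
WORKSPACE_CREDITS_PER_MONTH = 71.0
CREDIT_USD = 0.10  # 200 credits are worth $20


def workspace_credits(days, days_in_month=30):
    """Prorate the monthly workspace charge over a number of days (assumed linear)."""
    return WORKSPACE_CREDITS_PER_MONTH * days / days_in_month


one_day_usd = workspace_credits(1) * CREDIT_USD
remaining = FREE_CREDITS_PER_MONTH - WORKSPACE_CREDITS_PER_MONTH
print(f"one day: ${one_day_usd:.2f}, credits left after a full month: {remaining:.0f}")
```

Under these assumptions, one day works out to roughly 24 cents, and a full month of workspace leaves 129 of the 200 free credits.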

Now, let’s take a closer look at how the platform components work together.

 

Organizing and streaming data topics

At the heart of the solution is the Quix managed Kafka message broker, which is serverless, so you can create topics in a few clicks without provisioning any clusters. Topics let you stream data and build data pipelines.

Quix costs 5.9 credits per month for each topic, and 2.6 credits to stream 1GB of data in that topic (together, that’s 85 cents). Charges are fractional down to the millisecond and byte of data, so if you only create a topic for a minute, then you’ll only pay a fraction.

We created two topics (Figure 3) for this solution:

  1. An input topic (sentimentanalysis-twitter-stream) for streaming raw data from Twitter
  2. An output topic (sentimentanalysis-sentimentstats) for streaming the results of the SentimentAnalysis ML model

Both topics are persisted, which means that every message streamed on each topic is written to our data catalogue, together with its metadata. You can change this storage option with a toggle switch to further control data storage costs, which we’ll cover later.

 

Figure 3. Topics deployed and persisted

 

Efficiently streaming to conserve data use

This example streams 2.75 million tweets into the input topic. Every tweet is processed by the ML model in a pub/sub pattern by reading raw data from the input topic, processing it, and writing results to the output topic. This means the output topic is also streaming 2.75 million tweets for a total of 5.5 million tweets streamed across the solution.

Each tweet is streamed as one ParameterData message in the Quix SDK. This format is very efficient, so 5.5 million tweets require only 1.27GB of data. The charge for 2 topics and 5.5 million messages would be roughly 15.36 credits, or $1.54 (figure 4).

 

Figure 4. Monthly charges for streaming 5.5 million messages using ParameterData format
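
As a rough cross-check, the topic prices quoted earlier (5.9 credits per topic per month and 2.6 credits per GB streamed) can be combined in a small helper. The function name and the `months` parameter are our own, and the sketch assumes the two charges simply add up.

```python
TOPIC_CREDITS_PER_MONTH = 5.9
CREDITS_PER_GB = 2.6
CREDIT_USD = 0.10  # 200 credits are worth $20


def streaming_credits(topics, gigabytes, months=1.0):
    """Topic rental for the given duration plus the charge for data streamed."""
    return topics * TOPIC_CREDITS_PER_MONTH * months + gigabytes * CREDITS_PER_GB


one_topic_one_gb = streaming_credits(topics=1, gigabytes=1.0)   # 5.9 + 2.6
this_example = streaming_credits(topics=2, gigabytes=1.27)      # two topics, 1.27GB
print(f"{one_topic_one_gb:.2f} credits, {this_example:.2f} credits")
```

With these list prices the example comes to about 15.1 credits, slightly below the roughly 15.36 credits reported above. The text does not itemise the difference, so treat the helper as an approximation rather than a billing formula.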

 

Data processing on elastic resources

The solution has four deployments:

  • a Twitter connector (TwitterData)
  • an ML model (SentimentAnalysis)
  • a notification service (SlackAlerting); and
  • a web UI (dashboard)

Quix only charges for the exact CPU and memory resources consumed by your application. Figure 5 shows all deployments in action. As you can see, the CPU and memory are elastic resources, which are billed per millicore/millisecond and MB/millisecond, respectively.

 

Figure 5. Deployments are elastic resources charged on a usage-based model

 

Let’s take a closer look at each deployment:

Twitter connector

This is a simple service that connects to the Twitter API, gets each tweet with #doge, converts it to the Quix SDK ParameterData format, and streams it to the input topic. Each record contains the tweet text, tweet ID and a tag value (which is the Twitter search term #doge).
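
The shape of each record can be sketched as follows. `TweetRecord` and `to_record` are hypothetical stand-ins used for illustration; they are not the Quix SDK ParameterData type. The sketch also assumes the raw tweet payload exposes `id` and `text` keys.

```python
from dataclasses import asdict, dataclass

SEARCH_TERM = "#doge"


@dataclass(frozen=True)
class TweetRecord:
    tweet_id: str
    text: str
    tag: str  # the Twitter search term that matched


def to_record(raw: dict, tag: str = SEARCH_TERM) -> TweetRecord:
    """Keep only the three fields the connector streams to the input topic."""
    return TweetRecord(tweet_id=str(raw["id"]), text=raw["text"], tag=tag)


# Made-up payload, for illustration only.
record = to_record({"id": 1, "text": "much wow #doge", "lang": "en"})
print(asdict(record))
```

Dropping every field except the text, ID and tag keeps each message small, which matters when you are billed per byte streamed.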

Sentiment analysis ML model

The model is pre-trained on historic data. It reads data from the input topic, does its magic, and writes a sentiment score for each tweet to the output topic, in real time.
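
The read, process, write loop looks roughly like this. Everything here is a stand-in: two in-memory deques play the role of the Kafka topics, and a toy word-list scorer replaces the pre-trained model, whose internals are not described in this post.

```python
from collections import deque

# Toy vocabulary, purely illustrative.
POSITIVE = {"moon", "love", "great"}
NEGATIVE = {"crash", "scam", "bad"}


def toy_sentiment(text: str) -> float:
    """Score in [-1, 1]: (positive words minus negative words) / total words."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return hits / len(words)


def run_model(input_topic: deque, output_topic: deque) -> None:
    """Consume every message from the input topic and publish one score each."""
    while input_topic:
        msg = input_topic.popleft()
        output_topic.append(
            {"tweet_id": msg["tweet_id"], "sentiment": toy_sentiment(msg["text"])}
        )


input_topic = deque(
    [
        {"tweet_id": "1", "text": "doge to the moon"},
        {"tweet_id": "2", "text": "total scam"},
    ]
)
output_topic = deque()
run_model(input_topic, output_topic)
```

Each input message produces exactly one output message, which is why the output topic carries the same number of messages as the input topic.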

Slack notification service

This service processes the results of the ML model to calculate when the score varies by a configured percentage. It sends a message to your Slack channel when that threshold is met.
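
A minimal sketch of the threshold check is shown below. The exact rule used by the service is not given here, so the relative-change formula, the handling of a zero baseline, and all function names are assumptions. The payload uses the `text` field that Slack incoming webhooks accept; actually posting it is left out.

```python
def percent_change(previous: float, current: float) -> float:
    """Absolute relative change between two scores, in percent."""
    if previous == 0:
        # Assumption: any move away from exactly zero counts as an unbounded change.
        return 0.0 if current == 0 else float("inf")
    return abs(current - previous) / abs(previous) * 100.0


def should_alert(previous: float, current: float, threshold_pct: float) -> bool:
    """True when the score moved by at least the configured percentage."""
    return percent_change(previous, current) >= threshold_pct


def slack_payload(previous: float, current: float) -> dict:
    """JSON body for a Slack incoming webhook."""
    return {"text": f"Sentiment moved from {previous:.2f} to {current:.2f}"}


if should_alert(previous=0.50, current=0.56, threshold_pct=10.0):
    payload = slack_payload(0.50, 0.56)
```

Keeping the decision (`should_alert`) separate from the message formatting makes the threshold easy to test without touching Slack.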

Web UI

This is a Quix deployment with a public DNS. It uses the Catalogue API to plot a rolling average sentiment over the last 24 hours on page load. You can also build real-time web apps using the Streaming Reader API, but we chose to demonstrate using the Data Catalogue.
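
The 24-hour rolling average itself is simple to compute once the scores are in hand. In this sketch an in-memory list of `(timestamp, score)` pairs stands in for whatever the Catalogue API returns; no API calls are shown, and the function name is ours.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def rolling_average(
    points: Iterable[Tuple[datetime, float]],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> Optional[float]:
    """Mean of the scores whose timestamps fall within the window ending at `now`."""
    cutoff = now - window
    recent = [score for ts, score in points if cutoff < ts <= now]
    if not recent:
        return None  # nothing to average in this window
    return sum(recent) / len(recent)


now = datetime(2021, 8, 24, 12, 0)
points = [
    (now - timedelta(hours=30), -0.8),  # older than 24 hours, ignored
    (now - timedelta(hours=12), 0.2),
    (now - timedelta(hours=1), 0.4),
]
average = rolling_average(points, now)
```

Returning `None` for an empty window lets the page distinguish "no tweets yet" from a genuinely neutral score of zero.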

 

Processing costs for compute and storage

As mentioned, the data processing charges are completely elastic. Figure 5 shows a snapshot in time, and as the load goes up or down, so will your costs — you only pay when you are getting value.

In the example, 2.75 million tweets are processed by each deployment. In total, the project would consume 74.71 core hours of CPU and 643.59 GB hours of memory per month. In that case, the charge for the four deployments would be 170.39 credits ($17.04) with the CPU/memory split highlighted in figure 6.

 

Figure 6. Monthly compute charges split by CPU and memory

 

Persisting and storing data from a stream

Quix provides a data catalogue to store your streams of data for later use in the model development lifecycle. Simply turn on persistence to permanently store data streamed in each topic (see the right-hand column’s toggle switch in figure 3).

With persistence on, Quix writes each message into the optimal storage technology for that data type and wraps it in a stream so you can still make sense of the data. Persistence is enabled for both topics in this example, so the catalogue has two streams (figure 7), one for each topic.

 

Figure 7. Two streams of persisted data in the catalogue

 

Optimizing data storage to improve efficiency

Data storage is very efficient in Quix because we’ve optimized each data type and tightly integrated Kafka and the data catalogue using the SDK.

In this example, we store 2.03GB of data per month (figure 8). It costs 2.18 credits (22 cents) to write 0.52GB of data to the catalogue, 0.13 credits (1 cent) to read data from it, and zero credits to query it, because the web UI requests only a very small amount of data.

 

Figure 8. Breakdown of storage charges 

 

In this example, disk space in the catalogue costs nothing extra because the data occupies only 2.03GB, while the free tier workspace includes 8GB of HDD capacity. If you exceed the 8GB allowance, each additional GB costs 7.45 credits per month.
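
The storage allowance can be expressed as a one-line rule. This sketch assumes that space beyond the 8GB allowance is billed proportionally per GB; the function name is ours.

```python
INCLUDED_GB = 8.0
EXTRA_GB_CREDITS_PER_MONTH = 7.45


def extra_storage_credits(used_gb: float) -> float:
    """Monthly credits for disk space beyond the allowance included in the workspace."""
    return max(0.0, used_gb - INCLUDED_GB) * EXTRA_GB_CREDITS_PER_MONTH


within_allowance = extra_storage_credits(2.03)  # this example's footprint
over_allowance = extra_storage_credits(10.0)    # 2GB beyond the allowance
```

At 2.03GB the example sits well inside the allowance, so the extra charge is zero.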

Quix helps you control storage costs by letting you:

  • choose not to persist data
  • choose which data to persist on a topic-by-topic basis
  • choose which data to persist in your model with downsampling; and
  • delete historic streams that are no longer required.
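
To illustrate the downsampling option, here is one generic approach: averaging consecutive values in fixed-size buckets before persisting them. How Quix implements downsampling is not described in this post, so treat this purely as a sketch of the idea.

```python
from typing import List, Sequence


def downsample(values: Sequence[float], factor: int) -> List[float]:
    """Replace each run of `factor` consecutive values with its mean."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    chunks = (values[i:i + factor] for i in range(0, len(values), factor))
    return [sum(chunk) / len(chunk) for chunk in chunks]


scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
halved = downsample(scores, 2)  # half as many values to store
```

A factor of 2 halves the number of values written, at the cost of losing detail within each bucket.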

 

From idea to POC: What will you build with Quix?

You can build any data application or data processing pipeline in Quix. Combine topics and deployments in a solution architecture that solves your problems — just bring your code and domain expertise.

 

Quix’s developer-first platform is designed for speed and efficiency, offering managed infrastructure that lets you focus immediately on building your project, and powerful, tightly integrated technology to make the most of your resources.

We should also add that when you build on Quix, your code is always yours, your data is always yours, and we protect you with encryption, authorization, and authentication. It’s production-ready infrastructure designed to bring your projects to life faster.

Our example project took just hours to build because the underlying infrastructure was already set up. It delivered an effective and efficient project to stream, process, and store millions of tweets — free.

Ready to experiment with your own project? Sign up now and get $20 per month in free Quix credits. We’d also love to hear more about what you build on our community Discord channel.

Mike Rosam is cofounder and CEO at Quix, where he works at the intersection of business and technology to pioneer the world's first streaming data development platform. He was previously Head of Innovation at McLaren Applied, where he led the data analytics product line. Mike has a degree in Mechanical Engineering and an MBA from Imperial College London.
