What is Quix?
Quix is a platform for working with streaming data.
We architected Quix natively around a message broker (specifically Kafka) because databases get in the way of building low-latency applications that scale cost-effectively. If broker technologies were easier to use, developers could work with live data in memory instead of working with data on disk.
But they are not easy to use, especially for Python developers who are at the forefront of data science but cannot easily work with streaming data.
Quix provides everything a developer needs to build applications with streaming data. By using Quix you can build new products faster whilst keeping your data in-memory, helping to achieve lower latencies and lower operating costs.
From the top down, our stack provides a web UI, APIs and an SDK that abstract developers away from our underlying infrastructure, including fully managed Kafka topics, a serverless compute environment and a metadata-driven catalogue.
With the Quix Portal we are striving to create a beautiful software experience that facilitates DevOps/MLOps best practices for less-experienced development teams. Our goals are to:
Help less-experienced people access live data,
Help them create and manage complex infrastructure and write application code without support from expert engineering teams, and
Help accelerate the development lifecycle by enabling developers to test and iterate on code in an always-live environment.
We provide four APIs to help you work with streaming data:
Stream Writer API: helps you send any data to a Kafka topic in Quix using HTTP. This API handles encryption, serialisation and conversion to the Quix SDK format, so downstream processing stays efficient and performant regardless of the data source.
Stream Reader API: helps you push live data from a Quix topic to your application, keeping latency very low by avoiding any disk operations.
Catalogue API: lets you query historic data streams in the catalogue to train ML models, build dashboards and export data to other systems.
Portal API: lets you automate Portal tasks like creating workspaces, topics and deployments.
Python is the dominant language for machine learning, but it is a poor fit for streaming technologies such as Kafka, which are predominantly written in Java and Scala.
Our streaming SDK is a client library that shields Python developers from streaming-specific complexities such as learning Java or dealing with buffering, serialisation and encryption.
Instead, our SDK serves you streaming data in a data frame so you can write any simple or complex data processing logic and connect it directly to the broker. There are just a few key streaming concepts that you must learn. You can read about them here.
Our deployment environment is designed to completely remove the concept of a cluster from the lexicon of cloud computing.
Firstly: clusters are rather expensive, which puts up a barrier to adoption. For example, what if you want to run a simple model that listens for intermittent streams of data and only consumes 400 millicores of CPU and 500MB of memory when the data arrives? You’ll pay the full price of a 4 core/8GB of RAM cluster for as long as the model is deployed!
Secondly: scaling up beyond one cluster gets complicated and expensive very quickly, which is going to give you nightmares when your product grows.
Finally: the idea of provisioning compute resources is anathema to software development. Could you imagine provisioning your CPU and memory before running some code in your local IDE? It’s just ridiculous, and it should be just as ridiculous in the cloud.
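To put rough numbers on the first point, the sketch below compares the example model's peak usage with the 4 core/8GB cluster it would occupy. It assumes 1 core = 1000 millicores and 1GB = 1024MB. The function name utilisation is illustrative only and is not part of any Quix API.

```python
def utilisation(used: float, provisioned: float) -> float:
    """Return the fraction of a provisioned resource that is actually used."""
    if provisioned <= 0:
        raise ValueError("provisioned must be positive")
    return used / provisioned


# Peak usage of the example model, taken from the text above.
MODEL_CPU_MILLICORES = 400
MODEL_MEMORY_MB = 500

# The 4 core/8GB cluster, converted to the same units.
# Assumptions: 1 core = 1000 millicores, 1GB = 1024MB.
CLUSTER_CPU_MILLICORES = 4 * 1000
CLUSTER_MEMORY_MB = 8 * 1024

cpu_share = utilisation(MODEL_CPU_MILLICORES, CLUSTER_CPU_MILLICORES)
memory_share = utilisation(MODEL_MEMORY_MB, CLUSTER_MEMORY_MB)

print(f"CPU used at peak: {cpu_share:.1%}")        # 10.0%
print(f"Memory used at peak: {memory_share:.1%}")  # 6.1%
```

Even while data is arriving, the model uses about a tenth of the CPU and roughly 6% of the memory being paid for. Between bursts it uses none of it.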
Instead of provisioning clusters, with Quix you just deploy your code with the click of a button by setting the maximum resources that it requires, and we handle everything else:
If your model receives no data, and thus doesn’t consume any resources, you won’t pay anything. When that intermittent data stream comes along, your model will spool up, process the data and spool down again.
If your model fails, a new one will be created that will start processing data where the last one stopped.
If your intermittent stream becomes a torrent, you can increase the number of replicas to share the load.
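The second behaviour, a replacement picking up where a failed model stopped, is commonly achieved with log-based brokers by committing an offset after each processed message. The sketch below illustrates that general idea with a plain Python list standing in for a topic. It is not a description of Quix's internal mechanism, and run_worker, offsets and fail_at are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Optional


def run_worker(
    log: List[str],
    offsets: Dict[str, int],
    group: str,
    handle: Callable[[str], None],
    fail_at: Optional[int] = None,
) -> None:
    """Process messages starting at the committed offset for this group.

    The offset is committed after each message is handled, so a
    replacement worker can resume from the last committed position.
    """
    position = offsets.get(group, 0)
    while position < len(log):
        if fail_at is not None and position == fail_at:
            raise RuntimeError("simulated crash")
        handle(log[position])
        position += 1
        offsets[group] = position  # commit progress


log = ["m0", "m1", "m2", "m3", "m4"]  # stand-in for messages in a topic
offsets: Dict[str, int] = {}          # stand-in for broker-side committed offsets
processed: List[str] = []

# The first worker crashes before handling the message at offset 3.
try:
    run_worker(log, offsets, "model", processed.append, fail_at=3)
except RuntimeError:
    pass

# A replacement worker starts from the committed offset, not from zero.
run_worker(log, offsets, "model", processed.append)

print(processed)  # ['m0', 'm1', 'm2', 'm3', 'm4']
```

Each message is handled exactly once in this toy version because the crash is simulated before the handler runs. In a real system a crash between handling and committing would cause that message to be processed again.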
Learn more about our serverless compute here.
Quix provides fully managed Kafka topics, which are used to stream data and build data processing pipelines by daisy-chaining models together.
Our topics are multi-tenant, which means you don’t have to build and maintain an entire cluster to stream a few bytes of data. Instead, you can start quickly and cheaply by creating one topic for your application and only pay for the resources consumed when streaming that data. When your solution grows in data volume or complexity, you can just add more topics without concern for the underlying infrastructure, which we handle for you.
Together with our SDK and serverless compute, you can connect your models directly to our topics to read and write data using the pub/sub pattern. This keeps the data in memory to deliver low-latency, cost-effective stream processing.
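As a minimal illustration of daisy-chaining models with the pub/sub pattern, the sketch below wires two small processing steps together through in-memory topics. The Topic class, the topic names and the threshold of 100 are all hypothetical and exist only for this example. This is not the Quix SDK, and a real topic would be backed by Kafka rather than a Python list of callbacks.

```python
from typing import Callable, List


class Topic:
    """A toy in-memory topic: publishing calls every subscriber directly."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable] = []

    def subscribe(self, callback: Callable) -> None:
        self._subscribers.append(callback)

    def publish(self, message) -> None:
        for callback in self._subscribers:
            callback(message)


raw = Topic("raw-speed")
smoothed = Topic("smoothed-speed")
alerts = Topic("alerts")

window: List[float] = []


def smooth(value: float) -> None:
    """First model: publish the mean of the last three readings."""
    window.append(value)
    recent = window[-3:]
    smoothed.publish(sum(recent) / len(recent))


def detect(value: float) -> None:
    """Second model: raise an alert when the smoothed value exceeds 100."""
    if value > 100:
        alerts.publish(f"speed high: {value:.1f}")


received: List[str] = []
raw.subscribe(smooth)        # raw -> smooth -> smoothed
smoothed.subscribe(detect)   # smoothed -> detect -> alerts
alerts.subscribe(received.append)

for reading in [90.0, 120.0, 150.0]:
    raw.publish(reading)

print(received)  # ['speed high: 105.0', 'speed high: 120.0']
```

Each model only knows the topics it reads from and writes to, so a new step can be added to the chain by subscribing it to an existing topic without changing the models already in place.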
We provide a data catalogue for long-term storage, analytics and data science activities.
Using Gartner’s definition: ‘A data catalogue maintains an inventory of data assets through the discovery, description, and organization of datasets. The catalogue provides context to enable data analysts, data scientists, data stewards, and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.’
We have combined what we know to be the best database technologies for each data type into a unified catalogue. There’s a timeseries database for recording your events and parameter values, blob storage for your binary data, and a NoSQL DB for recording your metadata.
Our data catalogue technology has two advantages:
It allocates each data type to the optimal database technology for that type. This increases read/write and query performance, which reduces operating costs.
It uses your metadata to record your context. This makes your data usable by more people across your organisation, who only need to know your business context to navigate vast quantities of data.
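The first advantage can be pictured as a routing step that inspects each value and picks a store for it. The sketch below is only an illustration of that idea under simple assumptions: binary values go to blob storage, dictionaries are treated as metadata for the NoSQL store, and numbers and strings are treated as parameter values and events for the time-series store. The function choose_store and the store labels are hypothetical and do not reflect how the catalogue is actually implemented.

```python
from typing import Any


def choose_store(value: Any) -> str:
    """Pick a store for a value based on its Python type (illustrative rules)."""
    if isinstance(value, (bytes, bytearray)):
        return "blob"        # binary data
    if isinstance(value, dict):
        return "nosql"       # metadata
    if isinstance(value, (int, float, str)):
        return "timeseries"  # parameter values and events
    raise TypeError(f"no store configured for {type(value).__name__}")


# A made-up record mixing the three kinds of data described above.
record = {
    "speed": 87.5,
    "gear_change": "up",
    "camera_frame": b"\x89PNG",
    "tags": {"driver": "A"},
}

routing = {name: choose_store(value) for name, value in record.items()}
print(routing)
# {'speed': 'timeseries', 'gear_change': 'timeseries',
#  'camera_frame': 'blob', 'tags': 'nosql'}
```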
Quix permanently stores your data streams in exactly the same condition as when they were streamed live. Our metadata-driven catalogue records the semantics of your streams so that anyone in your organisation can quickly understand them, explore them, and use them in their applications and analysis.
The always-live nature of your streams makes offline model training simple and helps to ensure that your model is developed in a representative environment. It also lets data scientists use simulation methods to back-test models against unseen live data before connecting them to live data in production topics.
The net result is a faster development cycle, with a better-quality model at the end.