Data

Introduction

All data persisted through topics are stored in the Quix Catalogue as streams.

The Quix Catalogue

The catalogue is a unified data store that combines time-series and document database technologies. Each workspace has its own instance of each database technology, tightly integrated with our own technologies to provide a simple yet powerful solution.

We have abstracted away the complexity of building, scaling and working with different database technologies, so you no longer have to think about tables, buckets, file systems or blob stores. Instead, you simply decide whether you want the data stored and define its location; we do the rest.

Our data catalogue technology has two advantages:

  1. It allocates each data type to the optimal database technology for that type. This increases read/write and query performance, which reduces operating costs.

  2. It uses your metadata to record your context. This makes your data more usable for more people across your organisation who only need to know your business context to navigate vast quantities of data.

Streams

Streams are the central concept of the catalogue. They unify time-series data and metadata into a single object that groups all relevant information into one discrete session.

Streams make it very easy to manage, discover and work with your data; they are key to architecting excellent data governance in your organisation.

Features

The catalogue is powerful, fast and flexible. Its features help organisations build rigour into their data practices.

As standard, the database instances in each workspace run on HDDs, but you can upgrade them to SSDs if you need more performance.

Security

To improve data security, our catalogue is not multi-tenant. Instead, each workspace is provisioned with a unique instance of each database technology, which can only be accessed with its unique credentials. Your data is also encrypted at rest.

Indexing

We have implemented a fast and efficient indexing architecture that spans both the time-series and document database technologies. Our indexing delivers rapid data retrieval so that you can:

  • Quickly navigate the catalogue to find relevant datasets.

  • Perform big data and machine learning tasks.

  • Build responsive applications using our Query API.

Again, we have taken care of the tech so you can focus on your application.

Rigorous Data Management

The catalogue and streams provide a powerful way to manage your data.

Each row in the catalogue is one stream, identified by its name and ordered by the date created column (newest on top) by default.

Each column in the catalogue is a key item of metadata which can be used to order and search streams.

Each stream is grouped and ordered in the catalogue using its metadata. Use our SDKs to define a data grouping and a data hierarchy that suit your needs.

Data Grouping

Data is grouped in the catalogue by stream, location, topic and metadata.

Grouping by stream: A stream is used to group time-series data (parameters) together with the events and metadata which give those parameters and streams context. Each stream is one row in the catalogue with the most recent stream on top by default.

Grouping by location: The catalogue includes a navigation pane in which you can quickly find streams by navigating through the data hierarchy to find the data location.

Grouping by topic: Streams are automatically grouped by topic as another means to quickly find what you are looking for. Searching for streams by topic will return only those streams that have parameter data in that topic.

Grouping by metadata: Streams are also grouped in the catalogue by some of their metadata, including: stream name, stream start and end, stream status, stream topic and stream creation date/time. Each item of metadata is one column in the catalogue which can be ordered in the UX.
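The default ordering and the grouping by topic described above can be illustrated with a small sketch. This is not the catalogue's implementation and it does not use the Quix SDK: StreamRow, newest_first and by_topic are hypothetical names, and the sample rows are made up purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StreamRow:
    """Hypothetical stand-in for one catalogue row (one stream)."""
    name: str
    topic: str
    status: str
    created: datetime


def newest_first(rows):
    """Order rows by creation date, most recent on top (the default view)."""
    return sorted(rows, key=lambda r: r.created, reverse=True)


def by_topic(rows, topic):
    """Keep only the rows whose stream belongs to the given topic."""
    return [r for r in rows if r.topic == topic]


# Made-up sample rows, purely illustrative.
rows = [
    StreamRow("lap-1", "telemetry", "Closed", datetime(2021, 5, 1, 9, 0)),
    StreamRow("lap-2", "telemetry", "Open", datetime(2021, 5, 1, 9, 5)),
    StreamRow("pit-stop", "events", "Closed", datetime(2021, 5, 1, 9, 3)),
]

for row in newest_first(by_topic(rows, "telemetry")):
    print(row.name, row.status, row.created.isoformat())
```

Each field of the hypothetical StreamRow plays the role of one catalogue column, so ordering or filtering by another item of metadata only means changing the sort key or the filter condition.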

Data Location

The data location allows you to define the location of your streams within the catalogue. For some applications the location may be tied to the physical location of the data source; for others it could be product related. For example:

  • Racing teams may want to organise their data into a location based on the location of their races such as Race Series > Season > Country > Circuit.

  • App developers may want to organise their data based on the type of device used, such as: App Name > Version > Platform; or the physical location of their users, such as: App Name > Region > Country > Town.

  • A MedTech company might want to organise IoT data from wearables by patient, pathology or anatomy.

The data location is an extremely flexible feature which can be tailored to your needs using Locations Properties in our SDK.
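As a rough sketch of how such a hierarchy might be expressed, the snippet below joins the levels of a hierarchy into a single path. It assumes, for illustration only, that a location can be written as a slash-separated string; build_location is a hypothetical helper rather than an SDK function, and the example values are made up.

```python
def build_location(*levels: str) -> str:
    """Join hierarchy levels, top-most first, into one slash-separated path.

    Assumption: locations are written as "/level1/level2/...". The
    separator is illustrative, not taken from the SDK.
    """
    cleaned = [level.strip().strip("/") for level in levels]
    if not cleaned or any(not level for level in cleaned):
        raise ValueError("every hierarchy level needs a non-empty name")
    return "/" + "/".join(cleaned)


# Race Series > Season > Country > Circuit (values are made up)
race_location = build_location("Formula X", "2021", "Italy", "Monza")

# App Name > Version > Platform (values are made up)
app_location = build_location("MyApp", "1.4.2", "Android")

print(race_location)
print(app_location)
```

Keeping the levels in a fixed order, from the broadest grouping down to the most specific, is what makes the resulting hierarchy easy to navigate.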

Data Governance

Streams are key to good data governance. Use them to organise your data in the catalogue by:

  • Creating a data hierarchy to group incoming data by session, location or other feature.

  • Logging separate or continuous sessions depending upon the use case.

Flexibility

A stream is very flexible:

  • It can be never-ending, such as a stream of data from a power station, or

  • It can begin and end with the start and finish of a session, such as a football match, or

  • It can be a continuous stream of batches, such as daily stock market prices concatenated at the daily market open/close.

The catalogue is also very flexible. You decide how to set up the management of your data according to the needs of your organisation, project or product by customising the data location and metadata using our SDK.

Data Discovery

Any user can easily find data in the catalogue through our UX, either by navigating by location or topic, or by searching and filtering by column.

All metadata for a stream is appended to it in the streams table and can be quickly accessed by clicking on the ‘open’ arrows.

Finally, any data in the catalogue can be quickly visualised in Quix or external applications such as PowerBI or Grafana to further improve discovery.

Working with data

SDKs

Most of the work involved in setting up your catalogue is done programmatically using our SDK. See our samples for a range of example use cases.

Stream Status

Each stream is automatically tagged as Open when first received and will remain so until a stream end is received.

Open: The stream is live and data is being persisted.

Closed/Aborted/Terminated: The stream is now historic and no additional data is expected. It is up to the sender to determine which stream end should be used. Quix currently does not distinguish between them. An example use case for 'Terminated' could be a sender shutting down before sending all stream data.

Interrupted: This status is only available while a stream is being persisted. When an Open stream is inactive for over 10 minutes, it is set to this state. While this state is active, no new data is being persisted for the stream. Once new data is read for the stream, it moves back to the Open state. This state is also used when there is an interruption in the persisting service.
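The status transitions described above can be summarised as a small state machine. The sketch below is only an illustration of the rules stated in this section, not how the persisting service is implemented: StreamState and its methods are hypothetical names, and the rule that an ended stream ignores later data is an assumption.

```python
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    INTERRUPTED = "Interrupted"
    CLOSED = "Closed"
    ABORTED = "Aborted"
    TERMINATED = "Terminated"


INACTIVITY_LIMIT = timedelta(minutes=10)  # threshold stated in the text
END_STATES = {Status.CLOSED, Status.ABORTED, Status.TERMINATED}


class StreamState:
    """Hypothetical tracker for the status of one persisted stream."""

    def __init__(self, first_seen: datetime):
        self.status = Status.OPEN  # streams are tagged Open when first received
        self.last_data = first_seen

    def on_data(self, now: datetime) -> None:
        """New data arrived: an Interrupted stream moves back to Open."""
        if self.status in END_STATES:
            return  # assumption: an ended stream stays historic
        self.last_data = now
        self.status = Status.OPEN

    def on_tick(self, now: datetime) -> None:
        """Periodic check: flag an Open stream inactive for over 10 minutes."""
        if self.status is Status.OPEN and now - self.last_data > INACTIVITY_LIMIT:
            self.status = Status.INTERRUPTED

    def on_end(self, end_type: Status) -> None:
        """The sender chose a stream end; all three are treated alike."""
        if end_type not in END_STATES:
            raise ValueError("end_type must be Closed, Aborted or Terminated")
        self.status = end_type


start = datetime(2021, 1, 1, 12, 0)
stream = StreamState(start)
stream.on_tick(start + timedelta(minutes=11))
print(stream.status.value)
stream.on_data(start + timedelta(minutes=12))
print(stream.status.value)
stream.on_end(Status.CLOSED)
print(stream.status.value)
```

The sketch covers only the inactivity rule; the case where the persisting service itself is interrupted is not modelled.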

Persistence

There are many use cases where you may want to reduce the amount of data persisted, such as when pre-processing or downsampling data. In such circumstances we strongly suggest creating separate topics to manage the persistence of the raw and processed data.
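As a minimal sketch of the downsampling case, the snippet below averages raw samples in fixed-size groups and pairs the result with a hypothetical topic layout in which only the processed topic is persisted. The topic names, the persist flag and the downsample helper are all assumptions made for illustration; they are not SDK or platform settings, and which topics you persist depends on your use case.

```python
from statistics import mean


def downsample(samples, factor):
    """Average consecutive groups of `factor` (timestamp, value) samples.

    Each output sample takes the first timestamp of its group. A trailing
    group shorter than `factor` is averaged as well.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    out = []
    for i in range(0, len(samples), factor):
        group = samples[i:i + factor]
        out.append((group[0][0], mean(v for _, v in group)))
    return out


# Hypothetical topic layout: persist only the downsampled data.
topics = {
    "sensor-raw": {"persist": False},
    "sensor-downsampled": {"persist": True},
}

# Made-up raw samples as (timestamp, value) pairs.
raw = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0)]
processed = downsample(raw, 2)
print(processed)
```

Keeping raw and processed data in separate topics means persistence can be decided per topic instead of per message.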