How To Choose a Time Series Database?

Big Data

21.08.2023

What is time-series data?

With the ever-growing amount and variety of data, almost any data analysis includes a date-time axis.

How many products did I sell today compared to the day before?
What is the trend of search terms in Google Ads?
How has the bitcoin price changed over the past year?

All those types of questions imply time-series analysis. What’s the logical difference from a normal database? Well, it’s a matter of priorities and usage patterns.

For example, a regular database of employees probably doesn’t require much time-series analysis. It definitely won’t be the main type of request.

On the other hand, if we want to analyze sensor data from IoT (the Internet of Things) devices, the key attribute, besides the sensor name or location, is the timestamp. That makes time-series analysis the main usage scenario for IoT data.

To manage time series data efficiently, you need to make an informed choice. This article walks through what to look for and then reviews some of the most popular time series databases.

What are some examples of time-series data?

As we previously mentioned, time-series data is a special type of data that changes over time. The most obvious examples of such data are the following:

Stock prices
Monthly website traffic
City population
Network speed
Amount of goods at a warehouse

Time series can be recorded at different granularities, such as per second, per minute, or even per year. Either way, the values change over time, and businesses can take advantage of that.

Why do I need a high-performance time series database?

Do I need one just because I want to analyze something from the datetime perspective? Of course not. Any relational database can analyze data by datetime, and small data sets or data with a low insert rate need no time-series-specific DB.

To put it simply, you need to consider a time-series DB when you aim to consume and analyze a huge amount of time-series data, both in terms of the insert rate and the total volume.

If I put the purpose of a time-series database in a single sentence, it would be something like this:

‘An ability to consume, store, and ensure stable query performance with huge amounts of data’

By saying ‘huge’, I mean billions of records or thousands of inserts per second.

But how can we achieve that? Normally, time-series data analysis happens on various granularity levels, e.g. we want to see price changes every minute for the past hour, every hour for the past week and every day for the past month.

Those types of aggregations are pretty common and can dramatically reduce the amount of storage required for your data. At the same time, we still cover the whole history received so far, though at the cost of losing precision for older data.
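To make the idea concrete, here is a minimal sketch of downsampling in plain Python. It only illustrates the principle, not how any particular database implements it; the `downsample` helper and the sample price ticks are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def downsample(points, bucket):
    """Average (timestamp, value) points into fixed-size time buckets."""
    bucket_seconds = int(bucket.total_seconds())
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
    for ts, value in points:
        epoch = int(ts.timestamp())
        bucket_start = epoch - epoch % bucket_seconds
        sums[bucket_start][0] += value
        sums[bucket_start][1] += 1
    return [
        (datetime.fromtimestamp(bucket_start, tz=timezone.utc), total / count)
        for bucket_start, (total, count) in sorted(sums.items())
    ]


# Hypothetical price ticks, one every 20 seconds for two minutes.
start = datetime(2023, 8, 21, 12, 0, tzinfo=timezone.utc)
ticks = [(start + timedelta(seconds=20 * i), 100.0 + i) for i in range(6)]

per_minute = downsample(ticks, timedelta(minutes=1))
for bucket_start, average in per_minute:
    print(bucket_start.isoformat(), average)
```

Six raw points collapse into two per-minute averages (101.0 and 104.0). The same function with a bucket of one hour or one day gives the coarser levels, and that is where the storage savings come from.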

It gives us a second (optional) quality of time-series databases:

‘An ability to automatically manage data aggregations’

This also covers data retention policies (data that gets deleted automatically after it expires).

It’s optional because it can be easily achieved using a custom ETL process, though many DB vendors support it out of the box.
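For a sense of what such a custom process does, here is a minimal retention sketch in Python. The `apply_retention` helper and the 30-day window are assumptions made for the example; a real job would run the equivalent delete against your storage on a schedule.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(points, max_age, now):
    """Keep only points that are not older than max_age relative to now."""
    cutoff = now - max_age
    return [(ts, value) for ts, value in points if ts >= cutoff]


# Hypothetical readings of different ages.
now = datetime(2023, 8, 21, tzinfo=timezone.utc)
points = [
    (now - timedelta(days=40), 1.0),  # expired
    (now - timedelta(days=10), 2.0),
    (now - timedelta(hours=1), 3.0),
]

kept = apply_retention(points, timedelta(days=30), now)
print([value for _, value in kept])  # [2.0, 3.0]
```

Databases with built-in retention do the same thing for you, usually by dropping whole time partitions instead of filtering row by row.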

Learn more about “What ETL is and what steps ETL has?”

To sum it up, let’s see how time-series databases help handle big amounts of data:

On a logical level, it’s aggregations and automatic retention policies (if it applies to your data).
On a physical level, it’s all about storage optimization. Depending on your database, it could be sharding strategies, partitioning, clustering, compression, etc. Many column-oriented databases claim good time-series support because they can keep data in sorted or clustered storage.
On a computational level, it could also be some custom functions that are not part of the SQL standard (like LOCF, last observation carried forward, in TimescaleDB).
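Gap filling is a typical example of such a custom function. Below is a minimal Python sketch of the "last observation carried forward" idea. It shows the logic only and is not TimescaleDB's implementation; the `locf` name here is just a local helper.

```python
def locf(values):
    """Replace each missing value (None) with the most recent known value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical per-minute sensor readings with gaps.
readings = [None, 1.0, None, None, 2.5, None]
print(locf(readings))  # [None, 1.0, 1.0, 1.0, 2.5, 2.5]
```

The leading gap stays empty because there is no earlier observation to carry forward.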

A couple of use cases:

IoT. Sensor data is useful in many areas such as telco (with metrics from cells), healthcare, production lines, etc. Collections of IoT sensors can produce vast amounts of data that need to be efficiently stored and queried, for instance to detect anomalies and predict future failures.
Trading or blockchain prices. Millions of transactions in the markets produce gigantic amounts of data to be efficiently stored, processed, and analyzed to extract insights and predict price movements.
Monitoring systems. Systems like Prometheus were designed specifically to collect, store, and analyze metrics data.

How to choose the right database for time-series data

There are several factors you must consider while selecting the right time-series DB.

Cost of usage

Time-series data usually keeps growing, so you end up maintaining gigabytes or terabytes of it. Be careful with your estimates: a Proof of Concept can go smoothly and still tell you little about production. Calculate the approximate production workload and the data volumes you’ll handle, and then estimate the hardware needed to maintain them.
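A back-of-the-envelope calculation is a good starting point. The sketch below is a rough estimate only: the insert rate, row size, and compression ratio are made-up assumptions, and it ignores indexes, replication, and storage-engine overhead.

```python
SECONDS_PER_DAY = 86_400


def estimate_storage_gb(inserts_per_second, bytes_per_row, retention_days,
                        compression_ratio=1.0):
    """Rough on-disk size in GB (10^9 bytes) for a steady insert rate."""
    rows = inserts_per_second * SECONDS_PER_DAY * retention_days
    raw_bytes = rows * bytes_per_row
    return raw_bytes / compression_ratio / 1e9


# Assumed workload: 5,000 inserts per second, 50-byte rows, one year of history.
raw_gb = estimate_storage_gb(5_000, 50, 365)
compressed_gb = estimate_storage_gb(5_000, 50, 365, compression_ratio=10.0)
print(f"raw: {raw_gb:.1f} GB, compressed: {compressed_gb:.1f} GB")
```

With these numbers, a year of raw data is about 7,884 GB, or roughly 788 GB at an assumed 10x compression. The compression ratio is the most uncertain input, so measure it on your own data.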

Complexity of migration

Many time-series projects start on a relational database, so if you already have one, staying on it looks like the sweet spot. But an easy migration may backfire in the future. It can be better to suffer at the beginning while migrating and reap the benefits of a proper solution later. Also, be aware of vendor lock-in. Once you go with a cloud provider and choose its solution, migrating out of that cloud will be hard. However, if you know you’ll stay with the same cloud provider, that’s less of a worry.

Effectiveness for use cases

It’s a common mistake to test your solution on a small data set or a low workload, spend a lot of time migrating to it, and then discover that a solution that looked perfect at first glance can’t keep up with the production workload. Keep your test data and workload as close to production as possible.

What is the most popular time-series database?

Nowadays, there are plenty of solutions with time-series support. I’ll reference some of them here.

TimescaleDB: a Postgres-based open-source database with a cloud version. It offers a seamless transition from any Postgres-based application. You can maintain hybrid databases, i.e. when only some of the tables get optimized for time-series analysis. In recent releases, they also claim support for clustering, but we had no chance to check it out.
ClickHouse: an open-source database optimized for fast ingestion and processing of time-series data. It’s SQL-based, but keep in mind that some features are missing compared to regular relational databases. Overall, it’s a solid approach with the fastest ingestion (from what I saw) and some unusual solutions for common problems.
InfluxDB: an open-source database with a cloud version. It’s a ground-up solution for time-series data. It’s not a relational database, so it can’t be a drop-in replacement for your regular DB.
AWS Timestream: the AWS time-series DB, with SQL support. We’ve never compared it to other solutions, so I leave it without insights.
SingleStore: formerly MemSQL, it offers a cloud version. A solid column-oriented database with strong support for time-series data. It’s scalable by design, so it can maintain a good rate of inserts and decent query performance for analysis.

What makes a time-series database the best?

Selecting the best database for time series data depends on a range of factors that cater to specific needs and use cases. While different businesses might prioritize certain features over others, there are several key aspects to consider when determining the best time-series database for your requirements.

Time-series database features

High Performance

A high-performance time series database is crucial for processing and analyzing vast amounts of data quickly. Look for databases that offer optimized query processing and data retrieval, especially when dealing with frequent aggregations and complex calculations.

Scalability

As data volumes continue to grow, a time-series database should be able to scale horizontally to accommodate increasing data loads. Elastic scalability ensures that your database can handle spikes in data without compromising performance.

Data Retention

Effective data retention policies are essential for managing the storage of time series data over extended periods. The database should provide options for automatic data expiration based on predefined criteria to ensure optimal data storage and retrieval.

Ease of Migration

While migrating from a traditional relational database to a time-series database might seem challenging, choosing a solution that offers a smooth migration process can save significant time and effort in the long run. Avoid vendor lock-in by selecting a database that supports standardized data formats and APIs.

Optimized Storage

The best database for time series employs storage optimization techniques such as compression, sharding, and partitioning to efficiently store data while minimizing storage costs.
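To see why time-series data compresses well, consider timestamps that arrive at a regular interval. The sketch below shows delta encoding, a general technique used here purely as an illustration; it is not a claim about what any specific database in this article does internally.

```python
def delta_encode(timestamps):
    """Store the first timestamp, then only the difference to the previous one."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for previous, current in zip(timestamps, timestamps[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the original timestamps with a running sum."""
    timestamps = []
    running = 0
    for delta in deltas:
        running += delta
        timestamps.append(running)
    return timestamps


# Hypothetical Unix timestamps from a sensor reporting every 10 seconds.
raw = [1692619200, 1692619210, 1692619220, 1692619230]
encoded = delta_encode(raw)
print(encoded)  # [1692619200, 10, 10, 10]
```

After encoding, most values are small and repetitive, which general-purpose compressors handle far better than the original large numbers. Decoding restores the input exactly.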

Granularity and Aggregation

Time-series data analysis often involves aggregating data at various levels of granularity. Look for databases that support flexible aggregation capabilities to analyze data at different time intervals.

Custom Functions

Some high-performance time series databases offer custom functions that go beyond standard SQL capabilities, enabling advanced analytical tasks specific to time series data. These functions can enhance the depth and accuracy of your analysis.

Support for Anomalies and Predictions

The ability to identify anomalies and predict future trends is vital in time series analysis. A strong time series database should facilitate the implementation of machine learning algorithms and predictive models.
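As a very simple illustration of anomaly detection on a time series, here is a rolling z-score sketch in Python. It is a basic statistical check rather than a machine learning model, and the window size, threshold, and sample readings are assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes of values that deviate strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip flat windows, where the z-score is undefined.
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical temperature readings with one obvious spike.
readings = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 35.0, 20.0]
print(find_anomalies(readings))  # [6]
```

Only the spike at index 6 is flagged. In practice you would run this kind of check on data pulled from the database at the granularity you care about.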

Integration with Analytics Tools

Consider databases that integrate well with popular analytics and visualization tools. This integration simplifies the process of extracting insights and creating visual representations of your time series data.

How to get started with a minimum amount of time and effort

Ready to choose the best time series database for your needs?

Now let’s assume you have a great idea, the timing is right, and you’re ready to start implementing your solution to collect time-series data. But something is missing. Broscorp’s team is a key element of your success. Why?

We’ve got vast experience designing and implementing high-load storage and database solutions.
We conduct time-series database comparisons to help you choose the best database for time-series data in your project, so you use your time and money effectively.
We’re a super business-oriented company. Our goal is to build a system that brings profit to your company, not just to develop some solution.

At Broscorp, we understand the importance of selecting the right time-series database for your business. With our expertise in designing and implementing high-load storage and database solutions, we can help you navigate the options and make an informed decision.