Effective Management of Time Series Data

Date

2022-12

Abstract

We live in a society that depends intrinsically on data for most daily tasks, and advances in technology reinforce this dependence with the emergence of big data. The growing deployment of sensors produces an immense volume of data, much of it time series. This is especially true of High-Performance Computing (HPC) systems: powerful machines consisting of thousands of nodes, each with several components that require constant and thorough monitoring. Managing large-scale time series databases therefore becomes increasingly challenging because of the very high rate at which metrics are collected. Age-threshold retention policies can be implemented to delete historical data and reduce database volume, but they discard all of the valuable information from older periods. Alternatively, we can apply time series deduplication with a metric-based tolerance to streaming or historical intervals, discarding readings that remain within the calculated tolerance window and thus keeping only the important changes to the system. This reduces the data volume of a given interval by 35% to 99%, depending on the selected dataset and tolerance formula. When a reduced interval is queried, the readings can be reconstructed to recover the original granularity, with Mean Absolute Percentage Errors (MAPE) of 0.42% to 1.26% across datasets and low query runtime overhead. We can also build a processing pipeline that combines deduplication, aggregation, compression, and reconstruction to manage time series data effectively.
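The deduplication and reconstruction idea summarized above can be illustrated with a minimal sketch. It is not the thesis's implementation: the tolerance formula (k times the interval's population standard deviation), the rule of comparing each reading against the last kept reading, the forward-fill reconstruction, and all function names (`metric_tolerance`, `deduplicate`, `reconstruct`, `mape`) are hypothetical assumptions chosen for illustration. Reconstruction also assumes the original sampling timestamps are known, for example from a regular sampling period.

```python
from bisect import bisect_right
from statistics import pstdev


def metric_tolerance(values, k=0.1):
    """Hypothetical tolerance formula: k times the population std dev of the interval."""
    return k * pstdev(values)


def deduplicate(readings, tolerance):
    """Keep a (timestamp, value) reading only when it moves more than `tolerance`
    away from the last kept value; readings inside the window are discarded.
    The first reading is always kept so reconstruction has a starting value."""
    if not readings:
        return []
    kept = [readings[0]]
    for ts, value in readings[1:]:
        if abs(value - kept[-1][1]) > tolerance:
            kept.append((ts, value))
    return kept


def reconstruct(kept, timestamps):
    """Rebuild the original granularity by carrying the last kept value forward
    to every expected timestamp (step interpolation).
    Assumes every timestamp is >= the first kept timestamp."""
    kept_ts = [ts for ts, _ in kept]
    out = []
    for ts in timestamps:
        i = bisect_right(kept_ts, ts) - 1  # index of the latest kept reading at or before ts
        out.append((ts, kept[i][1]))
    return out


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Illustrative readings sampled every 10 time units (made-up values).
    readings = [(0, 50.0), (10, 50.2), (20, 50.1), (30, 55.0),
                (40, 55.3), (50, 54.9), (60, 60.0)]
    kept = deduplicate(readings, tolerance=1.0)
    recon = reconstruct(kept, [ts for ts, _ in readings])
    print(len(kept), "of", len(readings), "readings kept")
    print("MAPE %:", mape([v for _, v in readings], [v for _, v in recon]))
```

Comparing each reading with the last *kept* value rather than with the previous raw reading means every discarded reading lies within the tolerance of the value that replaces it during reconstruction, so the per-point absolute reconstruction error is bounded by the tolerance in this sketch. A slow drift therefore still triggers a new kept reading once it accumulates past the window.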

Keywords

Data deduplication, Time series management, Time series data, Data reconstruction, Time series big data

Citation