Additional material on Complexity Measures and Features for Time Series classification (CMFTS)
This is the supplementary material for the article "Complexity Measures and Features for Time Series classification", including datasets, theoretical information on the measures used, results, etc.
Keywords: Time series features, Complexity measures, Time series interpretation, Classification
This section includes the theoretical definitions of the measures proposed in this paper.
The Kolmogorov complexity K(x) of an object x is the length, in bits, of the smallest program (in bits) that, when run on a Universal Turing Machine (U), outputs x and then halts [12]. This measure was independently developed by Andrey N. Kolmogorov in the late 1960s. A good introduction to the Kolmogorov complexity (KL in what follows) can be found in [5, p. 463–508], and a comprehensive description in [11].
On the basis of Kolmogorov’s idea, Lempel and Ziv [10] developed an algorithm (LZA), which is often used in assessing the randomness of finite sequences as a measure of their disorder.
The Kolmogorov complexity of a time series $\{x_i\}$, $i = 1, 2, 3, 4, \dots, N$, computed by this algorithm (LZA) can be summarized as follows:
[Step A]: Encode the time series by constructing a sequence S consisting of the characters 0 and 1 written as $\{s(i)\}$, $i = 1, 2, 3, 4, \dots, N$, according to the rule
$$s(i) = \begin{cases} 0, & x_i < x^{*} \\ 1, & x_i \geq x^{*} \end{cases}$$
Here $x^{*}$ is a threshold that should be properly chosen. The mean value of the time series has often been used as the threshold [21]. Depending on the application, other encoding schemes are also available [14].
[Step B]: Calculate the complexity counter $c(N)$, which is defined as the minimum number of distinct patterns contained in a given character sequence [7]; $c(N)$ is a function of the length of the sequence $N$. The value of $c(N)$ approaches an ultimate value $b(N)$ as $N$ approaches infinity, i.e.
$$\lim_{N \to \infty} c(N) = b(N) = \frac{N}{\log_2 N}.$$
[Step C]: Calculate the normalized complexity measure $C(N)$, which is defined as
$$C(N) = \frac{c(N)}{b(N)} = c(N)\,\frac{\log_2 N}{N}.$$
The $C(N)$ is a parameter representing the information quantity contained in a time series: it is close to 0 for a periodic or regular time series and close to 1 for a random time series, if $N$ is large enough. For a non-linear time series, $C(N)$ lies between 0 and 1. They note that Hu et al. [9] derived an analytic expression for $C(N)$ in the Kolmogorov complexity for regular and random sequences. In addition, they showed that the shorter the time series, the larger the value of $C(N)$; correspondingly, the complexity for a random sequence can be considerably larger than 1.
The approximate entropy (ApEn) is a means of quantifying the concept of changing complexity [13]; it is a technique used to quantify the amount of regularity and the unpredictability of fluctuations in time-series data. ApEn can be computed for any time series, chaotic or otherwise. This measure works as follows:
Given a time series of data $u(1), u(2), \dots, u(N)$, from measurements equally spaced in time.
Fix $m$, a positive integer, and $r$, a positive real number. The value of $m$ represents the length of compared runs of data, and $r$ specifies a filtering level.
Form a sequence of vectors $x(1), x(2), \dots, x(N - m + 1)$ in $\mathbb{R}^m$, defined by
$$x(i) = [u(i), u(i + 1), \dots, u(i + m - 1)].$$
Define for each $i$, $1 \leq i \leq N - m + 1$,
$$C_i^m(r) = \frac{\#\{\, j \leq N - m + 1 : d[x(i), x(j)] \leq r \,\}}{N - m + 1}.$$
We must define $d[x(i), x(j)]$ for vectors $x(i)$ and $x(j)$. They follow Takens [18] by defining
$$d[x(i), x(j)] = \max_{k = 1, \dots, m} |u(i + k - 1) - u(j + k - 1)|.$$
Define
$$\Phi^m(r) = \frac{1}{N - m + 1} \sum_{i = 1}^{N - m + 1} \ln C_i^m(r).$$
Finally,
$$\mathrm{ApEn}(m, r, N)(u) = \Phi^m(r) - \Phi^{m + 1}(r),$$
where $\ln$ is the natural logarithm, for fixed $m$ and $r$.
Greater utility for ApEn arises when the means and standard deviations of evolving systems show little change with system evolution. Different values of $r$ must be chosen for different systems to give the ApEn statistics a good likelihood of distinguishing versions of each system from one another.
SampEn statistics have reduced bias [15]. They developed SampEn statistics to be free of the bias caused by self-matching. The name refers to the applicability to time series data sampled from a continuous process. In addition, the algorithm suggests ways to employ sample statistics to evaluate the results, as explained below.
There are two major differences between SampEn and ApEn statistics. First, SampEn does not count self-matches. They justified discounting self-matches on the grounds that entropy is conceived as a measure of the rate of information production, and in this context comparing data with themselves is meaningless. Furthermore, self-matches are explicitly dismissed in the later work of Grassberger and co-workers. Second, SampEn does not use a template-wise approach when estimating conditional probabilities. To be defined, SampEn requires only that one template find a match of length $m + 1$.
They defined $B_i^m(r)$ as $(N - m - 1)^{-1}$ times the number of vectors $x_m(j)$ within $r$ of $x_m(i)$, where $j$ ranges from 1 to $N - m$, and $j \neq i$ to exclude self-matches. They then defined
$$B^m(r) = \frac{1}{N - m} \sum_{i = 1}^{N - m} B_i^m(r).$$
Similarly, they defined $A_i^m(r)$ as $(N - m - 1)^{-1}$ times the number of vectors $x_{m+1}(j)$ within $r$ of $x_{m+1}(i)$, where $j$ ranges from 1 to $N - m$ ($j \neq i$), and set
$$A^m(r) = \frac{1}{N - m} \sum_{i = 1}^{N - m} A_i^m(r).$$
$B^m(r)$ is then the probability that two sequences will match for $m$ points, whereas $A^m(r)$ is the probability that two sequences will match for $m + 1$ points. They then defined the parameter
$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty} \left\{ -\ln\!\left[ A^m(r) / B^m(r) \right] \right\},$$
which is estimated by the statistic $\mathrm{SampEn}(m, r, N) = -\ln\!\left[ A^m(r) / B^m(r) \right]$. Where there is no confusion about the parameter $r$ and the length $m$ of the template vector, they set $B = \{[(N - m - 1)(N - m)]/2\}\, B^m(r)$ and $A = \{[(N - m - 1)(N - m)]/2\}\, A^m(r)$, so that $B$ is the total number of template matches of length $m$ and $A$ is the total number of forward matches of length $m + 1$. They note that $A / B = A^m(r) / B^m(r)$, so SampEn can be expressed as $-\ln(A / B)$.
Permutation entropy is a complexity measure which is easily calculated for any type of time series, be it regular, chaotic, noisy or real-world [2].
Consider a time series $\{x_t\}_{t = 1, \dots, T}$. They study all permutations $\pi$ of order $n$, which are considered here as possible order types of $n$ different numbers. For each $\pi$ they determine the relative frequency
$$p(\pi) = \frac{\#\{\, t : 0 \leq t \leq T - n,\ (x_{t+1}, \dots, x_{t+n}) \text{ has type } \pi \,\}}{T - n + 1}.$$
This estimates the frequency of $\pi$ as well as possible for a finite series of values. To determine $p(\pi)$ exactly, they have to assume an infinite time series $x_1, x_2, \dots$ and take the limit for $T \to \infty$ in the above formula. This limit exists with probability 1 when the underlying stochastic process fulfils a very weak stationarity condition: for $k \leq n$, the probability for $x_t < x_{t+k}$ should not depend on $t$. The permutation entropy of order $n \geq 2$ is defined as
$$H(n) = -\sum p(\pi) \log p(\pi),$$
where the sum runs over all $n!$ permutations $\pi$ of order $n$. This is the information contained in comparing $n$ consecutive values of the time series. It is clear that $0 \leq H(n) \leq \log(n!)$, where the lower bound is attained for an increasing or decreasing sequence of values, and the upper bound for a completely random system (i.i.d. sequence) where all possible permutations appear with the same probability. The time series presents some sort of dynamics when $H(n) < \log(n!)$. In their experiments with chaotic time series, $H(n)$ did increase at most linearly with $n$. So it seems useful to define the permutation entropy per symbol of order $n$, dividing by $n - 1$ since comparisons start with the second value:
$$h_n = \frac{H(n)}{n - 1}.$$
The Chao-Shen entropy estimator (2003) is a Horvitz-Thompson (1952) estimator applied to the problem of entropy estimation, with additional coverage correction as proposed by Good (1953). Note that the Chao-Shen estimator is not a plug-in estimator, hence there are no
explicit underlying bin frequencies.
Assume that there are $S$ species in a community and they are labeled from 1 to $S$. Denote the probabilities of species discovery (or relative abundance) by $p_1, p_2, \dots, p_S$, where $\sum_{i=1}^{S} p_i = 1$.
The CV is defined as
$$\mathrm{CV} = \frac{\left[ \sum_{i=1}^{S} (p_i - \bar{p})^2 / S \right]^{1/2}}{\bar{p}},$$
where $p_i$ is the species abundance and $\bar{p} = \sum_{i=1}^{S} p_i / S$. The value of CV characterizes the degree of heterogeneity [3] among the species abundances. The CV is zero if and only if the species have equal abundance. The larger the CV, the greater the degree of heterogeneity.
We consider one-sided infinite sequences $s_1 s_2 s_3 \dots$ where $s_t \in \{0, 1, \dots, d - 1\}$. In most examples we shall deal with binary sequences ($d = 2$), but everything holds also for $d > 2$ with minor modifications [16]. We assume that these are realizations of a stochastic process with probabilities $p(s_1 \dots s_n)$ of finding the block $s_1 \dots s_n$.
The block entropies
$$H_n = -\sum_{s_1, \dots, s_n} p(s_1 \dots s_n) \log p(s_1 \dots s_n)$$
measure the average amount of information contained in a word of length $n$. The differential entropies
$$h_n = H_n - H_{n-1}$$
give the new information of the $n$-th symbol if the preceding $(n - 1)$ symbols are known. Here, $p(s_n \mid s_1 \dots s_{n-1})$ is the conditional probability for the $n$-th symbol being $s_n$, conditioned on the previous symbols $s_1 \dots s_{n-1}$. The Shannon entropy is
$$h = \lim_{n \to \infty} h_n.$$
It measures the average amount of information per symbol if all correlations and constraints are taken into account. This limit is approached monotonically from above, i.e. all $h_n$ are upper bounds on $h$.
For the numerical estimation of $h$ from a finite sequence of length $N$, one usually estimates all word probabilities $p(s_1 \dots s_n)$, up to some fixed $n$, by the standard likelihood estimate
$$\hat{p}(s_1 \dots s_n) = \frac{n_{s_1 \dots s_n}}{N},$$
where $n_{s_1 \dots s_n}$ is the number of occurrences of the word $s_1 \dots s_n$ (strictly, the denominator should be $N - n + 1$, but this difference is in general negligible). From these one computes estimates $\hat{H}_n$ by inserting them into the block entropy formula. Finally, an estimator of $h$ is obtained either by computing $\hat{h}_n = \hat{H}_n - \hat{H}_{n-1}$ and extrapolating, or simply as $\hat{h}_n = \hat{H}_n / n$. The latter is less influenced by statistical errors but shows slower convergence.
The power spectral entropy (PSE), based on Shannon information theory, is a complexity index for an uncertain system. It reflects changes of nonlinear dynamic states well and requires only a small amount of data [20]. PSE is defined by the following steps (a code sketch is given after them):
Calculate the spectrum of the signal.
Calculate the Power Spectral Density (PSD) of the signal by squaring its amplitude and normalizing by the number of bins.
Normalize the calculated PSD so that it can be viewed as a Probability Density Function (integral is equal to 1).
The power spectral entropy can now be calculated using the standard formula for entropy calculation.
Number of forbidden patterns [1]. When analyzing a time series of length $N$, we obtain $N - d + 1$ overlapping groups of $d$ adjacent values, each one with a corresponding order pattern [19]. If the series has random behavior, any permutation can appear, and therefore no pattern is forbidden. Moreover, their probability distribution should be flat, since any permutation has the same probability of occurrence when the dataset is long enough to exclude statistical fluctuations. Nevertheless, when the series corresponds to a chaotic variable, there are some patterns that cannot be encountered in the data due to the underlying deterministic structure: these are the so-called forbidden patterns. It has been demonstrated that most chaotic systems exhibit forbidden patterns, and that in many cases (ergodic finite-alphabet information sources) the measure of the number of these patterns is related to other classic metric entropy rates (e.g., the Lyapunov exponent).
For symmetric unimodal distributions, positive kurtosis indicates heavy tails and peakedness relative to the normal distribution, whereas negative kurtosis indicates light tails and flatness [6].
Skewness characterizes the degree of asymmetry of a given distribution around its mean [4]. If the distribution of the data is symmetric, then the skewness will be close to 0. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values.
[1] J. Amigó. Permutation complexity in dynamical systems: ordinal patterns, permutation entropy and all that. Springer Science & Business Media, 2010.
[2] C. Bandt and B. Pompe. Permutation entropy: a natural complexity measure for time series. Physical review letters, 88(17):174102, 2002.
[3] A. Chao and T.-J. Shen. Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10(4):429–443, 2003.
[4] P. Čisar and S. M. Čisar. Skewness and kurtosis in function of selection of network traffic distribution. Acta Polytechnica Hungarica, 7(2):95–106, 2010.
[5] T. M. Cover and J. A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, 2006.
[6] L. T. DeCarlo. On the meaning and use of kurtosis. Psychological methods, 2(3):292, 1997.
[7] R. Ferenets, T. Lipping, A. Anier, V. Jäntti, S. Melto, and S. Hovilehto. Comparison of entropy and complexity measures for the assessment of depth of sedation. IEEE Transactions on Biomedical Engineering, 53(6):1067–1077, 2006.
[8] J. Hausser and K. Strimmer. R-package entropy, 2014.
[9] J. Hu, J. Gao, and J. C. Principe. Analysis of biomedical signals by the Lempel-Ziv complexity: the effect of finite data size. IEEE Transactions on Biomedical Engineering, 53(12):2606–2609, 2006.
[10] A. Lempel and J. Ziv. On the complexity of finite sequences. IEEE Transactions on Information Theory, 22(1):75–81, 1976.
[11] M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, 1997.
[12] D. T. Mihailović, G. Mimić, N. Drešković, and I. Arsenić. Kolmogorov complexity based information measures applied to the analysis of different river flow regimes. Entropy, 17(5):2973–2987, 2015.
[13] S. M. Pincus. Approximate entropy as a measure of system complexity. Proceedings of the National Academy of Sciences, 88(6):2297–2301, 1991.
[14] N. Radhakrishnan, J. D. Wilson, and P. C. Loizou. An alternate partitioning technique to quantify the regularity of complex time series. International Journal of Bifurcation and Chaos, 10(07):1773–1779, 2000.
[15] J. S. Richman and J. R. Moorman. Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6):H2039–H2049, 2000.
[16] T. Schürmann and P. Grassberger. Entropy estimation of symbol sequences. Chaos: An Interdisciplinary Journal of Nonlinear Science, 6(3):414–427, 1996.
[18] F. Takens. Atas do 13º Colóquio Brasileiro de Matemática. Rio de Janeiro, Brazil, 1983.
[19] M. Zanin. Forbidden patterns in financial time series. Chaos: An Interdisciplinary Journal of Nonlinear Science, 18(1):013119, 2008.
[20] A. Zhang, B. Yang, and L. Huang. Feature extraction of EEG signals using power spectral entropy. In 2008 International Conference on BioMedical Engineering and Informatics, volume 2, pages 435–439, May 2008.
[21] X.-S. Zhang, R. J. Roy, and E. W. Jensen. EEG complexity as a measure of depth of anesthesia for patients. IEEE Transactions on Biomedical Engineering, 48(12):1424–1433, 2001.
This section includes the software developed in this work.
CMFTS:
https://github.com/fjbaldan/CMFTS
(The source code will be available when the work is published)
We have used the complete UCR Time Series Classification Archive repository, consisting of 128 time series classification datasets:
https://www.cs.ucr.edu/~eamonn/time_series_data_2018
A .csv file containing all the results of the experimentation carried out:
https://github.com/fjbaldan/CMFTS/results/results.csv
List of authors and affiliations:
Francisco J. Baldán (fjbaldan@decsai.ugr.es), University of Granada.
José M. Benítez (J.M.Benítez@decsai.ugr.es), University of Granada.