This Website contains complementary material to the
manuscript:
Christoph Bergmeir and José M. Benítez. On the Use
of Cross-validation for Time Series Predictor Evaluation
__________________________________________________________________
Abstract
In time series predictor evaluation, we observe that, with respect to the model
selection procedure, there is a gap between the evaluation of traditional forecasting
procedures on the one hand and the evaluation of machine learning techniques on the
other. In traditional forecasting, it is common practice to reserve a part from
the end of each time series for testing, and to use the rest of the series for training.
Thus, full use is not made of the data, but theoretical problems with respect to
temporal evolutionary effects and dependencies within the data, as well as
practical problems regarding missing values, are eliminated. On the other hand,
when evaluating machine learning and other regression methods used for
time series forecasting, cross-validation is often used for evaluation, paying
little attention to the fact that those theoretical problems invalidate the
fundamental assumptions of cross-validation. To close this gap and examine
the consequences of the different model selection procedures in practice, we
have carried out a rigorous and extensive empirical study. Six different model
selection procedures, based on (i) cross-validation and (ii) evaluation using the
series' last part, are used to assess the performance of four machine learning
and other regression techniques on synthetic and real-world time series.
No practical consequences of the theoretical flaws were found during our
study, but the use of cross-validation techniques led to a more robust model
selection. To make use of the "best of both worlds", we suggest that a blocked
form of cross-validation become the standard procedure for time series evaluation,
thus using all available information and circumventing the theoretical
problems.
_______________________________________________________________________________________________________
Contents
1. Design of the Experiments
Our study has the following objectives:
- To determine whether the dependency within the data has effects on the
cross-validation, e.g., in the way that the cross-validation procedure
systematically underestimates the error. This can be done by comparing
randomly chosen evaluation sets to blocked sets.
- To determine whether effects of temporal evolution can be found, by comparing
evaluations that use data from the end of the series to evaluations that use
data from somewhere in between. It has to be noted that in this study we
consider only (second-order) stationary series.
- To determine whether cross-validation yields a more robust error measure, by
making full use of the data in the sense that all data is used both for training
and testing.
In order to cover a broad range of application situations, the experiments on the
different model selection procedures are carried out using machine learning and
general regression methods, synthetic and real-world datasets, and various error
measures.
1.1. Applied Models and Algorithms
All methods used are available in packages within the statistical computing language
R. We use the implementation of an epsilon support vector regression algorithm with
a radial kernel from LIBSVM (wrapped in R by the e1071 package), and employ a
multi-layer perceptron from the nnet package. Furthermore, we use lasso regression
from the lars package, and the linear model from the R base package. In the
following, the methods are called svmRadial, nnet, lasso, and lm.
The methods are applied using the following parameter grids:
   | size | decay
 1 |  3   | 0.00316
 2 |  5   | 0.00316
 3 |  9   | 0.00316
 4 |  3   | 0.01470
 5 |  5   | 0.01470
 6 |  9   | 0.01470
 7 |  3   | 0.10000
 8 |  5   | 0.10000
 9 |  9   | 0.10000

Table 1: Parameter grid for the neural network method.
    | cost    | gamma
  1 | 0.10    | 0.00
  2 | 1.00    | 0.00
  3 | 10.00   | 0.00
  4 | 100.00  | 0.00
  5 | 1000.00 | 0.00
  6 | 0.10    | 0.00
  7 | 1.00    | 0.00
  8 | 10.00   | 0.00
  9 | 100.00  | 0.00
 10 | 1000.00 | 0.00
 11 | 0.10    | 0.01
 12 | 1.00    | 0.01
 13 | 10.00   | 0.01
 14 | 100.00  | 0.01
 15 | 1000.00 | 0.01
 16 | 0.10    | 0.20
 17 | 1.00    | 0.20
 18 | 10.00   | 0.20
 19 | 100.00  | 0.20
 20 | 1000.00 | 0.20

Table 2: Parameter grid for the support vector regression.
   | fraction
 1 | 0.10
 2 | 0.36
 3 | 0.63
 4 | 0.90

Table 3: Parameter grid for the lasso regression.
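As an illustration of how these methods can be invoked in R, the following minimal, self-contained sketch fits each of the four models on a toy lag-embedded data set; the series, the chosen grid points, and the variable names are assumptions for demonstration only and are not taken from the original experiments.

```r
## Minimal, self-contained sketch (not the original experiment code):
## fitting the four methods on a toy lag-embedded data frame.
library(e1071)   # svm(): epsilon-SVR, wraps LIBSVM
library(nnet)    # nnet(): single-hidden-layer perceptron
library(lars)    # lars(): lasso regression

set.seed(1)
x   <- arima.sim(list(ar = 0.7), n = 200)   # toy AR(1) series (illustrative)
m   <- embed(as.numeric(x), 5)              # target plus 4 lags
dat <- data.frame(y = m[, 1], m[, -1])

## the parameter values below are single points from the grids in Tab. 1-3
fit_svm   <- svm(y ~ ., data = dat, type = "eps-regression",
                 kernel = "radial", cost = 10, gamma = 0.01)    # svmRadial
fit_nnet  <- nnet(y ~ ., data = dat, size = 5, decay = 0.1,
                  linout = TRUE, trace = FALSE)                 # nnet
fit_lasso <- lars(as.matrix(dat[, -1]), dat$y, type = "lasso")  # lasso
fit_lm    <- lm(y ~ ., data = dat)                              # lm

## for lars, the `fraction` parameter of Tab. 3 enters at prediction time
pred_lasso <- predict(fit_lasso, as.matrix(dat[, -1]),
                      s = 0.63, mode = "fraction")$fit
```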
1.2. Benchmarking Data
We used synthetic and real-world data. All data is available in the KEEL-dataset
repository (http://sci2s.ugr.es/keel/timeseries.php). The synthetic data comprises
both linear and non-linear series. The real-world data is taken from the Santa Fe
forecasting competition (http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html)
and from the NNGC1 competition
(http://www.neural-forecasting-competition.com/datasets.htm).
The following three scenarios were analyzed within the study:
- For application scenario one (AS1), synthetic data with significant
autocorrelations in only a few, small lags (between one and five) were
considered.
- To analyze the behavior of the methods on time series that have
autocorrelations in more and larger lags, application scenario two (AS2)
uses synthetic data with autocorrelations in the last 10 to 30 lags.
- In application scenario three (AS3), the real-world data was considered
(using four lags).
1.3. Data Preparation and Partitioning
We withhold from every series a part of the values at the end as "unknown future" for
validation. In the following, we call this dataset the validation set, the
out-of-sample set, or shortly the out-set, as it is completely withheld from all other
model building and model selection processes. The remaining data is accordingly
called the in-set. Throughout our experiments, we use pds = 0.8, i.e., 80 percent of
the data forms the in-set and the remaining 20 percent is used as out-set.
The in-set data is used for model building and model selection in the following
way: the lags lds to be used for forecasting are chosen. For the synthetic series, the
lags are known, as they were specified during data generation; for the real-world
data, they have to be estimated. To make it possible to use all model selection
procedures (especially the procedure that removes dependent values), four lags are
used for the real-world series. This seems feasible, as the focus of this study lies not
in the actual performance of the methods, but in the performance of the model selection
procedures.
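The following sketch illustrates, under stated assumptions, how a series could be split into in-set and out-set with pds = 0.8 and how the in-set could be embedded with four lags; the sunspot.year series and the helper embed_series are illustrative choices, not part of the original study.

```r
## Illustrative sketch: in-set/out-set split (pds = 0.8) and 4-lag embedding.
embed_series <- function(x, lags = 4) {
  m <- embed(x, lags + 1)            # columns: x_t, x_{t-1}, ..., x_{t-lags}
  data.frame(y = m[, 1], m[, -1])
}

series  <- as.numeric(sunspot.year)  # any univariate series (illustrative)
n       <- length(series)
n_in    <- floor(0.8 * n)            # pds = 0.8: 80% in-set, 20% out-set
in_set  <- series[1:n_in]
out_set <- series[(n_in + 1):n]
dat     <- embed_series(in_set, lags = 4)
```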
1.4. Compared Model Selection Procedures
We consider six different model selection strategies:
- CV: standard 5-fold cross-validation is applied, i.e., the embedded data is
partitioned randomly into five sets, and in five turns every set is used once as
test set, while the other sets are used for training (a sketch of the fold
construction is given after this list).
- blockedCV: 5-fold cross-validation is applied on data that is partitioned not
randomly, but sequentially into five sets.
- noDepCV: 5-fold cross-validation without the dependent values is applied: the sets
generated for CV are used, but, according to the lags used for embedding,
dependent values are removed from the training set. As stated earlier,
depending on the lags used, many values have to be removed, so that this
model selection procedure can only be applied if the number of lags is not
large compared to the number of cross-validation subsets, i.e., in AS1 and
AS3.
- lastBlock: only the last set of blockedCV is used for evaluation.
- secondBlock: not the last but the second block of blockedCV is used for
evaluation.
- secondCV: only the second subset of CV is used.
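As a rough illustration (not the authors' original code), the sketch below shows how the five test-set index sets for CV and blockedCV could be constructed in R; the in-set size n_in and the helper make_folds are hypothetical.

```r
## Rough illustration: test-set index sets for CV (random) and
## blockedCV (sequential) 5-fold cross-validation.
make_folds <- function(n, k = 5, blocked = FALSE) {
  if (blocked) {
    split(seq_len(n), cut(seq_len(n), breaks = k, labels = FALSE))
  } else {
    split(sample(seq_len(n)), rep_len(seq_len(k), n))
  }
}

set.seed(1)
n_in          <- 100                               # hypothetical in-set size
folds_cv      <- make_folds(n_in)                  # CV
folds_blocked <- make_folds(n_in, blocked = TRUE)  # blockedCV
## lastBlock evaluates only on folds_blocked[[5]], secondBlock on
## folds_blocked[[2]], and secondCV on folds_cv[[2]].
```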
2. Statistical Evaluation
We perform an analysis of the medians and their differences in Tab. 4. The table
shows that, with respect to under- or overestimation of the error, no difference
between the model selection procedures can be found. The differences in the accuracy
with which the in-set error predicts the out-set error are small, and vary with the
characteristics of the data and the error measures. The choice of the error measure,
for example, seems more relevant than the choice of the model selection procedure.
Tab. 5 shows the results of the Fligner-Killeen test, which is used to determine
whether the distributions differ in their dispersion. Though the difference between
lastBlock and the cross-validation procedures is not always statistically significant,
the table clearly shows the trend that the difference between the last-block evaluation
and the cross-validation methods is bigger than the differences among the
cross-validation methods.
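A minimal sketch of how a Fligner-Killeen test such as the ones reported in Tab. 5 can be run in R is given below; the vectors ratios and procedure are hypothetical placeholders for the (Eout-set/Ein-set) values and the corresponding model selection procedures.

```r
## Minimal sketch of a Fligner-Killeen test on hypothetical error ratios.
ratios    <- c(1.02, 0.97, 1.10, 0.95, 1.01, 1.20, 0.80, 1.30, 0.99)
procedure <- factor(c("CV", "CV", "CV", "bCV", "bCV", "bCV", "lB", "lB", "lB"))
fligner.test(ratios ~ procedure)   # tests homogeneity of variances across groups
```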
        |     |     CV |    bCV |     lB |  CV-lB | CV-bCV | bCV-lB
MSE     | AS1 |  0.023 |  0.030 |  0.011 |  0.012 | -0.007 |  0.019
        | AS2 |  0.042 |  0.055 |  0.035 |  0.007 | -0.013 |  0.020
        | AS3 |  0.122 |  0.084 |  0.123 | -0.001 |  0.038 | -0.039
RMSE    | AS1 |  0.012 |  0.015 |  0.006 |  0.006 | -0.003 |  0.010
        | AS2 |  0.022 |  0.027 |  0.017 |  0.005 | -0.005 |  0.010
        | AS3 |  0.061 |  0.046 |  0.060 |  0.002 |  0.015 | -0.013
SSE     | AS1 |  0.284 |  0.291 |  0.274 |  0.010 | -0.007 |  0.017
        | AS2 |  0.249 |  0.255 |  0.237 |  0.012 | -0.006 |  0.018
        | AS3 |  0.410 |  0.380 |  0.459 | -0.048 |  0.030 | -0.079
MAE     | AS1 |  0.011 |  0.014 |  0.012 | -0.001 | -0.003 |  0.002
        | AS2 |  0.028 |  0.033 |  0.034 | -0.006 | -0.005 | -0.001
        | AS3 |  0.065 |  0.043 |  0.045 |  0.020 |  0.022 | -0.002
MDAE    | AS1 | -0.023 | -0.013 |  0.015 |  0.008 |  0.010 | -0.002
        | AS2 |  0.053 |  0.052 |  0.085 | -0.032 |  0.002 | -0.033
        | AS3 |  0.055 |  0.048 |  0.044 |  0.011 |  0.007 |  0.004
MAPE    | AS1 | -0.044 | -0.055 |  0.082 | -0.039 | -0.011 | -0.027
        | AS2 | -0.245 | -0.268 | -0.125 |  0.121 | -0.023 |  0.144
        | AS3 |      – |      – |      – |      – |      – |      –
MDAPE   | AS1 | -0.012 | -0.002 |  0.020 | -0.007 |  0.011 | -0.018
        | AS2 | -0.071 | -0.070 | -0.039 |  0.033 |  0.001 |  0.031
        | AS3 | -0.028 | -0.033 | -0.012 |  0.017 | -0.005 |  0.022
SMAPE   | AS1 | -0.309 | -0.346 | -0.279 |  0.030 | -0.037 |  0.066
        | AS2 | -0.292 | -0.215 | -0.719 | -0.427 |  0.077 | -0.504
        | AS3 | -0.755 | -1.578 | -0.013 |  0.741 | -0.823 |  1.564
SMDAPE  | AS1 | -0.013 | -0.005 |  0.048 | -0.035 |  0.008 | -0.044
        | AS2 |  0.407 |  0.705 |  0.126 |  0.281 | -0.298 |  0.579
        | AS3 | -0.723 | -0.431 | -0.003 |  0.720 |  0.292 |  0.428
MRAE    | AS1 | -0.326 | -0.326 | -0.209 |  0.117 |  0.000 |  0.117
        | AS2 | -0.289 | -0.288 | -0.094 |  0.195 |  0.000 |  0.194
        | AS3 |      – |      – |      – |      – |      – |      –
MDRAE   | AS1 | -0.060 | -0.060 | -0.059 |  0.001 |  0.000 |  0.001
        | AS2 | -0.074 | -0.081 | -0.059 |  0.014 | -0.007 |  0.021
        | AS3 |  0.065 |  0.041 |  0.060 |  0.005 |  0.025 | -0.020
GMRAE   | AS1 | -0.089 | -0.085 | -0.062 |  0.027 |  0.004 |  0.023
        | AS2 | -0.073 | -0.071 | -0.059 |  0.015 |  0.002 |  0.013
        | AS3 |      – |      – |      – |      – |      – |      –
RELMAE  | AS1 | -0.067 | -0.062 | -0.069 | -0.002 |  0.005 | -0.007
        | AS2 | -0.072 | -0.075 | -0.098 | -0.026 | -0.003 | -0.023
        | AS3 |  0.022 |  0.003 |  0.005 |  0.017 |  0.019 | -0.002
RELMSE  | AS1 | -0.096 | -0.090 | -0.118 | -0.022 |  0.006 | -0.028
        | AS2 | -0.137 | -0.160 | -0.181 | -0.044 | -0.023 | -0.021
        | AS3 | -0.006 | -0.013 |  0.009 | -0.004 | -0.007 |  0.003

Table 4: Medians and differences in the median. The columns CV, bCV, and lB give
the median of the (Eout-set/Ein-set) values for the procedures CV, blockedCV, and
lastBlock, diminished by one. The optimal ratio of the errors is one (which would
result in a zero in the table), as then the in-set error equals the out-set error, and
hence is a good estimate. Negative values in the table indicate a greater in-set error,
i.e., the out-set error is overestimated. A positive value, on the contrary, indicates
underestimation. CV-lB, CV-bCV, and bCV-lB are the differences of the absolute
values of CV, bCV, and lB. A negative value indicates that the minuend in the
difference leads to a value nearer to one, that is, to a better estimate of the error.
        |     |   all | CV,lB,bCV | CV,lB | bCV,lB | CV,bCV
MSE     | AS1 | 0.000 |     0.716 | 0.479 |  0.566 |  0.666
        | AS2 | 0.000 |     0.154 | 0.243 |  0.058 |  0.469
        | AS3 | 0.077 |     0.015 | 0.038 |  0.007 |  0.469
RMSE    | AS1 | 0.000 |     0.675 | 0.448 |  0.508 |  0.707
        | AS2 | 0.000 |     0.129 | 0.230 |  0.047 |  0.433
        | AS3 | 0.078 |     0.015 | 0.028 |  0.008 |  0.644
SSE     | AS1 | 0.000 |     0.880 | 0.662 |  0.758 |  0.760
        | AS2 | 0.000 |     0.469 | 0.683 |  0.230 |  0.413
        | AS3 | 0.082 |     0.018 | 0.039 |  0.009 |  0.508
MAE     | AS1 | 0.000 |     0.000 | 0.000 |  0.000 |  0.264
        | AS2 | 0.000 |     0.000 | 0.003 |  0.000 |  0.430
        | AS3 | 0.037 |     0.044 | 0.057 |  0.026 |  0.661
MDAE    | AS1 | 0.000 |     0.000 | 0.000 |  0.000 |  0.041
        | AS2 | 0.000 |     0.000 | 0.000 |  0.000 |  0.902
        | AS3 | 0.003 |     0.067 | 0.055 |  0.043 |  0.923
MAPE    | AS1 | 0.003 |     0.021 | 0.033 |  0.011 |  0.650
        | AS2 | 0.004 |     0.060 | 0.080 |  0.027 |  0.709
        | AS3 | 0.133 |     0.673 | 0.414 |  0.681 |  0.950
MDAPE   | AS1 | 0.000 |     0.005 | 0.018 |  0.004 |  0.444
        | AS2 | 0.108 |     0.716 | 0.943 |  0.424 |  0.529
        | AS3 | 0.005 |     0.008 | 0.007 |  0.013 |  0.784
SMAPE   | AS1 | 0.394 |     0.601 | 0.769 |  0.410 |  0.376
        | AS2 | 0.338 |     0.453 | 0.571 |  0.561 |  0.204
        | AS3 | 0.006 |     0.003 | 0.005 |  0.001 |  0.913
SMDAPE  | AS1 | 0.194 |     0.843 | 0.737 |  0.543 |  0.935
        | AS2 | 0.450 |     0.235 | 0.121 |  0.159 |  0.795
        | AS3 | 0.000 |     0.000 | 0.000 |  0.000 |  0.351
MRAE    | AS1 | 0.012 |     0.007 | 0.007 |  0.008 |  0.849
        | AS2 | 0.000 |     0.000 | 0.000 |  0.000 |  0.623
        | AS3 | 0.455 |     0.273 | 0.258 |  0.136 |  0.703
MDRAE   | AS1 | 0.000 |     0.009 | 0.020 |  0.007 |  0.701
        | AS2 | 0.001 |     0.548 | 0.346 |  0.369 |  0.983
        | AS3 | 0.054 |     0.302 | 0.256 |  0.163 |  0.737
GMRAE   | AS1 | 0.000 |     0.000 | 0.001 |  0.000 |  0.324
        | AS2 | 0.000 |     0.002 | 0.006 |  0.002 |  0.732
        | AS3 | 0.001 |     0.164 | 0.138 |  0.101 |  0.723
RELMAE  | AS1 | 0.000 |     0.000 | 0.002 |  0.000 |  0.307
        | AS2 | 0.010 |     0.149 | 0.119 |  0.087 |  0.849
        | AS3 | 0.005 |     0.097 | 0.115 |  0.050 |  0.618
RELMSE  | AS1 | 0.000 |     0.000 | 0.000 |  0.000 |  0.650
        | AS2 | 0.001 |     0.330 | 0.191 |  0.234 |  0.828
        | AS3 | 0.001 |     0.022 | 0.018 |  0.020 |  0.934

Table 5: p-values of the Fligner test. First column: Fligner test for differences in
variance, applied to the group of all model selection procedures (6 procedures for
AS1 and AS3, and 5 procedures for AS2). Second column: test for the three
mentioned methods. Columns 3-5: tests of interesting pairs of methods (without
application of a post-hoc procedure).
3. Plots of the Results
The in-set error, estimated by the model selection procedure, is compared to the
error on the out-set. If the model selection procedure produces a good estimate of
the error, the two errors should be very similar. Therefore, we analyze plots of the
points (Ein-set, Eout-set). If the errors are equal, these points all lie on a line
through the origin with gradient one. In the following, we call this type of evaluation
point plots.
In addition to the point plots, we analyze box-and-whisker plots that directly show
the value of the quotient (Eout-set/Ein-set). This is especially interesting when
scale-dependent measures like the RMSE are used, as the quotient acts as a
normalization, so that the results become comparable.
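The following sketch illustrates, with hypothetical error values, how such a point plot and the corresponding box plot of the quotients could be produced in R; it is not the code used to generate the plots in the following sections.

```r
## Illustrative sketch with hypothetical in-set and out-set errors.
set.seed(1)
e_in  <- runif(50, 0.5, 1.5)              # hypothetical in-set errors
e_out <- e_in + rnorm(50, sd = 0.1)       # hypothetical out-set errors

## point plot: (E_in-set, E_out-set) pairs and the identity line
plot(e_in, e_out, xlab = "E_in-set", ylab = "E_out-set")
abline(a = 0, b = 1, lty = 2)             # perfect estimate: gradient one

## box plot of the quotients (E_out-set / E_in-set)
boxplot(e_out / e_in, ylab = "E_out-set / E_in-set")
```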
Section 3.1.2 shows point plots for scenario AS1, using different error measures. It
can be observed that the RELMAE yields a less scattered distribution than MDAPE
and MDRAE. The in-set error tends to overestimate the out-set error, especially
when relative measures, i.e., RELMAE or MDRAE, are used. No systematic difference
between the different model selection procedures can be determined in this plot. To
further examine the different model selection procedures, Section 3.1.3 shows the
results of Section 3.1.2 in more detail, with the results of every model selection
procedure in a separate plot. Section 3.1.1 shows the results of Section 3.1.3 as box
plots. Within scenario AS2, noDepCV is not applicable any more. Point plots and
box plots analogous to those of scenario AS1 are shown in Section 3.2. Section 3.3
shows the results of scenario AS3, where real-world data is used.
3.1. Plots for scenario (AS1)
3.1.1. (AS1) Box Plots
3.1.2. (AS1) Point Plots Combined
3.1.3. (AS1) Point Plots per Error Measure
3.2. Plots for scenario (AS2)
3.2.1. (AS2) Box Plots
3.2.2. (AS2) Point Plots Combined
3.2.3. (AS2) Point Plots per Error Measure
3.3. Plots for scenario (AS3)
3.3.1. (AS3) Box Plots
3.3.2. (AS3) Point Plots Combined
3.3.3. (AS3) Point Plots per Error Measure