We start with another classification problem. The GunPoint dataset comes from the video surveillance domain. It is described by Chotirat Ann Ratanamahatana and Eamonn Keogh in "Everything you know about Dynamic Time Warping is Wrong" as follows:

“…The two classes are:

• Gun-Draw: The actors have their hands by their sides. They draw a replicate gun from a hip-mounted holster, point it at a target for approximately one second, then return the gun to the holster, and their hands to their sides.
• Point: The actors have their gun by their sides. They point with their index fingers to a target for approximately one second, and then return their hands to their sides.

For both classes, we tracked the centroid of the actor's right hands in both X- and Y-axes, which appear to be highly correlated; therefore, in this experiment, we only consider the X-axis for simplicity…". The following figure shows a video sequence of the GunPoint problem taken from "Everything you know about Dynamic Time Warping is Wrong".

The GunPoint dataset is available in the LPStimeSeries R package. We install the package and load the dataset with the following code:

install.packages("LPStimeSeries")
library("LPStimeSeries")
data(GunPoint)

GunPoint is a list that provides the training and testing datasets as separate matrices. First, we obtain a single dataset by joining the training and testing partitions. Then we perform a partition that simulates the semi-supervised context, removing the class labels from 70% of the training instances.

library("ssc")
x <- rbind(GunPoint$trainseries, GunPoint$testseries) # instances
y <- c(GunPoint$trainclass, GunPoint$testclass)       # classes
set.seed(1) # set seed
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx]  # related classes
tra.na.idx <- sample(x = length(tra.idx),
                     size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove classes from 70% of instances
xttest <- xtrain[tra.na.idx,]     # unlabeled training instances
yttest <- y[tra.idx][tra.na.idx]  # their real classes
tst.idx <- setdiff(1:length(y), tra.idx)
xitest <- x[tst.idx,] # test instances
yitest <- y[tst.idx]  # associated classes

Now we compute the distance matrices needed for the training and prediction phases:

library(proxy) # load package
dtrain <- as.matrix(dist(x = xtrain, method = "euclidean",
                         by_rows = TRUE))
ditest <- as.matrix(dist(x = xitest, y = xtrain,
                         method = "euclidean", by_rows = TRUE))
dttest <- as.matrix(dist(x = xttest, y = xtrain,
                         method = "euclidean", by_rows = TRUE))
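To make the shape of these cross-distance matrices concrete, the following minimal sketch computes pairwise Euclidean distances between the rows of two matrices in base R: for an n.test x p "test" matrix and an n.train x p "training" matrix, the result is an n.test x n.train matrix. The toy data and the helper function `cross.dist` are hypothetical, not part of the GunPoint example.

```r
# Hypothetical sketch: pairwise Euclidean distances between row sets,
# mirroring what dist(x = ..., y = ..., by_rows = TRUE) returns.
a <- matrix(c(0, 0,
              3, 4), nrow = 2, byrow = TRUE) # two "test" series
b <- matrix(c(0, 0), nrow = 1)               # one "training" series
cross.dist <- function(a, b) {
  outer(seq_len(nrow(a)), seq_len(nrow(b)),
        Vectorize(function(i, j) sqrt(sum((a[i, ] - b[j, ])^2))))
}
d <- cross.dist(a, b)
dim(d)   # 2 x 1: one row per test instance, one column per training instance
d[2, 1]  # 5 (the 3-4-5 triangle)
```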

Using the distance matrices, we train five semi-supervised models available in the ssc package.

m.selft <- selfTraining(x = dtrain, y = ytrain)
m.setred <- setred(x = dtrain, y = ytrain)
m.snnrce <- snnrce(x = dtrain, y = ytrain)
m.trit <- triTraining(x = dtrain, y = ytrain)
m.cobc <- coBC(x = dtrain, y = ytrain)

To determine the most accurate classifier, we compare the classification results obtained with each semi-supervised model. All statistics are collected in a matrix and plotted in a barplot for easy comparison, using the statistics function available in the ssc package.

matrix.stat <- matrix(nrow = 3, ncol = 5)
d <- ditest[, m.selft$included.insts]
p.selft <- predict(m.selft, d) # classify with selfTraining
matrix.stat[,1] <- unlist(statistics(p.selft, yitest))
d <- ditest[, m.setred$included.insts]
p.setred <- predict(m.setred, d) # classify with setred
matrix.stat[,2] <- unlist(statistics(p.setred, yitest))
d <- ditest[, m.snnrce$included.insts]
p.snnrce <- predict(m.snnrce, d) # classify with snnrce
matrix.stat[,3] <- unlist(statistics(p.snnrce, yitest))
d <- ditest[, m.trit$included.insts]
p.trit <- predict(m.trit, d) # classify with triTraining
matrix.stat[,4] <- unlist(statistics(p.trit, yitest))
d <- ditest[, m.cobc$included.insts]
p.cobc <- predict(m.cobc, d) # classify with coBC
matrix.stat[,5] <- unlist(statistics(p.cobc, yitest))
barplot(matrix.stat, beside = T,
        names.arg = c("SelfT","SETRED","SNNRCE","TriT","coBC"),
        ylim = c(0.6, 1),
        density = c(10, 30, 40), angle = c(45, 135, 60),
        col = c("cadetblue2","cadetblue3","cadetblue"),
        ylab = "Classification statistics", xpd = F,
        main = "Classification of GunPoint dataset",
        legend = c("kappa","Accuracy","F-Measure"),
        args.legend = list(x = "top", ncol = 3))

In the specific case of the triTraining and coBC functions, different runs of the training functions may produce different results. This is caused by the randomness involved in the bagging process performed at the initial stage of training.
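A minimal base-R sketch of why this happens: bagging draws bootstrap samples with `sample(..., replace = TRUE)`, which consumes R's random number generator, so two runs generally select different training subsets unless the seed is fixed beforehand. (The toy vector below is hypothetical; it is not the GunPoint data.)

```r
# Bootstrap resampling, as used by bagging, depends on R's RNG state.
n <- 10
bag1 <- sample(n, n, replace = TRUE) # first run
bag2 <- sample(n, n, replace = TRUE) # second run: usually a different sample
# Fixing the seed before each run makes the bootstrap sample reproducible:
set.seed(1)
bag3 <- sample(n, n, replace = TRUE)
set.seed(1)
bag4 <- sample(n, n, replace = TRUE)
identical(bag3, bag4) # TRUE: same seed, same bootstrap sample
```

Calling set.seed with a fixed value immediately before triTraining or coBC should therefore make their results repeatable across runs.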