We start with another classification problem. The GunPoint dataset comes from the video surveillance domain. Chotirat Ann Ratanamahatana and Eamonn Keogh describe it in “Everything you know about Dynamic Time Warping is Wrong” as follows:

“…The two classes are:

  • Gun-Draw: The actors have their hands by their sides. They draw a replicate gun from a hip-mounted holster, point it at a target for approximately one second, then return the gun to the holster, and their hands to their sides.
  • Point: The actors have their gun by their sides. They point with their index fingers to a target for approximately one second, and then return their hands to their sides.

For both classes, we tracked the centroid of the actor’s right hands in both X- and Y-axes, which appear to be highly correlated; therefore, in this experiment, we only consider the X-axis for simplicity…“. The following figure shows a video sequence of the GunPoint problem taken from “Everything you know about Dynamic Time Warping is Wrong”.

The GunPoint dataset is available in the LPStimeSeries R package. We install the package and load the dataset with the following code:

install.packages("LPStimeSeries")
library("LPStimeSeries")
data(GunPoint)

GunPoint is a list that provides the training and testing datasets as separate matrices. First, we merge the training and testing partitions into a single dataset. Then we split it again in a way that simulates the semi-supervised context.

library("ssc")
x <- rbind(GunPoint$trainseries,GunPoint$testseries) # instances
y <- c(GunPoint$trainclass,GunPoint$testclass)       # classes

set.seed(1) # set seed

tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx]  # related classes

tra.na.idx <- sample(x = length(tra.idx),
                     size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove classes from 70% of instances
xttest <- xtrain[tra.na.idx,]    # unlabeled training instances
yttest <- y[tra.idx[tra.na.idx]] # their true classes
 
tst.idx <- setdiff(1:length(y), tra.idx)
xitest <- x[tst.idx,] # test instances
yitest <- y[tst.idx]  # associated classes
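The split above can be checked on synthetic data. The sketch below (base R only; the matrix and labels are made-up stand-ins for GunPoint) reproduces the same indexing scheme and confirms that exactly 70% of the training labels end up as NA:

```r
set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200)              # 200 synthetic instances
y <- factor(sample(c("Gun", "Point"), 200, replace = TRUE))

tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx, ]   # training instances
ytrain <- y[tra.idx]     # related classes

tra.na.idx <- sample(x = length(tra.idx),
                     size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA                              # unlabel 70% of training set

prop.na <- mean(is.na(ytrain))                        # fraction left unlabeled
prop.na  # 0.7
```

Because the sizes are computed with ceiling(), the unlabeled fraction is exactly 70 out of 100 training instances here.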

Now we compute the distance matrices needed to perform the training phase:

library(proxy) # load package
dtrain <- as.matrix(dist(x = xtrain, method = "euclidean",
                         by_rows = TRUE))
ditest <- as.matrix(dist(x = xitest, y = xtrain,
                         method = "euclidean", by_rows = TRUE))
dttest <- as.matrix(dist(x = xttest, y = xtrain,
                         method = "euclidean", by_rows = TRUE))
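When proxy::dist is given two matrices, entry [i, j] of the result is the distance between row i of the first and row j of the second. The base-R sketch below (no proxy needed; cross.dist is a hypothetical helper, not part of any package) computes the same cross Euclidean distance matrix and checks one entry by hand:

```r
# Euclidean cross-distance: entry [i, j] is the distance
# between row i of a and row j of b
cross.dist <- function(a, b) {
  aa <- rowSums(a^2)                                # |a_i|^2
  bb <- rowSums(b^2)                                # |b_j|^2
  d2 <- outer(aa, bb, "+") - 2 * tcrossprod(a, b)   # |a_i - b_j|^2
  sqrt(pmax(d2, 0))                 # clip tiny negatives from rounding
}

a <- matrix(c(0, 0,
              3, 4), nrow = 2, byrow = TRUE)
b <- matrix(c(0, 0), nrow = 1)
cross.dist(a, b)   # distances to the origin: 0 and 5
```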

Using the distance matrices, we train five semi-supervised models available in the ssc package.

m.selft <- selfTraining(x = dtrain, y = ytrain)
m.setred <- setred(x = dtrain, y = ytrain)
m.snnrce <- snnrce(x = dtrain, y = ytrain)
m.trit <- triTraining(x = dtrain, y = ytrain)
m.cobc <- coBC(x = dtrain, y = ytrain)
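All five learners build on the idea of self-labelling: iteratively promoting confident predictions on unlabeled data to training labels. As an illustration only (this is not the ssc implementation; the function and data are hypothetical), the sketch below runs a minimal self-training loop with a 1-NN base classifier on a one-dimensional toy problem, labelling at each step the unlabeled point closest to a labeled one:

```r
# Minimal 1-NN self-training on a 1-D toy problem (illustration only)
self.train.1nn <- function(x, y) {
  while (any(is.na(y))) {
    lab <- which(!is.na(y))
    unl <- which(is.na(y))
    # distance of every unlabeled point to every labeled point
    d <- abs(outer(x[unl], x[lab], "-"))
    best <- which(d == min(d), arr.ind = TRUE)[1, ]   # most confident pair
    y[unl[best[1]]] <- y[lab[best[2]]]                # adopt neighbour's label
  }
  y
}

x <- c(0, 1, 2, 10, 11, 12)
y <- c("A", NA, NA, NA, NA, "B")
self.train.1nn(x, y)   # left cluster becomes "A", right cluster "B"
```

Labels propagate outward from the two seeds, so the final labelling is "A" for the cluster near 0 and "B" for the cluster near 12.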

To determine the most accurate classifier, we compare the classification results obtained with each semi-supervised model. All statistics are collected in a matrix and plotted as a barplot for easy comparison. We use the statistics function available in the ssc package.

matrix.stat <- matrix(nrow = 3, ncol = 5)

d <- ditest[, m.selft$included.insts]
p.selft <- predict(m.selft, d) # classify with selfTraining
matrix.stat[,1] <- unlist(statistics(p.selft, yitest))

d <- ditest[, m.setred$included.insts]
p.setred <- predict(m.setred, d) # classify with setred
matrix.stat[,2] <- unlist(statistics(p.setred, yitest))

d <- ditest[, m.snnrce$included.insts]
p.snnrce <- predict(m.snnrce, d) # classify with snnrce
matrix.stat[,3] <- unlist(statistics(p.snnrce, yitest))

d <- ditest[, m.trit$included.insts]
p.trit <- predict(m.trit, d) # classify with triTraining
matrix.stat[,4] <- unlist(statistics(p.trit, yitest))

d <- ditest[, m.cobc$included.insts]
p.cobc <- predict(m.cobc, d) # classify with coBC
matrix.stat[,5] <- unlist(statistics(p.cobc, yitest))
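The three statistics can also be computed by hand from a confusion matrix. The base-R sketch below (class.stats is a hypothetical helper, not the ssc statistics function) does so for a two-class problem, taking the first factor level as the positive class for the F-measure:

```r
# Accuracy, Cohen's kappa and F-measure from a confusion matrix
# (rows = predicted, cols = true; positive class = first level)
class.stats <- function(pred, truth) {
  cm  <- table(pred, truth)
  n   <- sum(cm)
  acc <- sum(diag(cm)) / n
  # expected agreement under independence, for Cohen's kappa
  exp.agr <- sum(rowSums(cm) * colSums(cm)) / n^2
  kappa   <- (acc - exp.agr) / (1 - exp.agr)
  prec  <- cm[1, 1] / sum(cm[1, ])   # precision for the positive class
  rec   <- cm[1, 1] / sum(cm[, 1])   # recall for the positive class
  fmeas <- 2 * prec * rec / (prec + rec)
  c(kappa = kappa, accuracy = acc, f.measure = fmeas)
}

pred  <- factor(c("Gun", "Gun", "Point", "Point", "Gun", "Point"))
truth <- factor(c("Gun", "Gun", "Gun",   "Point", "Point", "Point"))
round(class.stats(pred, truth), 3)
```

In this toy example, 4 of 6 predictions are correct, giving accuracy 0.667, kappa 0.333, and F-measure 0.667 for the "Gun" class.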

barplot(matrix.stat, beside = T,
        names.arg = c("SelfT","SETRED","SNNRCE","TriT","coBC"),
        ylim = c(0.6,1),
        density = c(10,30,40), angle = c(45, 135, 60),
        col=c("cadetblue2","cadetblue3","cadetblue"),
        ylab = "Classification statistics", xpd = F,
        main="Classification of GunPoint dataset",
        legend = c("kappa","Accuracy","F-Measure"),
        args.legend = list(x =  "top", ncol = 3))


In the specific case of the triTraining and coBC functions, it is possible to obtain different results across runs of the training functions. This is caused by the randomness involved in the bagging process performed at the initial stage of training.
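This randomness comes from the bootstrap resamples drawn when the ensembles are built, so fixing the seed before training makes the resamples, and hence the trained models, reproducible. A minimal base-R illustration of the mechanism (the objects here are made up for the example):

```r
# Two bootstrap resamples drawn after the same seed are identical;
# a third one drawn without resetting the seed will differ
set.seed(7)
bag1 <- sample(1:100, size = 100, replace = TRUE)
set.seed(7)
bag2 <- sample(1:100, size = 100, replace = TRUE)
bag3 <- sample(1:100, size = 100, replace = TRUE)

identical(bag1, bag2)   # TRUE: same seed, same resample
identical(bag1, bag3)   # FALSE (with overwhelming probability)
```

For this reason, calling set.seed() immediately before triTraining or coBC is enough to make their results repeatable.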