This example illustrates the use of the democratic training function. This model differs from the other models in the ssc package: democratic is a multi-learning approach that requires different classifier specifications to perform the learning. We simulate the multi-learning context by training three 1NN classifiers with different dissimilarity measures.

In the following code, we assume the existence of the variables xtrain, ytrain, xitest and yitest, created in Example 3. In the variable dist we specify a list of functions that will be used to compute the distances between training examples (the DTW and EDR distances come from the time-series domain). The matrix dist.use associates each classifier with a specific distance function. Finally, we call the democratic function to train this semi-supervised model.
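
Before training, the following minimal sketch shows one hypothetical way such variables could be prepared (it is not the actual Example 3 code). It assumes a matrix of time series x, one series per row, and a class vector y; these names are illustrative only. The data are split into a training partition and an inductive test partition, and then part of the training labels are hidden with NA, which is how the ssc training functions mark unlabeled instances.

set.seed(1)                                         #hypothetical sketch only
train.idx <- sample(nrow(x), round(0.7 * nrow(x)))  #70% of the rows for training
xtrain <- x[train.idx, ]                            #training instances
ytrain <- y[train.idx]                              #training classes
xitest <- x[-train.idx, ]                           #inductive test instances
yitest <- y[-train.idx]                             #inductive test classes
unlabeled.idx <- sample(length(ytrain),             #hide 70% of the training
                        round(0.7 * length(ytrain)))#labels
ytrain[unlabeled.idx] <- NA                         #NA marks unlabeled instances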

library("ssc")
bclassifs <- list(bClassifOneNN(), bClassifOneNN(),
                  bClassifOneNN()) #creating three classifiers
dist <- list("Euclidean",
             function(x, y){dtw::dtw(x, y)$distance},
             function(x, y){TSdist::EDRDistance(x, y,
                                                epsilon = 0.2)})
dist.use <- matrix(
  data = c(
    TRUE, FALSE, FALSE,
    FALSE, TRUE, FALSE,
    FALSE, FALSE, TRUE
  ),
  nrow = 3, byrow = TRUE
)
m.demo <- democratic(x = xtrain, y = ytrain,
                     bclassifs, dist, dist.use)

We perform inductive classification in the same way as with the other semi-supervised methods.

p.demo <- predict(m.demo, xitest) # classify with democratic
r.demo <- unlist(statistics(p.demo, yitest))

Semi-supervised learning tries to take advantage of the unlabeled instances during the training process. We estimate the gain obtained from the unlabeled instances through a comparison with a purely supervised classifier: we train a supervised 1NN using only the labeled instances available at the beginning of the training process. Next, we evaluate this classifier on the test instances and compare its results with the semi-supervised inductive results.

library("proxy")
labeled.idx <- which(!is.na(ytrain))  #indexes of initial labeled instances
xilabeled <- xtrain[labeled.idx,]     #initial labeled instances
yilabeled <- ytrain[labeled.idx]      #related classes

CIR <- oneNN(x = NULL, yilabeled)     #create supervised 1NN classifier
pdist <- as.matrix(dist(x = xitest, y = xilabeled,  #distance matrix from test
                        method = "euclidean",       #instances to the labeled
                        by_rows = TRUE))            #training instances
CIRclass <- predict(CIR, pdist)       #classify with 1NN
r.CIR <- unlist(statistics(predicted = CIRclass, real = yitest))

We compare both results to contrast the supervised and semi-supervised paradigms.

barplot(rbind(r.CIR, r.demo), beside = TRUE,
        names.arg = c("Kappa", "Accuracy", "F-Measure"),
        ylim = c(0.6, 1),
        density = c(10, 40), angle = c(45, 135),
        col = c("cadetblue2", "cadetblue"),
        ylab = "Classification statistics", xpd = FALSE,
        main = "Supervised vs Semi-Supervised Classification",
        legend = c("1NN", "Democratic"),
        args.legend = list(x = "top", ncol = 2))
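
Besides the plot, the two vectors of statistics returned above can also be inspected numerically; a minimal sketch:

round(rbind(`1NN` = r.CIR, Democratic = r.demo), 3) #side-by-side statistics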