1 The Classification Problem

In this section, we demonstrate how to use the frbs package for classification problems. We use the iris dataset, a popular benchmark dataset for classification problems.

Generally, there are four steps to obtain the results. First, the preprocessing step prepares the data as input for the method. Second, we generate the model, or object. Third, the prediction step is carried out on new data. Finally, we print the results and calculate the error between the real and predicted values.

Furthermore, besides depicting how to use the frbs package step by step, we compare these steps with those of a classification tree, in order to help users who are already familiar with the tree package.

1.1 Preprocessing Data

In this step, we split the data into two parts: training and testing data. The training data are used to generate the fuzzy rule-based system (FRBS) model, while the testing data are used in the prediction phase to obtain predicted values. It should be noted that categorical data must be converted into numerical data. For example, in this case we use the iris dataset, whose Species attribute takes the values "setosa", "versicolor", and "virginica". These values must be converted into numerical values, e.g. 1, 2, and 3. Furthermore, the output variable to be predicted must be placed in the last column of the training data.

The range of the data can be defined or omitted. If it is omitted, frbs will take the minimum and maximum of the training data as the range. However, we recommend defining this parameter, to avoid data falling out of range, especially in the prediction phase. It should be taken into account that for classification tasks, the range of the data covers ONLY the input variables.

  R> library(frbs)
  R> data(iris)
  R> set.seed(2)
  R> irisShuffled <- iris[sample(nrow(iris)),]
  R> irisShuffled[,5] <- unclass(irisShuffled[,5])
  R> tra.iris <- irisShuffled[1:105,]
  R> tst.iris <- irisShuffled[106:nrow(irisShuffled),1:4]
  R> real.iris <- matrix(irisShuffled[106:nrow(irisShuffled),5], ncol = 1)
  R> range.data.input <- matrix(c(4.3, 7.9, 2.0, 4.4, 1.0, 6.9, 0.1, 2.5), nrow=2)
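As a cross-check, the hard-coded interval matrix above can also be derived from the data itself with base R; each column of the result holds the (min, max) pair of one input variable. This sketch uses the full iris data, so it reproduces the same bounds:

```r
# Derive the interval matrix of the four input variables from the data:
# row 1 = minimum, row 2 = maximum, one column per variable.
data(iris)
range.from.data <- apply(iris[, 1:4], 2, range)
# Sepal.Length: 4.3-7.9, Sepal.Width: 2.0-4.4,
# Petal.Length: 1.0-6.9, Petal.Width: 0.1-2.5 (same as the matrix above)
```

Deriving the range from the training subset alone, however, risks test values falling outside it, which is why we recommend defining it explicitly.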

1.2 Generating the model

1.2.1 Using FRBCS.W on frbs package

For generating the model, we need to define some parameters:

  • method.type: determines the learning method to use. The complete list of values can be found in the frbs manual. In this example, we perform a fuzzy rule-based classification system with a weight factor based on Ishibuchi’s technique (FRBCS.W), so this parameter is set to "FRBCS.W".
  • control: a list containing all arguments, depending on the learning algorithm to use. The complete list of parameters can be found in the frbs manual. Generally, the following parameters should be defined.
      • num.labels: a positive integer determining the number of labels (fuzzy terms). The default value is 7.
      • type.mf: the shape of the membership functions. In this case, we choose Gaussian functions.
      • type.tnorm: the type of conjunction operator (t-norm). In this case, we define "MIN" as the t-norm.
      • type.snorm: the type of disjunction operator (s-norm). In this case, we define "MAX" as the s-norm.
      • type.implication.func: the type of implication function. In this case, we define "ZADEH" as the implication function.
  R> method.type <- "FRBCS.W"
  R> control <- list(num.labels = 3, type.mf = "GAUSSIAN", type.tnorm = "MIN", type.snorm = "MAX", type.implication.func = "ZADEH")
  R> object <- frbs.learn(tra.iris, range.data.input, method.type, control)

A model is thus generated using the frbs.learn() function. The FRBS model obtained can be summarized as follows.

  R> summary(object)
  The name of model:  sim-0
  Model was trained using:  FRBCS.W
  The names of attributes:  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  The interval of input data:
      Sepal.Length Sepal.Width Petal.Length Petal.Width
  min          4.3         2.0          1.0         0.1
  max          7.9         4.4          6.9         2.5
  Type of FRBS model:
  [1] "FRBCS"
  Type of membership functions:
  [1] "GAUSSIAN"
  Type of t-norm method:
  [1] "Standard t-norm (min)"
  Type of s-norm method:
  [1] "Standard s-norm"
  Type of implication function:
  [1] "ZADEH"
  The names of fuzzy terms on the input variables:
   [1] "v.1_a.1" "v.1_a.2" "v.1_a.3" "v.2_a.1" "v.2_a.2" "v.2_a.3"
   [7] "v.3_a.1" "v.3_a.2" "v.3_a.3" "v.4_a.1" "v.4_a.2" "v.4_a.3"
  The parameter values of membership function on the input variable (normalized):
       v.1_a.1 v.1_a.2 v.1_a.3 v.2_a.1 v.2_a.2 v.2_a.3 v.3_a.1 v.3_a.2
  [1,]   5.000   5.000   5.000   5.000   5.000   5.000   5.000   5.000
  [2,]   0.000   0.500   1.000   0.000   0.500   1.000   0.000   0.500
  [3,]   0.175   0.175   0.175   0.175   0.175   0.175   0.175   0.175
  [4,]      NA      NA      NA      NA      NA      NA      NA      NA
  [5,]      NA      NA      NA      NA      NA      NA      NA      NA
       v.3_a.3 v.4_a.1 v.4_a.2 v.4_a.3
  [1,]   5.000   5.000   5.000   5.000
  [2,]   1.000   0.000   0.500   1.000
  [3,]   0.175   0.175   0.175   0.175
  [4,]      NA      NA      NA      NA
  [5,]      NA      NA      NA      NA
  The number of fuzzy terms on each variables
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  [1,]            3           3            3           3       3
  The fuzzy IF-THEN rules:
     V1           V2 V3      V4  V5          V6 V7      V8  V9
  1  IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.2 and
  2  IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.1 and
  3  IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.3 and
  4  IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.2 and
  5  IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.2 and
  6  IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.2 and
  7  IF Sepal.Length is v.1_a.3 and Sepal.Width is v.2_a.2 and
  8  IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.1 and
  9  IF Sepal.Length is v.1_a.1 and Sepal.Width is v.2_a.1 and
  10 IF Sepal.Length is v.1_a.3 and Sepal.Width is v.2_a.2 and
  11 IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.1 and
  12 IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.2 and
  13 IF Sepal.Length is v.1_a.2 and Sepal.Width is v.2_a.2 and
  14 IF Sepal.Length is v.1_a.1 and Sepal.Width is v.2_a.1 and
  15 IF Sepal.Length is v.1_a.1 and Sepal.Width is v.2_a.2 and
              V10 V11     V12 V13         V14 V15     V16  V17     V18
  1  Petal.Length  is v.3_a.3 and Petal.Width  is v.4_a.2 THEN Species
  2  Petal.Length  is v.3_a.2 and Petal.Width  is v.4_a.2 THEN Species
  3  Petal.Length  is v.3_a.1 and Petal.Width  is v.4_a.1 THEN Species
  4  Petal.Length  is v.3_a.2 and Petal.Width  is v.4_a.2 THEN Species
  5  Petal.Length  is v.3_a.2 and Petal.Width  is v.4_a.2 THEN Species
  6  Petal.Length  is v.3_a.3 and Petal.Width  is v.4_a.3 THEN Species
  7  Petal.Length  is v.3_a.3 and Petal.Width  is v.4_a.3 THEN Species
  8  Petal.Length  is v.3_a.3 and Petal.Width  is v.4_a.2 THEN Species
  9  Petal.Length  is v.3_a.2 and Petal.Width  is v.4_a.2 THEN Species
  10 Petal.Length  is v.3_a.3 and Petal.Width  is v.4_a.2 THEN Species
  11 Petal.Length  is v.3_a.2 and Petal.Width  is v.4_a.2 THEN Species
  12 Petal.Length  is v.3_a.1 and Petal.Width  is v.4_a.1 THEN Species
  13 Petal.Length  is v.3_a.2 and Petal.Width  is v.4_a.3 THEN Species
  14 Petal.Length  is v.3_a.2 and Petal.Width  is v.4_a.2 THEN Species
  15 Petal.Length  is v.3_a.1 and Petal.Width  is v.4_a.1 THEN Species
     V19 V20
  1   is   3
  2   is   3
  3   is   1
  4   is   2
  5   is   3
  6   is   3
  7   is   3
  8   is   3
  9   is   3
  10  is   3
  11  is   2
  12  is   1
  13  is   3
  14  is   2
  15  is   1
  The weight of the rules
             [,1]
   [1,] 0.7984136
   [2,] 0.7984136
   [3,] 0.3318973
   [4,] 0.3696890
   [5,] 0.7984136
   [6,] 0.7984136
   [7,] 0.7984136
   [8,] 0.7984136
   [9,] 0.7984136
  [10,] 0.7984136
  [11,] 0.3696890
  [12,] 0.3318973
  [13,] 0.7984136
  [14,] 0.3696890
  [15,] 0.3318973
1.2.2 Using classification tree on tree package

As mentioned before, we compare the steps of our method with those of another package on CRAN. In this example, we use the tree package. The following code generates a model using a classification tree.

  R> library(tree)
  R> object.tree <- tree(Species ~., tra.iris)

1.3 Predicting the new data

The following code shows how to predict new data.

1.3.1 Using FRBCS.W on frbs package
  R> res.test <- predict(object, tst.iris)
1.3.2 Using classification tree on tree package
  R> pred.tree <- predict(object.tree, tst.iris)

1.4 Printing the results of frbs and tree

After that, we can print and compare their outputs:

  R> benchmark <- cbind(real.iris, res.test, round(pred.tree))
  R> colnames(benchmark) <- c("real", "frcs", "tree")
  R> print(benchmark)
      real frcs tree
  98     2    3    2
  40     1    1    1
  119    3    3    3
  7      1    1    1
  104    3    3    3
  100    2    3    2
  72     2    3    2
  77     2    3    2
  86     2    3    2
  30     1    1    1
  69     2    3    2
  103    3    3    3
  95     2    3    2
  133    3    3    3
  106    3    3    3
  68     2    3    2
  8      1    1    1
  150    3    3    3
  54     2    3    2
  42     1    1    1
  124    3    3    3
  125    3    3    3
  35     1    1    1
  16     1    1    1
  51     2    3    2
  53     2    3    2
  112    3    3    3
  123    3    3    3
  111    3    3    3
  71     2    3    3
  134    3    3    3
  49     1    1    1
  129    3    3    3
  3      1    1    1
  43     1    1    1
  139    3    3    3
  12     1    1    1
  37     1    1    1
  9      1    1    1
  114    3    3    3
  146    3    3    3
  45     1    1    1
  90     2    3    2
  110    3    3    3
  148    3    3    3

  R> #### Measure error for both methods
  R> err.frbcs <- 100 * sum(real.iris != res.test) / nrow(real.iris)
  R> err.tree <- 100 * sum(real.iris != pred.tree) / nrow(real.iris)
  R> print("FRBCS.W: percentage Error on Iris")
  [1] "FRBCS.W: percentage Error on Iris"

  R> print(err.frbcs) 
  [1] 28.88889

  R> print("tree: percentage Error on Iris") 
  [1] "tree: percentage Error on Iris"

  R> print(err.tree) 
  [1] 31.11111
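The percentage error above counts only how many labels differ; a confusion table additionally shows where the disagreements occur. The following is a toy sketch of both, with hypothetical label vectors (not the iris results):

```r
# Toy illustration of the percentage-error measure and a confusion table
real <- c(1, 1, 2, 2, 3, 3)   # hypothetical true classes
pred <- c(1, 1, 3, 2, 3, 3)   # hypothetical predictions; one differs
100 * sum(real != pred) / length(real)   # 16.66667 (1 of 6 misclassified)
table(real = real, predicted = pred)     # rows: true class, columns: predicted
```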

So, it can be seen that the steps are quite similar between the frbs and tree packages.

2 The Regression Problem

Here, we would like to show how to use frbs on a regression problem, in comparison with the RSNNS package. In this example, mlp from RSNNS is used, while we illustrate just one of the methods in frbs, ANFIS. Both methods are applied to the Gas Furnace dataset, taken from Box and Jenkins. We arrange the data into 292 consecutive data points, with the methane input at time (t - 4) and the CO2 produced in the furnace at time (t - 1) as input variables, and the CO2 produced at time (t) as the output variable. So, each training data point consists of [u(t - 4), y(t - 1), y(t)], where u is the methane and y is the CO2. Unlike ANFIS, mlp uses the original data for training, so its data contain just two columns: methane and CO2 as the input and output variable, respectively.
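The lag arrangement described above can be sketched in base R. The series here are short stand-ins, not the Gas Furnace data:

```r
# Build rows [u(t-4), y(t-1), y(t)] from two aligned series u and y.
u <- 1:10        # stand-in for the methane series u(t)
y <- 101:110     # stand-in for the CO2 series y(t)
t <- 5:10        # t starts at 5 so that t - 4 still indexes into the series
lagged <- cbind(u[t - 4], y[t - 1], y[t])
colnames(lagged) <- c("u(t-4)", "y(t-1)", "y(t)")
lagged           # first row: u(1), y(4), y(5) = 1, 104, 105
```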

2.1 Preprocessing Data

2.1.1 ANFIS on frbs package

The main point of this section is that we separate the data into training and testing data.

  R> ## preprocessing frbs
  R> data(frbsData)
  R> data.train <- frbsData$GasFurnance.dt[1 : 204, ]
  R> data.fit <- data.train[, 1 : 2]
  R> data.tst <- frbsData$GasFurnance.dt[205 : 292, 1 : 2]
  R> real.val <- matrix(frbsData$GasFurnance.dt[205 : 292, 3], ncol = 1)
  R> range.data<-matrix(c(-2.716, 2.834, 45.6, 60.5, 45.6, 60.5), nrow=2)
2.1.2 mlp on RSNNS package

For the mlp method, besides separating the data as in frbs, we also normalize them.

  R> ## normalizing data on mlp
  R> library(RSNNS)
  R> GF <- normTrainingAndTestSet(list(inputsTrain = data.fit, targetsTrain = matrix(data.train[, 3], ncol = 1), inputsTest = data.tst, targetsTest = real.val))
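Roughly speaking, this normalization is a per-column standardization of inputs and targets, where the test set is scaled with the statistics of the training set. A minimal base-R sketch of the idea on hypothetical data (the RSNNS function offers further normalization types beyond this):

```r
# Column-wise z-score normalization, with the test set scaled by the
# training statistics (a sketch of the idea, hypothetical data).
train <- matrix(rnorm(20, mean = 50, sd = 5), ncol = 2)
test  <- matrix(rnorm(10, mean = 50, sd = 5), ncol = 2)
mu  <- colMeans(train)
sdv <- apply(train, 2, sd)
train.norm <- sweep(sweep(train, 2, mu, "-"), 2, sdv, "/")
test.norm  <- sweep(sweep(test,  2, mu, "-"), 2, sdv, "/")
round(colMeans(train.norm), 10)   # 0 0 (centered on the training mean)
```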

2.2 Generating the model

2.2.1 ANFIS on frbs package

We set the method type to "ANFIS" in order to use the ANFIS method, and determine the parameter values in control. Then, frbs.learn is called to generate the object that serves as our model.

  R> method.type <- "ANFIS"
  R> control <- list(num.labels = 5, max.iter = 100, step.size = 0.01, type.tnorm = "MIN", type.snorm = "MAX", type.implication.func = "ZADEH", name = "GasFur")
  R> object.ANFIS <- frbs.learn(data.train, range.data, method.type, control)
2.2.2 mlp on RSNNS package

The following code generates the model/object using the mlp function.

  R> object.mlp <- mlp(GF$inputsTrain, GF$targetsTrain, size=5, learnFuncParams=c(0.1),
  +  maxit=350, inputsTest=GF$inputsTest, targetsTest=GF$targetsTest, linOut=TRUE)

2.3 Predicting phase

2.3.1 ANFIS on frbs package
  R> pred.ANFIS <- predict(object.ANFIS, data.tst)
2.3.2 mlp on RSNNS package
  R> pred.mlp <- predict(object.mlp, GF$inputsTest)

So, it can be seen that both methods predict new data in a similar way.

2.4 Printing the results of ANFIS and mlp

After that, we can print and compare their outputs. Note that we do not conclude here that one method is better than the other, since different parameter values for each method could yield different results.

  R> bench <- cbind(real.val, pred.ANFIS, pred.mlp)
  R> colnames(bench) <- c("real", "ANFIS", "mlp")
  R> ## print(bench)
  R>
  R> residuals.ANFIS <- (real.val - pred.ANFIS)
  R> residuals.mlp <- (real.val - pred.mlp)
  R> MSE.ANFIS <- mean(residuals.ANFIS^2)
  R> RMSE.ANFIS <- sqrt(mean(residuals.ANFIS^2))
  R> SMAPE.ANFIS <- mean(abs(residuals.ANFIS)/(abs(real.val) + abs(pred.ANFIS))/2)*100
  R> MSE.mlp <- mean(residuals.mlp^2)
  R> RMSE.mlp <- sqrt(mean(residuals.mlp^2))
  R> SMAPE.mlp <- mean(abs(residuals.mlp)/(abs(real.val) + abs(pred.mlp))/2)*100
  R> err.ANFIS <- c(MSE.ANFIS, RMSE.ANFIS, SMAPE.ANFIS)
  R> names(err.ANFIS) <- c("MSE", "RMSE", "SMAPE")
  R> err.mlp <- c(MSE.mlp, RMSE.mlp, SMAPE.mlp)
  R> names(err.mlp) <- c("MSE", "RMSE", "SMAPE")
  R> print("ANFIS Error Measurement: ")
  [1] "ANFIS Error Measurement: "

  R> print(err.ANFIS) 
        MSE      RMSE     SMAPE
  0.3426816 0.5853901 0.1903511

  R> print("mlp Error Measurement: ") 
  [1] "mlp Error Measurement: "

  R> print(err.mlp)
        MSE      RMSE     SMAPE
  0.4620850 0.6797684 0.2053700