15 Tuning of Support Vector Machine prediction

Datasets: iris, Sonar
Algorithms:
- Support Vector Machines

https://eight2late.wordpress.com/2017/02/07/a-gentle-introduction-to-support-vector-machines-using-r/

15.1 Support vector machines in R

In this demo we’ll use the svm interface implemented in the e1071 R package. This interface gives R programmers access to the comprehensive libsvm library written by Chang and Lin. I’ll use two toy datasets: the famous iris dataset, which ships with base R (in the datasets package), and the Sonar dataset from the mlbench package. I won’t describe the datasets in detail, as they are discussed at length in their documentation. However, it is worth mentioning the reasons why I chose these datasets:

- As mentioned earlier, no real-life dataset is linearly separable, but the iris dataset is almost so. Consequently, it is a good illustration of using linear SVMs. Although one almost never uses these in practice, I have illustrated their use primarily for pedagogical reasons.
- The sonar dataset is a good illustration of the benefits of using RBF kernels in cases where the dataset is hard to visualise (60 variables in this case!). In general, one would almost always use RBF (or other nonlinear) kernels in practice.

With that said, let’s get right to it. I assume you have R and RStudio installed. For instructions on how to do this, have a look at the first article in this series. The processing preliminaries (loading libraries and data, and creating training and test datasets) are much the same as in my previous articles, so I won’t dwell on them here. For completeness, however, I’ll list all the code so you can run it directly in R or RStudio (a complete listing of the code can be found here):

15.2 SVM on `iris` dataset

15.2.1 Training and test datasets

#load required library
library(e1071)

#load built-in iris dataset
data(iris)

#set seed to ensure reproducible results
set.seed(42)

#split into training and test sets
iris[, "train"] <- ifelse(runif(nrow(iris)) < 0.8, 1, 0)

#separate training and test sets
trainset <- iris[iris$train == 1,]
testset <- iris[iris$train == 0,]

#get column index of train flag
trainColNum <- grep("train", names(trainset))

#remove train flag column from train and test sets
trainset <- trainset[,-trainColNum]
testset <- testset[,-trainColNum]

dim(trainset)
#> [1] 115   5
dim(testset)
#> [1] 35  5
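
The `runif` flag used above gives an approximate, not exact, 80/20 split (115/35 here). If an exact split is preferred, one possible alternative is to sample row indices directly. This is only a sketch and is not the split used in the rest of this chapter; the names `train_idx`, `trainset_exact` and `testset_exact` are made up for the illustration.

```r
# Sketch: exact 80/20 split by sampling row indices
# (an alternative to the runif flag; all new names are hypothetical)
data(iris)
set.seed(42)

n <- nrow(iris)                                 # 150 rows
train_idx <- sample(n, size = round(0.8 * n))   # 120 distinct row positions

trainset_exact <- iris[train_idx, ]             # rows used for training
testset_exact  <- iris[-train_idx, ]            # all remaining rows

dim(trainset_exact)
dim(testset_exact)
```

Because different rows are selected, accuracies obtained with this split will not match the figures reported below.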

15.2.2 Build the SVM model

#get column index of predicted variable in dataset
typeColNum <- grep("Species", names(iris))

#build model – linear kernel and C-classification (soft margin) with default cost (C=1)
#note: svm() selects the SVM type via `type`; a `method` argument is silently ignored
svm_model <- svm(Species~ ., data = trainset, 
                 type = "C-classification", 
                 kernel = "linear")
svm_model
#> 
#> Call:
#> svm(formula = Species ~ ., data = trainset, type = "C-classification", 
#>     kernel = "linear")
#> 
#> 
#> Parameters:
#>    SVM-Type:  C-classification 
#>  SVM-Kernel:  linear 
#>        cost:  1 
#> 
#> Number of Support Vectors:  24

The output from the SVM model shows that there are 24 support vectors. If desired, these can be examined using the SV component of the model, i.e., via svm_model$SV. Note that the values are on a standardised scale, because svm scales the predictors by default.

15.2.3 Support Vectors

# support vectors
svm_model$SV
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 19       -0.2564      1.7668       -1.323      -1.305
#> 42       -1.7006     -1.7045       -1.559      -1.305
#> 45       -0.9785      1.7668       -1.205      -1.171
#> 53        1.1878      0.1469        0.568       0.309
#> 55        0.7064     -0.5474        0.390       0.309
#> 57        0.4657      0.6097        0.450       0.443
#> 58       -1.2192     -1.4730       -0.378      -0.364
#> 69        0.3453     -1.9359        0.331       0.309
#> 71       -0.0157      0.3783        0.509       0.712
#> 73        0.4657     -1.2416        0.568       0.309
#> 78        0.9471     -0.0845        0.627       0.578
#> 84        0.1046     -0.7788        0.686       0.443
#> 85       -0.6174     -0.0845        0.331       0.309
#> 86        0.1046      0.8412        0.331       0.443
#> 99       -0.9785     -1.2416       -0.555      -0.229
#> 107      -1.2192     -1.2416        0.331       0.578
#> 111       0.7064      0.3783        0.686       0.981
#> 117       0.7064     -0.0845        0.922       0.712
#> 124       0.4657     -0.7788        0.568       0.712
#> 130       1.5488     -0.0845        1.099       0.443
#> 138       0.5860      0.1469        0.922       0.712
#> 139       0.1046     -0.0845        0.509       0.712
#> 147       0.4657     -1.2416        0.627       0.847
#> 150      -0.0157     -0.0845        0.686       0.712
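
Since `svm_model$SV` holds scaled values, it can be handy to map the support vectors back to the original measurements. The sketch below rebuilds the same model so that it runs on its own, then uses the `index`, `nSV` and `x.scale` components of the fitted object. The variable names `flag`, `centre`, `spread` and `sv_unscaled` are made up for this illustration.

```r
# Sketch: locating the support vectors in the training data
library(e1071)
data(iris)
set.seed(42)
flag <- runif(nrow(iris)) < 0.8          # same split rule as above
trainset <- iris[flag, ]

# `type` is the svm() argument that selects the SVM type
svm_model <- svm(Species ~ ., data = trainset,
                 type = "C-classification", kernel = "linear")

svm_model$nSV        # number of support vectors per class
svm_model$index      # row positions of the support vectors within trainset

# The support vectors on the original measurement scale
trainset[svm_model$index, ]

# Equivalent: undo the standardisation by hand
centre <- svm_model$x.scale[["scaled:center"]]
spread <- svm_model$x.scale[["scaled:scale"]]
sv_unscaled <- sweep(sweep(svm_model$SV, 2, spread, "*"), 2, centre, "+")
head(sv_unscaled)
```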

The test prediction accuracy (computed in the sections that follow) indicates that the linear kernel performs quite well on this dataset, confirming that it is indeed nearly linearly separable. To check performance by class, one can create a confusion matrix, as described in my post on random forests and as done in section 15.2.6. Another point is that we have used a soft-margin classification scheme with a cost C=1. You can experiment with this by explicitly changing the value of C through the cost argument of svm. I’ll leave this for you as an exercise.
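
As a starting point for that exercise, the following sketch refits the linear model for a handful of cost values and collects the number of support vectors together with the training and test accuracy. The grid in `costs` is an arbitrary choice for illustration, and `flag`, `costs` and `results` are hypothetical names. No particular outcome is claimed here; run it to see how the numbers move.

```r
# Sketch: effect of the cost parameter C on a linear SVM (illustrative grid)
library(e1071)
data(iris)
set.seed(42)
flag <- runif(nrow(iris)) < 0.8          # same split rule as above
trainset <- iris[flag, ]
testset  <- iris[!flag, ]

costs <- c(0.01, 0.1, 1, 10, 100)        # arbitrary values to try

results <- sapply(costs, function(C) {
  fit <- svm(Species ~ ., data = trainset,
             type = "C-classification", kernel = "linear", cost = C)
  c(cost      = C,
    n_sv      = fit$tot.nSV,                                   # total support vectors
    train_acc = mean(predict(fit, trainset) == trainset$Species),
    test_acc  = mean(predict(fit, testset) == testset$Species))
})

t(results)   # one row per cost value
```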

15.2.4 Predictions on the training set

# training set predictions
pred_train <- predict(svm_model, trainset)
mean(pred_train == trainset$Species)
#> [1] 0.983
# [1] 0.9826087

15.2.5 Predictions on the test set

# test set predictions
pred_test <-predict(svm_model, testset)
mean(pred_test == testset$Species)
#> [1] 0.914
# [1] 0.9142857

15.2.6 Confusion matrix and Accuracy

# confusion matrix
cm <- table(pred_test, testset$Species)
cm
#>             
#> pred_test    setosa versicolor virginica
#>   setosa         18          0         0
#>   versicolor      0          5         3
#>   virginica       0          0         9

# accuracy
sum(diag(cm)) / sum(cm)
#> [1] 0.914
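
Overall accuracy hides where the errors fall. A small base R sketch, using the confusion matrix printed above (re-entered by hand so the snippet runs on its own), derives per-class precision and recall. Rows are predictions and columns are true classes, as in `table(pred_test, testset$Species)`; the names `precision` and `recall` are introduced here for illustration.

```r
# Confusion matrix copied from the output above: rows = predicted, columns = actual
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(18, 0, 0,
                0, 5, 3,
                0, 0, 9),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = classes, actual = classes))

# precision: of the flowers predicted as class k, the share that really are k
precision <- diag(cm) / rowSums(cm)
# recall: of the flowers that really are class k, the share predicted as k
recall <- diag(cm) / colSums(cm)

cbind(precision, recall)
# versicolor precision is 5/8 = 0.625; virginica recall is 9/12 = 0.75
```

All three errors are virginica flowers predicted as versicolor, which is consistent with those two species being the hard pair to separate.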

15.3 SVM on `Sonar` dataset. Linear kernel

15.3.1 Training and test sets

#load required library (assuming e1071 is already loaded)
library(mlbench)

#load Sonar dataset
data(Sonar)
#set seed to ensure reproducible results
set.seed(42)
#split into training and test sets
Sonar[, "train"] <- ifelse(runif(nrow(Sonar))<0.8,1,0)

#separate training and test sets
trainset <- Sonar[Sonar$train==1,]
testset <- Sonar[Sonar$train==0,]

#get column index of train flag
trainColNum <- grep("train",names(trainset))
#remove train flag column from train and test sets
trainset <- trainset[,-trainColNum]
testset <- testset[,-trainColNum]

#get column index of predicted variable in dataset
typeColNum <- grep("Class",names(Sonar))

15.3.2 Predictions on the training set

#build model – linear kernel and C-classification with default cost (C=1)
svm_model <- svm(Class~ ., data=trainset, 
                 type="C-classification", 
                 kernel="linear")

#training set predictions
pred_train <-predict(svm_model,trainset)
mean(pred_train==trainset$Class)
#> [1] 0.97

15.3.3 Predictions on the test set

#test set predictions
pred_test <-predict(svm_model,testset)
mean(pred_test==testset$Class)
#> [1] 0.605

I’ll leave you to examine the contents of the model. The important point to note here is that the performance of the model on the test set (about 60% accuracy, against 97% on the training set) is quite dismal compared to the previous case. This indicates that the linear kernel is not appropriate here. Let’s take a look at what happens if we use the RBF kernel with default values for the parameters:

15.4 SVM with Radial Basis Function kernel. Non-linear

15.4.1 Predictions on the training set

#build model: radial kernel, default params
svm_model <- svm(Class~ ., data=trainset, 
                 type="C-classification", 
                 kernel="radial")
# print params (default gamma is 1 / number of predictors, here 1/60)
svm_model$cost
#> [1] 1
svm_model$gamma
#> [1] 0.0167

#training set predictions
pred_train <-predict(svm_model,trainset)
mean(pred_train==trainset$Class)
#> [1] 0.988
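
For reference, the radial kernel used by libsvm is exp(-gamma * |u - v|^2), and e1071 sets gamma to 1 / (number of predictors) unless told otherwise. The short base R sketch below shows how gamma controls how quickly similarity decays with distance. The helper `rbf_kernel` and the toy vector `u` are hypothetical and exist only for this illustration; they are not part of e1071.

```r
# Hypothetical helper: radial basis function kernel between two numeric vectors
rbf_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

gamma_default <- 1 / 60      # default gamma for the 60 Sonar predictors
round(gamma_default, 4)      # 0.0167, the value printed by svm_model$gamma

u <- rep(0, 60)              # toy point at the origin
rbf_kernel(u, u, gamma_default)             # identical points: exp(0) = 1
rbf_kernel(u, u + 1, gamma_default)         # squared distance 60: exp(-1)
rbf_kernel(u, u + 1, 10 * gamma_default)    # larger gamma, faster decay: exp(-10)
```

A large gamma makes each training point influence only its immediate neighbourhood, which tends to give wigglier boundaries; a small gamma gives smoother ones. This is the trade-off that parameter tuning explores.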

15.4.2 Predictions on the test set

#test set predictions
pred_test <-predict(svm_model,testset)
mean(pred_test==testset$Class)
#> [1] 0.767

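That’s a pretty decent improvement over the linear kernel. Let’s see if we can do better by doing some parameter tuning. To do this we first invoke tune.svm, which runs a grid search over the supplied gamma and cost values (scored by 10-fold cross-validation by default), and then use the parameters it gives us in the call to svm: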

15.4.3 Tuning of parameters

# find optimal parameters in a specified range
tune_out <- tune.svm(x = trainset[,-typeColNum], 
                     y = trainset[, typeColNum], 
                     gamma = 10^(-3:3), 
                     cost = c(0.01, 0.1, 1, 10, 100, 1000), 
                     kernel = "radial")

#print best values of cost and gamma
tune_out$best.parameters$cost
#> [1] 10
tune_out$best.parameters$gamma
#> [1] 0.01

#build model
svm_model <- svm(Class~ ., data = trainset, 
                 type = "C-classification", 
                 kernel = "radial", 
                 cost = tune_out$best.parameters$cost, 
                 gamma = tune_out$best.parameters$gamma)

15.4.4 Predictions on the training set with new parameters

# training set predictions
pred_train <-predict(svm_model,trainset)
mean(pred_train==trainset$Class)
#> [1] 1

15.4.5 Predictions on the test set with new parameters

# test set predictions
pred_test <-predict(svm_model,testset)
mean(pred_test==testset$Class)
#> [1] 0.814

This is a fairly decent improvement on the un-optimised case.
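
The object returned by tune.svm carries more than the best parameters. The sketch below, which rebuilds the Sonar split so it runs on its own, shows a few components worth inspecting: `performances` (cross-validated error for every grid point), `best.performance` and `best.model`. To keep the run short it assumes a smaller grid and 5-fold cross-validation via `tune.control(cross = 5)`; both are choices made for this illustration, not the settings used above, and `flag` and `best_model` are hypothetical names. Results depend on the random folds, so no output is shown.

```r
# Sketch: inspecting a tune.svm result (smaller grid, 5-fold CV, for illustration)
library(e1071)
library(mlbench)
data(Sonar)
set.seed(42)
flag <- runif(nrow(Sonar)) < 0.8         # same split rule as above
trainset <- Sonar[flag, ]
testset  <- Sonar[!flag, ]

tune_out <- tune.svm(Class ~ ., data = trainset,
                     kernel = "radial",
                     gamma = 10^(-3:0),
                     cost = 10^(0:2),
                     tunecontrol = tune.control(cross = 5))

tune_out$best.parameters                 # gamma and cost with the lowest CV error
tune_out$best.performance                # that lowest cross-validated error

# Full grid, sorted so the most promising combinations come first
perf <- tune_out$performances
head(perf[order(perf$error), ])

# best.model is already refit on the whole training set with the best parameters
best_model <- tune_out$best.model
mean(predict(best_model, testset) == testset$Class)
```

Looking at the whole `performances` table, and not just the winner, helps to see whether the best point sits on the edge of the grid, in which case the search range is worth extending.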

15.5 Wrapping up

This brings us to the end of this introductory exploration of SVMs in R. To recap, the distinguishing feature of SVMs, in contrast to most other techniques, is that they attempt to construct optimal separation boundaries between different categories.

SVMs are quite versatile and have been applied to a wide variety of domains ranging from chemistry to pattern recognition. They are best used in binary classification scenarios. This raises the question of when SVMs are to be preferred to other binary classification techniques such as logistic regression. The honest response is, “it depends”. A general point to keep in mind when choosing between the two is that SVM algorithms tend to be expensive in terms of both memory and computation, issues that can start to hurt as the size of the dataset increases.

Given all the above caveats and considerations, the best way to figure out whether an SVM approach will work for your problem may be to do what most machine learning practitioners do: try it out!
