25 Vehicles classiification with Decision Trees

Datasets: Vehicle (mlbench)
Algorithms:
- Decision Trees
Instructions: book “Applied Predictive Modeling Techniques”, Lewis, N.D.

25.1 Load packages

library(tree)
library(mlbench)

data(Vehicle)
str(Vehicle)
#> 'data.frame':    846 obs. of  19 variables:
#>  $ Comp        : num  95 91 104 93 85 107 97 90 86 93 ...
#>  $ Circ        : num  48 41 50 41 44 57 43 43 34 44 ...
#>  $ D.Circ      : num  83 84 106 82 70 106 73 66 62 98 ...
#>  $ Rad.Ra      : num  178 141 209 159 205 172 173 157 140 197 ...
#>  $ Pr.Axis.Ra  : num  72 57 66 63 103 50 65 65 61 62 ...
#>  $ Max.L.Ra    : num  10 9 10 9 52 6 6 9 7 11 ...
#>  $ Scat.Ra     : num  162 149 207 144 149 255 153 137 122 183 ...
#>  $ Elong       : num  42 45 32 46 45 26 42 48 54 36 ...
#>  $ Pr.Axis.Rect: num  20 19 23 19 19 28 19 18 17 22 ...
#>  $ Max.L.Rect  : num  159 143 158 143 144 169 143 146 127 146 ...
#>  $ Sc.Var.Maxis: num  176 170 223 160 241 280 176 162 141 202 ...
#>  $ Sc.Var.maxis: num  379 330 635 309 325 957 361 281 223 505 ...
#>  $ Ra.Gyr      : num  184 158 220 127 188 264 172 164 112 152 ...
#>  $ Skew.Maxis  : num  70 72 73 63 127 85 66 67 64 64 ...
#>  $ Skew.maxis  : num  6 9 14 6 9 5 13 3 2 4 ...
#>  $ Kurt.maxis  : num  16 14 9 10 11 9 1 3 14 14 ...
#>  $ Kurt.Maxis  : num  187 189 188 199 180 181 200 193 200 195 ...
#>  $ Holl.Ra     : num  197 199 196 207 183 183 204 202 208 204 ...
#>  $ Class       : Factor w/ 4 levels "bus","opel","saab",..: 4 4 3 4 1 1 1 4 4 3 ...

summary(Vehicle[1])
#>       Comp      
#>  Min.   : 73.0  
#>  1st Qu.: 87.0  
#>  Median : 93.0  
#>  Mean   : 93.7  
#>  3rd Qu.:100.0  
#>  Max.   :119.0

summary(Vehicle[2])
#>       Circ     
#>  Min.   :33.0  
#>  1st Qu.:40.0  
#>  Median :44.0  
#>  Mean   :44.9  
#>  3rd Qu.:49.0  
#>  Max.   :59.0

attributes(Vehicle$Class)
#> $levels
#> [1] "bus"  "opel" "saab" "van" 
#> 
#> $class
#> [1] "factor"

25.2 Prepare data

set.seed(107)
N = nrow(Vehicle)
train <- sample(1:N, 500, FALSE)

# training and test sets
trainset <- Vehicle[train,]
testset  <- Vehicle[-train,]

25.3 Estimate the decision tree

fit <- tree(Class ~., data = trainset, split = "deviance")
fit
#> node), split, n, deviance, yval, (yprob)
#>       * denotes terminal node
#> 
#>   1) root 500 1000 opel ( 0 0 0 0 )  
#>     2) Elong < 41.5 215  500 saab ( 0 0 0 0 )  
#>       4) Max.L.Ra < 7.5 51   50 bus ( 1 0 0 0 )  
#>         8) Comp < 93.5 12   20 bus ( 0 0 0 0 )  
#>          16) Pr.Axis.Ra < 67.5 7    8 saab ( 0 0 1 0 ) *
#>          17) Pr.Axis.Ra > 67.5 5    0 bus ( 1 0 0 0 ) *
#>         9) Comp > 93.5 39    9 bus ( 1 0 0 0 ) *
#>       5) Max.L.Ra > 7.5 164  200 opel ( 0 1 0 0 )  
#>        10) Sc.Var.maxis < 723 149  200 saab ( 0 0 1 0 )  
#>          20) Comp < 109.5 137  200 opel ( 0 1 0 0 ) *
#>          21) Comp > 109.5 12    0 saab ( 0 0 1 0 ) *
#>        11) Sc.Var.maxis > 723 15    7 opel ( 0 1 0 0 ) *
#>     3) Elong > 41.5 285  700 van ( 0 0 0 0 )  
#>       6) Sc.Var.maxis < 305.5 116  200 van ( 0 0 0 1 )  
#>        12) Max.L.Rect < 128.5 40   90 saab ( 0 0 0 0 )  
#>          24) Scat.Ra < 120.5 15   30 van ( 0 0 0 1 ) *
#>          25) Scat.Ra > 120.5 25   30 saab ( 0 0 1 0 ) *
#>        13) Max.L.Rect > 128.5 76   90 van ( 0 0 0 1 )  
#>          26) Max.L.Rect < 138.5 38   60 van ( 0 0 0 1 )  
#>            52) Circ < 37.5 17   10 van ( 0 0 0 1 ) *
#>            53) Circ > 37.5 21   40 opel ( 0 0 0 0 ) *
#>          27) Max.L.Rect > 138.5 38   20 van ( 0 0 0 1 ) *
#>       7) Sc.Var.maxis > 305.5 169  400 bus ( 0 0 0 0 )  
#>        14) Max.L.Ra < 8.5 116  200 bus ( 1 0 0 0 )  
#>          28) D.Circ < 76.5 97  100 bus ( 1 0 0 0 )  
#>            56) Skew.maxis < 10.5 87   70 bus ( 1 0 0 0 )  
#>             112) Max.L.Rect < 134.5 12   20 bus ( 0 0 0 0 ) *
#>             113) Max.L.Rect > 134.5 75   20 bus ( 1 0 0 0 ) *
#>            57) Skew.maxis > 10.5 10   20 opel ( 0 0 0 0 ) *
#>          29) D.Circ > 76.5 19   30 opel ( 0 1 0 0 ) *
#>        15) Max.L.Ra > 8.5 53   20 van ( 0 0 0 1 ) *

# fit <- tree(Class ~., data = Vehicle[train,], split ="deviance")
# fit

We use deviance as the splitting criteria, a common alternative is to use split=“gini”.

At each branch of the tree (after root) we see in order: 1. The branch number (e.g. in this case 1,2,14 and 15); 2. the split (e.g. Elong < 41.5); 3. the number of samples going along that split (e.g. 229); 4. the deviance associated with that split (e.g. 489.1); 5. the predicted class (e.g. opel); 6. the associated probabilities (e.g. ( 0.222707 0.410480 0.366812 0.000000 ); 7. and for a terminal node (or leaf), the symbol "*".

summary(fit)
#> 
#> Classification tree:
#> tree(formula = Class ~ ., data = trainset, split = "deviance")
#> Variables actually used in tree construction:
#>  [1] "Elong"        "Max.L.Ra"     "Comp"         "Pr.Axis.Ra"   "Sc.Var.maxis"
#>  [6] "Max.L.Rect"   "Scat.Ra"      "Circ"         "D.Circ"       "Skew.maxis"  
#> Number of terminal nodes:  16 
#> Residual mean deviance:  0.943 = 456 / 484 
#> Misclassification error rate: 0.252 = 126 / 500

Notice that summary(fit) shows: 1. The type of tree, in this case a Classification tree; 2. the formula used to fit the tree; 3. the variables used to fit the tree; 4. the number of terminal nodes in this case 15; 5. the residual mean deviance - 0.9381; 6. the misclassification error rate 0.232 or 23.2%.

plot(fit); text(fit)

25.4 Assess model

Unfortunately, classification trees have a tendency to overfit the data. One approach to reduce this risk is to use cross-validation. For each hold out sample we fit the model and note at what level the tree gives the best results (using deviance or the misclassification rate). Then we hold out a different sample and repeat. This can be carried out using the cv.tree() function. We use a leave-one-out cross-validation using the misclassification rate and deviance (FUN=prune.misclass, followed by FUN=prune.tree).

fitM.cv <- cv.tree(fit, K=346, FUN = prune.misclass)
fitP.cv <- cv.tree(fit, K=346, FUN = prune.tree)

The results are plotted out side by side in Figure 1.2. The jagged lines shows where the minimum deviance / misclassification occurred with the cross-validated tree. Since the cross validated misclassification and deviance both reach their minimum close to the number of branches in the original fitted tree there is little to be gained from pruning this tree

par(mfrow = c(1, 2))
plot(fitM.cv)
plot(fitP.cv)

25.5 Make predictions

We use the validation data set and the fitted decision tree to predict vehicle classes; then we display the confusion matrix and calculate the error rate of the fitted tree. Overall, the model has an error rate of 32%.

testLabels <- Vehicle$Class[-train]
testLabels
#>   [1] van  bus  bus  van  van  bus  bus  saab opel bus  van  saab van  saab saab
#>  [16] van  saab opel van  saab saab saab bus  bus  saab opel bus  opel bus  opel
#>  [31] van  opel opel saab saab bus  bus  bus  van  van  saab opel bus  opel van 
#>  [46] opel saab bus  van  bus  opel van  saab bus  opel bus  opel opel van  bus 
#>  [61] van  saab opel bus  van  saab opel opel saab saab saab opel bus  van  bus 
#>  [76] opel bus  saab bus  bus  bus  opel opel van  saab bus  bus  bus  van  saab
#>  [91] opel van  van  bus  bus  opel bus  opel saab opel bus  opel bus  saab van 
#> [106] van  saab saab bus  van  opel van  saab opel saab saab van  van  van  van 
#> [121] bus  bus  opel bus  bus  van  saab bus  opel bus  bus  bus  bus  opel van 
#> [136] saab saab bus  opel van  bus  saab bus  van  bus  opel van  saab opel saab
#> [151] opel van  saab van  saab opel bus  van  bus  saab saab opel opel bus  bus 
#> [166] opel van  van  bus  van  van  saab bus  saab opel saab opel bus  bus  bus 
#> [181] saab bus  opel opel saab saab saab van  van  opel opel van  van  opel bus 
#> [196] saab bus  van  opel opel bus  bus  bus  opel saab opel van  bus  opel opel
#> [211] saab opel bus  opel opel opel van  opel van  saab saab van  saab saab saab
#> [226] saab van  van  van  saab bus  van  van  bus  saab opel saab saab opel saab
#> [241] saab saab saab van  saab opel bus  saab bus  opel opel opel saab bus  van 
#> [256] opel saab opel bus  bus  saab van  opel bus  saab van  opel saab saab saab
#> [271] saab van  opel bus  bus  bus  opel saab saab saab van  saab bus  opel saab
#> [286] van  opel bus  saab saab opel opel van  saab bus  opel bus  van  van  opel
#> [301] bus  bus  saab bus  van  saab bus  van  saab van  opel bus  bus  opel saab
#> [316] opel bus  bus  saab van  saab saab bus  opel opel opel bus  saab bus  van 
#> [331] bus  van  saab opel saab van  opel opel van  bus  saab saab van  saab opel
#> [346] saab
#> Levels: bus opel saab van

# Confusion Matrix
pred <- predict(fit, newdata = testset)
# find column whih has the maximum of all rows 
pred.class <- colnames(pred)[max.col(pred, ties.method = c("random"))]
cm <- table(testLabels, pred.class, 
      dnn = c("Observed Class", "Predicted Class"))
cm
#>               Predicted Class
#> Observed Class bus opel saab van
#>           bus   85    1    1   5
#>           opel   3   70   10   2
#>           saab   7   67   14   7
#>           van    1    4    5  64

# Sensitivity
sum(diag(cm)) / sum(cm)
#> [1] 0.673

# pred <- predict(fit, newdata = Vehicle[-train,])
# pred.class <- colnames(pred)[max.col(pred, ties.method = c("random"))]
# table(Vehicle$Class[-train], pred.class, 
#       dnn = c("Observed Class", "Predicted Class"))

error_rate = (1 - sum(pred.class == testset) / nrow(testset))
round(error_rate, 3)
#> [1] 0.327

# error_rate = (1 - sum(pred.class == Vehicle$Class[-train])/346)
# round(error_rate,3)

24 Naive Bayes for SMS spam

26 Applying Naive-Bayes on the Titanic case