36 Regression with a neural network

  • Dataset: BostonHousing
  • Algorithms:
    • Neural Network (nnet)
    • Linear Regression
###
### prepare data
###
library(mlbench)
data(BostonHousing)
 
# inspect the range of the response, which is 5-50
summary(BostonHousing$medv)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     5.0    17.0    21.2    22.5    25.0    50.0
 
 
##
## model linear regression
##
 
lm.fit <- lm(medv ~ ., data=BostonHousing)
 
lm.predict <- predict(lm.fit)
 
# mean squared error: 21.89483
mean((lm.predict - BostonHousing$medv)^2) 
#> [1] 21.9
 
plot(BostonHousing$medv, lm.predict,
    main="Linear regression predictions vs actual",
    xlab="Actual", ylab="Predicted")
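
As a quick visual aid (an optional addition, not in the original), a 45-degree reference line marks perfect predictions:

# points on the y = x line are predicted exactly right
abline(a=0, b=1, col="red")
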
##
## model neural network
##
require(nnet)
#> Loading required package: nnet
 
# scale the response to the 0-1 range (divide by 50):
# nnet's default logistic output unit is bounded to (0, 1)
nnet.fit <- nnet(medv/50 ~ ., data=BostonHousing, size=2) 
#> # weights:  31
#> initial  value 17.039194 
#> iter  10 value 13.754559
#> iter  20 value 13.537235
#> iter  30 value 13.537183
#> iter  40 value 13.530522
#> final  value 13.529736 
#> converged
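
As an aside (a sketch, not part of the original example): nnet can also fit the unscaled response directly if the output unit is made linear with linout=TRUE, which avoids the divide-by-50 trick:

# alternative: linear output unit, so medv needs no rescaling
# (convergence still depends on the random starting weights)
nnet.fit.lin <- nnet(medv ~ ., data=BostonHousing, size=2, linout=TRUE, trace=FALSE)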
 
# multiply by 50 to restore the original scale
nnet.predict <- predict(nnet.fit)*50 
 
# mean squared error (66.8 on this run; it varies because of the random starting weights)
mean((nnet.predict - BostonHousing$medv)^2) 
#> [1] 66.8
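
Because nnet draws its starting weights at random, the error above changes from run to run. Fixing the seed first (a reproducibility suggestion, not in the original; nnet.fit.seeded is an illustrative name) pins the result down:

set.seed(1)  # any fixed seed makes the random weight initialization repeatable
nnet.fit.seeded <- nnet(medv/50 ~ ., data=BostonHousing, size=2, trace=FALSE)
mean((predict(nnet.fit.seeded)*50 - BostonHousing$medv)^2)  # identical on every run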
 
plot(BostonHousing$medv, nnet.predict,
    main="Neural network predictions vs actual",
    xlab="Actual", ylab="Predicted")

36.1 Neural Network

Now, let’s use the function train() from the caret package to optimize the neural network hyperparameters decay and size. caret also performs resampling, which gives a better estimate of the error. Since the neural network’s response is scaled by 50, we scale the linear regression response by the same value, so the error statistics are directly comparable.

library(mlbench)
data(BostonHousing)
 
require(caret)
#> Loading required package: caret
#> Loading required package: lattice
#> Loading required package: ggplot2
 
mygrid <- expand.grid(.decay=c(0.5, 0.1), .size=c(4,5,6))
nnetfit <- train(medv/50 ~ ., data=BostonHousing, method="nnet",
                 maxit=1000, tuneGrid=mygrid, trace=FALSE)
print(nnetfit)
#> Neural Network 
#> 
#> 506 samples
#>  13 predictor
#> 
#> No pre-processing
#> Resampling: Bootstrapped (25 reps) 
#> Summary of sample sizes: 506, 506, 506, 506, 506, 506, ... 
#> Resampling results across tuning parameters:
#> 
#>   decay  size  RMSE    Rsquared  MAE   
#>   0.1    4     0.0830  0.790     0.0571
#>   0.1    5     0.0814  0.798     0.0559
#>   0.1    6     0.0799  0.806     0.0549
#>   0.5    4     0.0908  0.757     0.0626
#>   0.5    5     0.0897  0.762     0.0622
#>   0.5    6     0.0890  0.766     0.0620
#> 
#> RMSE was used to select the optimal model using the smallest value.
#> The final values used for the model were size = 6 and decay = 0.1.
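
By default train() bootstraps 25 times, as the output above shows. A different resampling scheme can be requested through trainControl(); the sketch below (not part of the original) uses 10-fold cross-validation instead:

# 10-fold cross-validation instead of the default 25 bootstrap reps
ctrl <- trainControl(method="cv", number=10)
nnetfit.cv <- train(medv/50 ~ ., data=BostonHousing, method="nnet",
                    maxit=1000, tuneGrid=mygrid, trace=FALSE, trControl=ctrl)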

36.2 Linear Regression

lmfit <- train(medv/50 ~ ., data=BostonHousing, method="lm")
print(lmfit)
#> Linear Regression 
#> 
#> 506 samples
#>  13 predictor
#> 
#> No pre-processing
#> Resampling: Bootstrapped (25 reps) 
#> Summary of sample sizes: 506, 506, 506, 506, 506, 506, ... 
#> Resampling results:
#> 
#>   RMSE    Rsquared  MAE   
#>   0.0988  0.726     0.0692
#> 
#> Tuning parameter 'intercept' was held constant at a value of TRUE
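Because caret resampled both models, their estimates can also be collected side by side with resamples() (a suggested follow-up, not in the original; for a strictly paired comparison, set the same seed before each train() call so the resampling indices match):

# gather the resampling results of both fitted models
resamps <- resamples(list(nnet = nnetfit, lm = lmfit))
summary(resamps)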

A tuned neural network has an RMSE of 0.0799, compared to linear regression’s RMSE of 0.0988. Multiplying by 50 puts these back on the original medv scale: roughly 4.0 versus 4.9.