Chapter 3 rTorch vs PyTorch: What’s different
This chapter explains the main differences between PyTorch and rTorch. Most things work directly as they do in PyTorch, but we need to be aware of some minor differences when working with rTorch. Here is a review of the existing methods.

Let's start by loading rTorch:
library(rTorch)
3.1 Calling objects from PyTorch
We use the dollar sign or $
to call a class, function or method from the rTorch
modules. In this case, from the torch
module:
torch$tensor(c(1, 2, 3))
#> tensor([1., 2., 3.])
In Python, we use the dot to separate the sub-members of an object:
import torch
torch.tensor([1, 2, 3])
#> tensor([1, 2, 3])
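The dollar sign also chains through nested sub-modules, just as the dot does in Python. As a small sketch (the Linear layer below is only an illustration, not part of the original example):

torch$nn$Linear(3L, 2L)   # same object as torch.nn.Linear(3, 2) in Python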
3.2 Call modules and functions from torch
library(rTorch)
# these are the equivalents of the Python import module
nn         <- torch$nn
transforms <- torchvision$transforms
dsets      <- torchvision$datasets

torch$tensor(c(1, 2, 3))
#> tensor([1., 2., 3.])
The code above is equivalent to writing this code in Python:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
torch.tensor([1, 2, 3])
#> tensor([1, 2, 3])
Then we can proceed to extract classes, methods, and functions from the nn, transforms, and dsets objects. In this example we use the module torchvision$datasets and the function transforms$ToTensor():
local_folder <- './datasets/mnist_digits'

train_dataset = torchvision$datasets$MNIST(root = local_folder,
                                           train = TRUE,
                                           transform = transforms$ToTensor(),
                                           download = TRUE)
train_dataset
#> Dataset MNIST
#> Number of datapoints: 60000
#> Root location: ./datasets/mnist_digits
#> Split: Train
#> StandardTransform
#> Transform: ToTensor()
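The test split of MNIST can be requested in the same way; this is a sketch, assuming the same local folder and the transforms object defined above:

test_dataset = torchvision$datasets$MNIST(root = local_folder,
                                          train = FALSE,
                                          transform = transforms$ToTensor(),
                                          download = TRUE)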
3.3 Show the attributes (methods) of a class or PyTorch object
Sometimes we are interested in knowing the internal components of a class. In that case, we use the reticulate function py_list_attributes()
.
In this example, we want to show the attributes of train_dataset
:
reticulate::py_list_attributes(train_dataset)
#> [1] "__add__" "__class__" "__delattr__"
#> [4] "__dict__" "__dir__" "__doc__"
#> [7] "__eq__" "__format__" "__ge__"
#> [10] "__getattribute__" "__getitem__" "__gt__"
#> [13] "__hash__" "__init__" "__init_subclass__"
#> [16] "__le__" "__len__" "__lt__"
#> [19] "__module__" "__ne__" "__new__"
#> [22] "__reduce__" "__reduce_ex__" "__repr__"
#> [25] "__setattr__" "__sizeof__" "__str__"
#> [28] "__subclasshook__" "__weakref__" "_check_exists"
#> [31] "_format_transform_repr" "_repr_indent" "class_to_idx"
#> [34] "classes" "data" "download"
#> [37] "extra_repr" "processed_folder" "raw_folder"
#> [40] "resources" "root" "target_transform"
#> [43] "targets" "test_data" "test_file"
#> [46] "test_labels" "train" "train_data"
#> [49] "train_labels" "training_file" "transform"
#> [52] "transforms"
Knowing the internal methods of a class can be useful when we want to refer to a specific property of that class. For example, from the list above, we know that the object train_dataset has an attribute __len__. We can call it like this:
train_dataset$`__len__`()
#> [1] 60000
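Any other attribute from the listing can be reached the same way. For instance, classes and root both appear in the list above, so these calls should work as well:

train_dataset$classes    # the digit labels, as strings
train_dataset$root       # the folder where the data was downloaded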
3.4 How to iterate through datasets
3.4.1 Enumeration
Given the following training dataset x_train, we want to find the number of elements in the tensor. We start by creating a numpy array, which we then convert to a tensor with the PyTorch function from_numpy():
x_train_r  <- array(c(3.3, 4.4, 5.5, 6.71, 6.93, 4.168,
                      9.779, 6.182, 7.59, 2.167, 7.042,
                      10.791, 5.313, 7.997, 3.1), dim = c(15, 1))

x_train_np <- r_to_py(x_train_r)
x_train_   <- torch$from_numpy(x_train_np)      # convert to tensor
x_train    <- x_train_$type(torch$FloatTensor)  # make it a FloatTensor

print(x_train$dtype)
print(x_train)
#> torch.float32
#> tensor([[ 3.3000],
#> [ 4.4000],
#> [ 5.5000],
#> [ 6.7100],
#> [ 6.9300],
#> [ 4.1680],
#> [ 9.7790],
#> [ 6.1820],
#> [ 7.5900],
#> [ 2.1670],
#> [ 7.0420],
#> [10.7910],
#> [ 5.3130],
#> [ 7.9970],
#> [ 3.1000]])
The R function length() is similar to the tensor method nelement(); both return the number of elements:
length(x_train)
x_train$nelement()   # number of elements in the tensor
#> [1] 15
#> [1] 15
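For completeness, the shape of the tensor can be inspected as well; shape and size() are standard PyTorch tensor members, so they are available through the $ operator too:

x_train$shape      # dimensions of the tensor
x_train$size(0L)   # size along the first dimension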
3.4.2 Using enumerate and iterate
py = import_builtins()

enum_x_train = py$enumerate(x_train)
enum_x_train

py$len(x_train)
#> <enumerate>
#> [1] 15
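As a side note, reticulate also provides py_len() as a shortcut for Python's len(), so the length could be obtained without importing the builtins:

reticulate::py_len(x_train)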
If we directly use iterate
over the enum_x_train
object, we get an R list with the index and the value of the 1D
tensor:
xit = iterate(enum_x_train, simplify = TRUE)
xit
#> [[1]]
#> [[1]][[1]]
#> [1] 0
#>
#> [[1]][[2]]
#> tensor([3.3000])
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 1
#>
#> [[2]][[2]]
#> tensor([4.4000])
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 2
#>
#> [[3]][[2]]
#> tensor([5.5000])
#>
#>
#> [[4]]
#> [[4]][[1]]
#> [1] 3
#>
#> [[4]][[2]]
#> tensor([6.7100])
#>
#>
#> [[5]]
#> [[5]][[1]]
#> [1] 4
#>
#> [[5]][[2]]
#> tensor([6.9300])
#>
#>
#> [[6]]
#> [[6]][[1]]
#> [1] 5
#>
#> [[6]][[2]]
#> tensor([4.1680])
#>
#>
#> [[7]]
#> [[7]][[1]]
#> [1] 6
#>
#> [[7]][[2]]
#> tensor([9.7790])
#>
#>
#> [[8]]
#> [[8]][[1]]
#> [1] 7
#>
#> [[8]][[2]]
#> tensor([6.1820])
#>
#>
#> [[9]]
#> [[9]][[1]]
#> [1] 8
#>
#> [[9]][[2]]
#> tensor([7.5900])
#>
#>
#> [[10]]
#> [[10]][[1]]
#> [1] 9
#>
#> [[10]][[2]]
#> tensor([2.1670])
#>
#>
#> [[11]]
#> [[11]][[1]]
#> [1] 10
#>
#> [[11]][[2]]
#> tensor([7.0420])
#>
#>
#> [[12]]
#> [[12]][[1]]
#> [1] 11
#>
#> [[12]][[2]]
#> tensor([10.7910])
#>
#>
#> [[13]]
#> [[13]][[1]]
#> [1] 12
#>
#> [[13]][[2]]
#> tensor([5.3130])
#>
#>
#> [[14]]
#> [[14]][[1]]
#> [1] 13
#>
#> [[14]][[2]]
#> tensor([7.9970])
#>
#>
#> [[15]]
#> [[15]][[1]]
#> [1] 14
#>
#> [[15]][[2]]
#> tensor([3.1000])
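Since xit is a plain R list, any pair can be extracted by position; based on the output above:

xit[[1]][[1]]    # the Python index of the first element (starts at 0)
xit[[1]][[2]]    # the corresponding 1D tensor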
3.4.3 Using a for-loop to iterate
Another way of iterating through a dataset that you will see a lot in the PyTorch tutorials is a loop
through the length of the dataset. In this case, x_train
. We are using cat()
for the index (an integer), and print()
for the tensor, since cat
doesn’t know how to deal with tensors:
# reset the iterator
enum_x_train = py$enumerate(x_train)

for (i in 1:py$len(x_train)) {
    obj <- iter_next(enum_x_train)   # next item
    cat(obj[[1]], "\t")              # 1st part or index
    print(obj[[2]])                  # 2nd part or tensor
}
#> 0 tensor([3.3000])
#> 1 tensor([4.4000])
#> 2 tensor([5.5000])
#> 3 tensor([6.7100])
#> 4 tensor([6.9300])
#> 5 tensor([4.1680])
#> 6 tensor([9.7790])
#> 7 tensor([6.1820])
#> 8 tensor([7.5900])
#> 9 tensor([2.1670])
#> 10 tensor([7.0420])
#> 11 tensor([10.7910])
#> 12 tensor([5.3130])
#> 13 tensor([7.9970])
#> 14 tensor([3.1000])
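If the length of the dataset is not known in advance, the loop can also be driven by iter_next() itself, which returns NULL once the iterator is exhausted. A minimal sketch of this variant:

# reset the iterator
enum_x_train = py$enumerate(x_train)

repeat {
    obj <- iter_next(enum_x_train)
    if (is.null(obj)) break          # iterator exhausted
    cat(obj[[1]], "\t")
    print(obj[[2]])
}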
Similarly, if we want the scalar values rather than tensors, we need to use item().
# reset the iterator
enum_x_train = py$enumerate(x_train)

for (i in 1:py$len(x_train)) {
    obj <- iter_next(enum_x_train)   # next item
    cat(obj[[1]], "\t")              # 1st part or index
    print(obj[[2]]$item())           # 2nd part, extracted as an R scalar
}
#> 0 [1] 3.3
#> 1 [1] 4.4
#> 2 [1] 5.5
#> 3 [1] 6.71
#> 4 [1] 6.93
#> 5 [1] 4.17
#> 6 [1] 9.78
#> 7 [1] 6.18
#> 8 [1] 7.59
#> 9 [1] 2.17
#> 10 [1] 7.04
#> 11 [1] 10.8
#> 12 [1] 5.31
#> 13 [1] 8
#> 14 [1] 3.1
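A compact alternative, if all we need is a plain R vector of the values, is to combine iterate() with sapply(); this is a sketch under the same setup:

vals <- sapply(iterate(py$enumerate(x_train)),
               function(obj) obj[[2]]$item())
vals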
We will frequently come across this kind of iterator when reading a dataset with torchvision. As you will see, there are several different ways to iterate through these objects.
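For example, the MNIST train_dataset created earlier in this chapter can be wrapped in a DataLoader and enumerated with the same pattern. This is only a sketch; the batch size of 64 is an arbitrary choice:

train_loader <- torch$utils$data$DataLoader(dataset    = train_dataset,
                                            batch_size = 64L,
                                            shuffle    = TRUE)

enum_loader <- py$enumerate(train_loader)
batch       <- iter_next(enum_loader)   # list: batch index, then list(images, labels)
images      <- batch[[2]][[1]]          # image tensor of shape [64, 1, 28, 28]
labels      <- batch[[2]][[2]]          # tensor with the 64 digit labels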
3.5 Zero gradient
Zeroing the gradient was one of the most difficult operations to implement in R if we do not pay attention to the contents of the objects carrying the weights and biases. This happens when the algorithm written in PyTorch is not immediately translatable to rTorch, as can be appreciated in this example.

We use the same seed in the PyTorch and rTorch versions so that we can compare the results.
3.5.1 Version in Python
import numpy as np
import torch

torch.manual_seed(0)   # reproducible
#> <torch._C.Generator object at 0x7f52463973d0>

# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')

# Convert inputs and targets to tensors
inputs  = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

# random weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

# function for the model
def model(x):
    wt = w.t()
    mm = x @ w.t()
    return x @ w.t() + b   # @ represents matrix multiplication in PyTorch

# MSE loss function
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Running all together
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 0.00001
        b -= b.grad * 0.00001
        w_gz = w.grad.zero_()
        b_gz = b.grad.zero_()

# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print("Loss: ", loss)
#> Loss: tensor(1270.1233, grad_fn=<DivBackward0>)

# predictions
print("\nPredictions:")
preds
#>
#> Predictions:
#> tensor([[ 69.3122,  80.2639],
#>         [ 73.7528,  97.2381],
#>         [118.3933, 124.7628],
#>         [ 89.6111,  93.0286],
#>         [ 47.3014,  80.6467]], grad_fn=<AddBackward0>)

# Targets
print("\nTargets:")
targets
#>
#> Targets:
#> tensor([[ 56.,  70.],
#>         [ 81., 101.],
#>         [119., 133.],
#>         [ 22.,  37.],
#>         [103., 119.]])
3.5.2 Version in R
library(rTorch)

torch$manual_seed(0)
#> <torch._C.Generator>

device = torch$device('cpu')

# Input (temp, rainfall, humidity)
inputs = np$array(list(list(73, 67, 43),
                       list(91, 88, 64),
                       list(87, 134, 58),
                       list(102, 43, 37),
                       list(69, 96, 70)), dtype='float32')

# Targets (apples, oranges)
targets = np$array(list(list(56, 70),
                        list(81, 101),
                        list(119, 133),
                        list(22, 37),
                        list(103, 119)), dtype='float32')

# Convert inputs and targets to tensors
inputs  = torch$from_numpy(inputs)
targets = torch$from_numpy(targets)

# random numbers for weights and biases. Then convert to double()
torch$set_default_dtype(torch$float64)

w = torch$randn(2L, 3L, requires_grad=TRUE)   #$double()
b = torch$randn(2L, requires_grad=TRUE)       #$double()

model <- function(x) {
  wt <- w$t()
  return(torch$add(torch$mm(x, wt), b))
}

# MSE loss
mse = function(t1, t2) {
  diff <- torch$sub(t1, t2)
  mul  <- torch$sum(torch$mul(diff, diff))
  return(torch$div(mul, diff$numel()))
}

# Running all together
# Adjust weights and reset gradients
for (i in 1:100) {
  preds = model(inputs)
  loss  = mse(preds, targets)
  loss$backward()
  with(torch$no_grad(), {
    w$data <- torch$sub(w$data, torch$mul(w$grad, torch$scalar_tensor(1e-5)))
    b$data <- torch$sub(b$data, torch$mul(b$grad, torch$scalar_tensor(1e-5)))

    w$grad$zero_()
    b$grad$zero_()
  })
}

# Calculate loss
preds = model(inputs)
loss  = mse(preds, targets)
cat("Loss: "); print(loss)
#> Loss: tensor(1270.1237, grad_fn=<DivBackward0>)

# predictions
cat("\nPredictions:\n")
preds
#>
#> Predictions:
#> tensor([[ 69.3122,  80.2639],
#>         [ 73.7528,  97.2381],
#>         [118.3933, 124.7628],
#>         [ 89.6111,  93.0286],
#>         [ 47.3013,  80.6467]], grad_fn=<AddBackward0>)

# Targets
cat("\nTargets:\n")
targets
#>
#> Targets:
#> tensor([[ 56.,  70.],
#>         [ 81., 101.],
#>         [119., 133.],
#>         [ 22.,  37.],
#>         [103., 119.]])
Notice that in Python, the tensor operation that updates the weights w, subtracting the gradient (∇w) scaled by the learning rate α, is:

w = w − ∇w α

In Python, this is very straightforward and clean code:

w -= w.grad * 1e-5

In R, without generics, it looks a little more convoluted:

w$data <- torch$sub(w$data, torch$mul(w$grad, torch$scalar_tensor(1e-5)))
3.6 R generics for PyTorch functions
This is why we simplified these common operations using R generic functions. When we use the generic methods from rTorch, the operation looks much neater.
w$data <- w$data - w$grad * 1e-5
The following two expressions are equivalent, the first being the long, natural way of doing it in PyTorch. The second uses the R generics for subtraction, multiplication and scalar conversion.
param$data <- torch$sub(param$data,
                        torch$mul(param$grad$float(),
                                  torch$scalar_tensor(learning_rate)))

param$data <- param$data - param$grad * learning_rate
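Putting it together, the weight-update step of the training loop shown earlier can be rewritten with the generics; this is a sketch, assuming w, b, model() and mse() as defined in the R version above:

for (i in 1:100) {
  preds <- model(inputs)
  loss  <- mse(preds, targets)
  loss$backward()
  with(torch$no_grad(), {
    w$data <- w$data - w$grad * 1e-5
    b$data <- b$data - b$grad * 1e-5
    w$grad$zero_()
    b$grad$zero_()
  })
}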