Chapter 2 PyTorch and NumPy

Last update: Thu Nov 19 14:20:26 2020 -0600 (562e6f2c5)

2.1 PyTorch modules in rTorch

2.1.1 torchvision

This is an example of using the torchvision module. With torchvision and its datasets set of functions, we can download any of the popular machine learning datasets made available by PyTorch. In this example, we will download the training dataset of the MNIST handwritten digits. There are 60,000 images in the training set and 10,000 images in the test set. The images are downloaded to the folder ./datasets, or to any other folder you choose, which is set with the parameter root.

#> Dataset MNIST
#>     Number of datapoints: 60000
#>     Root location: ./datasets/mnist_digits
#>     Split: Train
#>     StandardTransform
#> Transform: ToTensor()

You can do the same for the test dataset by setting the flag train = FALSE. The test dataset has only 10,000 images.

#> Dataset MNIST
#>     Number of datapoints: 10000
#>     Root location: ./datasets/mnist_digits
#>     Split: Test
#>     StandardTransform
#> Transform: ToTensor()
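
For readers working directly in Python, the two downloads above could be sketched as follows. This is a minimal sketch, not the chapter's original R chunk: the helper name load_mnist is hypothetical, and the root folder is taken from the printed output. The train flag selects the split, as described above.

```python
def load_mnist(root="./datasets/mnist_digits", train=True):
    """Return the MNIST train or test split, downloading it on first use.

    load_mnist is a hypothetical helper name used only for illustration.
    """
    # Imported lazily so that defining the helper does not require torchvision.
    from torchvision import datasets, transforms

    return datasets.MNIST(
        root=root,                        # folder where the files are stored
        train=train,                      # True: training split, False: test split
        download=True,                    # fetch the files only if they are missing
        transform=transforms.ToTensor(),  # convert each image to a tensor
    )


# Usage (triggers a download, so it is left commented out):
# train_dataset = load_mnist(train=True)
# test_dataset = load_mnist(train=False)
# print(train_dataset)
```

Printing either object should give a summary similar to the ones shown above.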

2.1.2 numpy

numpy is automatically installed along with PyTorch, and the two libraries are interdependent to some degree. Any time we need a transformation that is not available in PyTorch, we will use numpy. Just keep in mind that numpy does not support GPUs; to run on a GPU, you will have to convert the numpy array to a torch tensor afterwards.

2.2 Common array operations

There are several operations that we could perform with numpy, such as creating arrays:

Create an array

Create an array:

#> [1] 1 2 3 4

We could do the same by adding a Python chunk instead, like this:

{python}
import numpy as np

a = np.arange(1, 5)
a
#> array([1, 2, 3, 4])

Create an array of a desired shape:

#>      [,1] [,2] [,3]
#> [1,]    0    1    2
#> [2,]    3    4    5
#> [3,]    6    7    8
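
In a Python chunk, an array like the one above could be built by generating a sequence and reshaping it. This is one possible sketch; the variable name b is arbitrary.

```python
import numpy as np

# arange(9) yields the integers 0..8; reshape lays them out row by row
# into 3 rows and 3 columns.
b = np.arange(9).reshape(3, 3)
print(b)
```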

Create an array by spelling out its components and type:

#>       [,1] [,2] [,3]
#>  [1,]   73   67   43
#>  [2,]   87  134   58
#>  [3,]  102   43   37
#>  [4,]   73   67   43
#>  [5,]   91   88   64
#>  [6,]  102   43   37
#>  [7,]   69   96   70
#>  [8,]   91   88   64
#>  [9,]  102   43   37
#> [10,]   69   96   70
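
A Python sketch of the same idea follows. The values are copied from the output above; the dtype float32 is an assumption, since the printed output does not show which type was requested, and the name inputs is arbitrary.

```python
import numpy as np

# Spell out every row explicitly and request the element type with dtype.
# float32 is an assumed choice; it is the default floating type in PyTorch.
inputs = np.array(
    [[73, 67, 43],
     [87, 134, 58],
     [102, 43, 37],
     [73, 67, 43],
     [91, 88, 64],
     [102, 43, 37],
     [69, 96, 70],
     [91, 88, 64],
     [102, 43, 37],
     [69, 96, 70]],
    dtype="float32",
)
print(inputs.shape)
print(inputs.dtype)
```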

We will use the train and test datasets that we loaded with torchvision.

Reshape an array

For the same test dataset that we loaded above from MNIST digits, we will show the image of the handwritten digit and its label or class. Before plotting the image, we need to:

  1. Extract the image and label from the dataset
  2. Convert the tensor to a numpy array
  3. Reshape the array from 1 x 28 x 28 to a 2D array of 28 x 28
  4. Plot the digit and its label

#> [1] 7
#> [1]  1 28 28
#> [1] 28 28
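
The reshaping step in the list above can be sketched in Python without downloading anything. Here a zero-filled array stands in for a real MNIST image, and the label 7 is simply copied from the output above; with a real dataset the pair would come from indexing the dataset and calling .numpy() on the image tensor. Plotting is omitted.

```python
import numpy as np

# Stand-in for one MNIST sample (an assumption for illustration):
# a single-channel 28 x 28 image and its integer label.
image = np.zeros((1, 28, 28), dtype=np.float32)
label = 7

print(label)
print(image.shape)

# Drop the leading channel dimension so the image becomes a plain 2D array.
image_2d = image.reshape(28, 28)
print(image_2d.shape)
```

Calling image.squeeze(0) would give the same result, because the first dimension has length 1.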

We are simply using the base R image() function:

Generate a random array in NumPy

#> [1] 100
#> [1] "array"

From the classes, we can tell that the numpy arrays are automatically converted to R arrays. Let’s plot x vs y:
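
In plain Python, two random arrays of length 100 could be generated as sketched below. The seed value and the use of a uniform distribution are assumptions made for reproducibility and illustration; the plot itself is left out.

```python
import numpy as np

# A seeded generator makes the example reproducible; the seed is arbitrary.
rng = np.random.default_rng(seed=123)

# Two arrays of 100 samples each, drawn uniformly from [0, 1).
x = rng.random(100)
y = rng.random(100)

print(len(x))
print(type(x).__name__)
```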

2.3 Common tensor operations

Generate random tensors

The same operation can be performed with pure torch tensors. This is very similar to the example above. The only difference is that this time we are using tensors and not numpy arrays.

#> [1] "torch.Tensor"          "torch._C._TensorBase"  "python.builtin.object"
#> [1] "torch.Tensor"          "torch._C._TensorBase"  "python.builtin.object"

Since the classes are torch tensors, to plot them in R, they first need to be converted to numpy, and then to R:
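
A minimal Python sketch of the tensor version, assuming uniform random values and an arbitrary seed, could look like this. The call to .numpy() is the conversion step mentioned above; it works here because the tensors live on the CPU and do not track gradients.

```python
import torch

torch.manual_seed(123)  # arbitrary seed, only for reproducibility

# Two tensors of 100 samples each, drawn uniformly from [0, 1).
xt = torch.rand(100)
yt = torch.rand(100)
print(type(xt))

# Convert to numpy arrays before handing the data to a plotting function.
x_np = xt.numpy()
y_np = yt.numpy()
print(x_np.shape)
```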

numpy array to PyTorch tensor

Converting a numpy array to a PyTorch tensor is a very common operation that I have seen in examples using PyTorch: first create the array in numpy, and then convert it to a torch tensor.

#>      [,1] [,2] [,3]
#> [1,]    0    0    1
#> [2,]    0    1    1
#> [3,]    1    0    1
#> [4,]    1    1    1

This is another common operation that you will find in the PyTorch tutorials: converting a numpy array of a certain type to a tensor of the same type:

#> tensor([[0., 0., 1.],
#>         [0., 1., 1.],
#>         [1., 0., 1.],
#>         [1., 1., 1.]])
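
Both conversions can be sketched in Python with torch.from_numpy, which keeps the element type of the source array. This is one way to do it, not necessarily the call used to produce the output above; the cast to float32 is an assumption based on the floating-point values printed there.

```python
import numpy as np
import torch

# Create the array in numpy first.
a = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

# from_numpy keeps the dtype of the array and shares its memory.
t_int = torch.from_numpy(a)

# Cast to float32 in numpy first to obtain a float tensor of the same type.
t_float = torch.from_numpy(a.astype(np.float32))
print(t_float)
```

Because astype returns a copy, t_float shares memory with that copy and not with a.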

2.4 Python built-in functions

To access the Python built-in functions we make use of the package reticulate and the function import_builtins().

Here is a partial list of the Python built-in functions and objects made available through the R package reticulate. I am using the R function grep() to discard the names that contain the keywords Error, Warning, or Exit.

#>  [1] "abs"                "all"                "any"               
#>  [4] "ascii"              "BaseException"      "bin"               
#>  [7] "bool"               "breakpoint"         "bytearray"         
#> [10] "bytes"              "callable"           "chr"               
#> [13] "classmethod"        "compile"            "complex"           
#> [16] "copyright"          "credits"            "delattr"           
#> [19] "dict"               "dir"                "divmod"            
#> [22] "Ellipsis"           "enumerate"          "eval"              
#> [25] "Exception"          "exec"               "exit"              
#> [28] "False"              "filter"             "float"             
#> [31] "format"             "frozenset"          "getattr"           
#> [34] "globals"            "hasattr"            "hash"              
#> [37] "help"               "hex"                "id"                
#> [40] "input"              "int"                "isinstance"        
#> [43] "issubclass"         "iter"               "KeyboardInterrupt" 
#> [46] "len"                "license"            "list"              
#> [49] "locals"             "map"                "max"               
#> [52] "memoryview"         "min"                "next"              
#> [55] "None"               "NotImplemented"     "object"            
#> [58] "oct"                "open"               "ord"               
#> [61] "pow"                "print"              "property"          
#> [64] "quit"               "range"              "repr"              
#> [67] "reversed"           "round"              "set"               
#> [70] "setattr"            "slice"              "sorted"            
#> [73] "staticmethod"       "StopAsyncIteration" "StopIteration"     
#> [76] "str"                "sum"                "super"             
#> [79] "True"               "tuple"              "type"              
#> [82] "vars"               "zip"
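
A comparable listing can be produced from Python itself with the standard builtins module. This is a sketch of the same filtering idea, not the R code that generated the output above; skipping names that start with an underscore is an extra assumption, and the exact list depends on the Python version.

```python
import builtins

# Keywords whose matches are discarded, as in the filtering described above.
excluded = ("Error", "Warning", "Exit")

names = [
    name
    for name in dir(builtins)
    if not name.startswith("_")                         # skip private names
    and not any(keyword in name for keyword in excluded)
]
print(len(names))
print(names[:6])
```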

Length of a dataset

Sometimes, we will need the Python len function to find out the length of an object:

#> [1] 60000
#> [1] 10000
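
In Python, len() works on any object that defines the __len__ method, which is why it can report the size of a dataset. The class below is a hypothetical stand-in for a real dataset, used so that the sketch runs without downloading MNIST.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset; it only knows its own size."""

    def __init__(self, n_samples):
        self.n_samples = n_samples

    def __len__(self):
        # len(obj) calls this method behind the scenes.
        return self.n_samples


train_like = ToyDataset(60000)
test_like = ToyDataset(10000)

print(len(train_like))
print(len(test_like))
```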

Types and instances

Types, instances and classes are important when deciding how to process the data that is read from the datasets. In this example, we want to know whether an object is an instance of a certain class:

#> <class 'torchvision.datasets.mnist.MNIST'>
#> [1] TRUE
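
The same kind of check can be sketched in plain Python with type() and isinstance(). The two classes here are hypothetical stand-ins for a dataset class and its parent, not the torchvision classes shown above.

```python
class BaseDataset:
    """Hypothetical parent class."""


class ToyMNIST(BaseDataset):
    """Hypothetical dataset class derived from BaseDataset."""


sample = ToyMNIST()

# type() reports the exact class of the object.
print(type(sample))

# isinstance() is also true for parent classes, which makes it the
# safer test when deciding how to process an object.
print(isinstance(sample, ToyMNIST))
print(isinstance(sample, BaseDataset))
print(isinstance(sample, dict))
```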