12 numpy

Fast and convenient manipulation of array data
A NumPy array holds only a single data type, unlike a Python list
Supports matrix operations
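The three points above can be seen in a few lines. This is a minimal sketch; the variable names (`mixed_list`, `arr`, `m`) are illustrative and not part of the original notes.

```python
import numpy as np

mixed_list = [1, 2.5, 3]       # a Python list may mix int and float
arr = np.array(mixed_list)     # the array promotes every item to one dtype
print(arr.dtype)               # float64

m = np.array([[1, 2], [3, 4]])
print(m @ m)                   # matrix product: [[7, 10], [15, 22]]
print(m * m)                   # element-wise product: [[1, 4], [9, 16]]
```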

12.1 Environment Setup

import pandas as pd
import matplotlib.pyplot as plt
import math
pd.set_option( 'display.notebook_repr_html', False)  # render Series and DataFrame as text, not HTML
pd.set_option( 'display.max_columns', 10)   # maximum number of columns to display
pd.set_option( 'display.max_rows', 10)     # number of rows
pd.set_option( 'display.width', 90)        # number of characters per row

12.2 Module Import

import numpy as np
np.__version__

#:> '1.19.1'

## other modules

from datetime import datetime
from datetime import date
from datetime import time

12.3 Data Types

12.3.1 NumPy Data Types

NumPy supports a much greater variety of numerical types than Python does, which gives finer control over memory use and precision. See https://www.numpy.org/devdocs/user/basics.types.html

Integer: np.int8, np.int16, np.int32, np.uint8, np.uint16, np.uint32
Float: np.float32, np.float64
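As a hedged illustration of why the choice of type matters, `np.iinfo` and `np.finfo` report the limits of each type. The variable names are illustrative.

```python
import numpy as np

# Value range of several integer types
for int_type in (np.int8, np.int16, np.int32, np.uint8):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)

# Machine epsilon: the gap between 1.0 and the next representable float
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print(eps32, eps64)
```

A smaller type saves memory but overflows sooner (integers) or carries fewer significant digits (floats).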

12.3.2 int32/64

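np.int is merely an alias of the Python built-in int. The alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so the first line below only runs on older versions such as the 1.19.1 used here.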

x = np.int(13)
y = int(13)
print( type(x) )

#:> <class 'int'>

print( type(y) )

#:> <class 'int'>

np.int32 and np.int64 are NumPy-specific types

x = np.int32(13)
y = np.int64(13)
print( type(x) )

#:> <class 'numpy.int32'>

print( type(y) )

#:> <class 'numpy.int64'>

12.3.3 float32/64

x = np.float(13)   # np.float is an alias of the built-in float; deprecated in NumPy 1.20, removed in 1.24
y = float(13)
print( type(x) )

#:> <class 'float'>

print( type(y) )

#:> <class 'float'>

x = np.float32(13)
y = np.float64(13)
print( type(x) )

#:> <class 'numpy.float32'>

print( type(y) )

#:> <class 'numpy.float64'>

12.3.4 bool

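In the NumPy version used here (1.19.1), np.bool is an alias of the Python built-in bool. The alias was deprecated in NumPy 1.20 and removed in 1.24; NumPy 2.0 reintroduced the name for NumPy's own boolean type, so the output below is version-dependent.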

x = np.bool(True)
print( type(x) )

#:> <class 'bool'>

print( type(True) )

#:> <class 'bool'>

12.3.5 str

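In NumPy 1.19.1, np.str is an alias of the Python built-in str (deprecated in NumPy 1.20, removed in 1.24). np.str_ is the NumPy-specific string type.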

x = np.str("ali")
print( type(x) )

#:> <class 'str'>

x = np.str_("ali")
print( type(x) )

#:> <class 'numpy.str_'>

12.3.6 datetime64

Unlike the Python standard datetime library, NumPy makes no separation between date, datetime and time.
There is no equivalent of the time object.
NumPy has only one object: datetime64.

12.3.6.1 Constructor

From String
Note that the input string must be timezone-naive. It follows the ISO 8601 layout, but timezone information at the end of the string (such as Z or +08:00) is not supported: depending on the NumPy version it raises an error or emits a warning.

np.datetime64('2005-02')

#:> numpy.datetime64('2005-02')

np.datetime64('2005-02-25')

#:> numpy.datetime64('2005-02-25')

np.datetime64('2005-02-25T03:30')

#:> numpy.datetime64('2005-02-25T03:30')

From datetime

np.datetime64( date.today() )

#:> numpy.datetime64('2020-11-20')

np.datetime64( datetime.now() )

#:> numpy.datetime64('2020-11-20T14:28:29.271833')

12.3.6.2 Instance Method

Convert to a Python standard library object using astype(). A datetime64 with day resolution, as below, becomes a datetime.date.

dt64 = np.datetime64("2019-01-31" )
dt64.astype(datetime)

#:> datetime.date(2019, 1, 31)
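The type returned by astype(datetime) depends on the resolution of the datetime64 value. A minimal sketch, with illustrative variable names:

```python
import numpy as np
from datetime import date, datetime

day_value = np.datetime64("2019-01-31")            # resolution: days
minute_value = np.datetime64("2019-01-31T10:30")   # resolution: minutes

as_date = day_value.astype(datetime)          # day resolution gives datetime.date
as_datetime = minute_value.astype(datetime)   # finer resolution gives datetime.datetime

print(type(as_date), as_date)
print(type(as_datetime), as_datetime)
```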

12.3.7 nan

12.3.7.1 Creating NaN

NaN is NOT a separate data type. It means "not a number" and is a special floating-point value: both float('NaN') and np.nan are plain Python float objects. It can be created using the two methods below.

import numpy as np
import pandas as pd
import math

kosong1 = float('NaN')
kosong2 = np.nan

print('Type: ', type(kosong1), '\n',
       'Value: ', kosong1)

#:> Type:  <class 'float'> 
#:>  Value:  nan

print('Type: ', type(kosong2), '\n',
       'Value: ', kosong2)

#:> Type:  <class 'float'> 
#:>  Value:  nan

12.3.7.2 Detecting NaN

Detect nan using various functions from pandas, numpy and math.

print(pd.isna(kosong1), '\n',
      pd.isna(kosong2), '\n',
      np.isnan(kosong1),'\n',
      math.isnan(kosong2))

#:> True 
#:>  True 
#:>  True 
#:>  True

12.3.7.3 Operation

12.3.7.3.1 Logical Operator

# nan is truthy, so with a truthy first operand `and` returns its second operand
print( True and kosong1,
       kosong1 and True)

#:> nan True

print( True or kosong1,
       False or kosong1)

#:> True nan

12.3.7.3.2 Comparing

Comparing nan with anything, including itself, results in False. The only exception is !=, which results in True.

print(kosong1 > 0, kosong1==0, kosong1<0,
      kosong1 ==1, kosong1==kosong1, kosong1==False, kosong1==True)

#:> False False False False False False False
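Because equality never matches nan, testing for it needs a dedicated function. A minimal sketch with illustrative variable names:

```python
import math
import numpy as np

value = float("nan")
print(value == value)      # False
print(value != value)      # True
print(math.isnan(value))   # True

arr = np.array([1.0, np.nan, 3.0])
print(arr == np.nan)       # [False False False], equality never finds nan
print(np.isnan(arr))       # [False  True False]
```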

12.3.7.3.3 Casting

nan is a floating-point value. It is not a zero value, therefore casting it to boolean returns True.

bool(kosong1)

#:> True

12.4 Numpy Array

12.4.1 Concept

Structure
- NumPy provides an N-dimensional array type, the ndarray
- ndarray is homogeneous: every item takes up the same size block of memory, and all blocks are interpreted in exactly the same way
- For each ndarray, there is a separate dtype object, which describes the ndarray data type
- An item extracted from an array, e.g., by indexing, is represented by a Python object whose type is one of the array scalar types built in NumPy. The array scalars also allow easy manipulation of more complicated arrangements of data.
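The structure described above can be inspected directly. A minimal sketch; the int16 dtype is chosen only to make the item size obvious.

```python
import numpy as np

arr = np.array([10, 20, 30], dtype=np.int16)

print(arr.dtype)      # int16, the separate dtype object
print(arr.itemsize)   # 2, every item occupies the same number of bytes
print(arr.nbytes)     # 6, which is 3 items x 2 bytes

item = arr[0]         # indexing returns an array scalar
print(type(item))     # <class 'numpy.int16'>
print(isinstance(item, np.generic))   # True, np.generic is the base of all array scalars
```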

12.4.2 Constructor

By default, numpy.array infers its dtype from the data, choosing a common type that can represent every element

12.4.2.1 dType: int, float

Notice that the example below is auto-detected as an integer dtype (int64 here; the default integer size is platform-dependent)

x = np.array( (1,2,3,4,5) )
print(x)

#:> [1 2 3 4 5]

print('Type: ', type(x))

#:> Type:  <class 'numpy.ndarray'>

print('dType:', x.dtype)

#:> dType: int64

Notice that the example below is auto-detected as float64, because one of the elements is a float

x = np.array( (1,2,3,4.5,5) )
print(x)
# print('Type: ', type(x))
# print('dType:', x.dtype)

#:> [1.  2.  3.  4.5 5. ]

You can pass dtype to request the desired data type.
NumPy will convert the data into the specified type. Observe below that float is converted into integer, so 4.5 is truncated to 4

x = np.array( (1,2,3,4.5,5), dtype='int' )
print(x)

#:> [1 2 3 4 5]

print('Type: ', type(x))

#:> Type:  <class 'numpy.ndarray'>

print('dType:', x.dtype)

#:> dType: int64

12.4.2.2 dType: datetime64

Specifying dtype is necessary to ensure the output is of datetime64 type. Without it, strings produce a Unicode string dtype and datetime objects produce the generic object dtype.

From str

x = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
print(x)

#:> ['2007-07-13' '2006-01-13' '2010-08-13']

print('Type: ', type(x))

#:> Type:  <class 'numpy.ndarray'>

print('dType:', x.dtype)

#:> dType: datetime64[D]

From datetime

x = np.array([datetime(2019,1,12), datetime(2019,1,14),datetime(2019,3,3)], dtype='datetime64')
print(x)

#:> ['2019-01-12T00:00:00.000000' '2019-01-14T00:00:00.000000'
#:>  '2019-03-03T00:00:00.000000']

print('Type: ', type(x))

#:> Type:  <class 'numpy.ndarray'>

print('dType:', x.dtype)

#:> dType: datetime64[us]

print('\nElement Type:',type(x[1]))

#:> 
#:> Element Type: <class 'numpy.datetime64'>
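For comparison, here is a hedged sketch of what happens when dtype is left out, and how an existing string array can still be converted afterwards. The variable names are illustrative.

```python
import numpy as np
from datetime import datetime

from_str = np.array(['2007-07-13', '2006-01-13'])
from_dt = np.array([datetime(2019, 1, 12), datetime(2019, 1, 14)])

print(from_str.dtype.kind)   # 'U', a Unicode string dtype
print(from_dt.dtype)         # object

# Convert the string array to datetime64 after the fact
converted = from_str.astype('datetime64[D]')
print(converted.dtype)       # datetime64[D]
```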

12.4.2.3 2D Array

x = np.array([range(10),np.arange(10)])
x

#:> array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
#:>        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

12.4.3 Dimensions

12.4.3.1 Differentiating Dimensions

A 1-D array is built from a single list
A 2-D array is built from a list containing lists (each inner list is a row)
A 2-D single-row array is built from a list containing just one list

12.4.3.2 1-D Array

Observe that the shape of the array is (5,). It may look like an array with 5 rows and no columns!
What it really means is a single dimension holding 5 items.

arr = np.array(range(5))
print (arr)

#:> [0 1 2 3 4]

print (arr.shape)

#:> (5,)

print (arr.ndim)

#:> 1

12.4.3.3 2-D Array

arr = np.array([range(5),range(5,10),range(10,15)])
print (arr)

#:> [[ 0  1  2  3  4]
#:>  [ 5  6  7  8  9]
#:>  [10 11 12 13 14]]

print (arr.shape)

#:> (3, 5)

print (arr.ndim)

#:> 2

12.4.3.4 2-D Array - Single Row

arr = np.array([range(5)])
print (arr)

#:> [[0 1 2 3 4]]

print (arr.shape)

#:> (1, 5)

print (arr.ndim)

#:> 2

12.4.3.5 2-D Array : Single Column

Slicing with newaxis in the COLUMN position turns a 1-D array into a 2-D array with a single column

arr = np.arange(5)[:, np.newaxis]
print (arr)

#:> [[0]
#:>  [1]
#:>  [2]
#:>  [3]
#:>  [4]]

print (arr.shape)

#:> (5, 1)

print (arr.ndim)

#:> 2

Slicing with newaxis in the ROW position turns a 1-D array into a 2-D array with a single row

arr = np.arange(5)[np.newaxis,:]
print (arr)

#:> [[0 1 2 3 4]]

print (arr.shape)

#:> (1, 5)

print (arr.ndim)

#:> 2

12.4.4 Class Method

12.4.4.1 `arange()`

Generate array with a sequence of numbers

np.arange(10)

#:> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

12.4.4.2 `ones()`

np.ones(10)  # One dimension, default is float

#:> array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

np.ones((2,5),'int')  #Two dimensions

#:> array([[1, 1, 1, 1, 1],
#:>        [1, 1, 1, 1, 1]])

12.4.4.3 `zeros()`

np.zeros( 10 )    # One dimension, default is float

#:> array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

np.zeros((2,5),'int')   # 2 rows, 5 columns of ZERO

#:> array([[0, 0, 0, 0, 0],
#:>        [0, 0, 0, 0, 0]])

12.4.4.4 `where()`

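On a 1-D array, numpy.where() returns a tuple containing an array of the indices of the matching elements. In the example below the indices happen to equal the values, because ar1 holds 0 to 9.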

ar1 = np.array(range(10))
print( ar1 )

#:> [0 1 2 3 4 5 6 7 8 9]

print( np.where(ar1>5) )

#:> (array([6, 7, 8, 9]),)

On a 2-D array, where() returns a tuple of two arrays: the row indices and the column indices of the matching elements

ar = np.array([(1,2,3,4,5),(11,12,13,14,15),(21,22,23,24,25)])
print ('Data : \n', ar)

#:> Data : 
#:>  [[ 1  2  3  4  5]
#:>  [11 12 13 14 15]
#:>  [21 22 23 24 25]]

np.where(ar>13)

#:> (array([1, 1, 2, 2, 2, 2, 2]), array([3, 4, 0, 1, 2, 3, 4]))
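Since where() returns indices, they can be fed back into the array to fetch the matching values. The sketch below also shows the three-argument form, which picks values element by element. Variable names are illustrative.

```python
import numpy as np

ar1 = np.arange(10, 20)          # values differ from their indices
idx = np.where(ar1 > 15)
print(idx)                       # (array([6, 7, 8, 9]),) are indices
print(ar1[idx])                  # [16 17 18 19] are values

ar = np.array([(1, 2, 3, 4, 5), (11, 12, 13, 14, 15), (21, 22, 23, 24, 25)])
rows, cols = np.where(ar > 13)
print(ar[rows, cols])            # [14 15 21 22 23 24 25]

# Three arguments: keep the value where the condition holds, otherwise use 0
print(np.where(ar1 > 15, ar1, 0))   # [ 0  0  0  0  0  0 16 17 18 19]
```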

12.4.4.5 Logical Methods

numpy.logical_or
Performs an element-wise OR on two boolean arrays and returns a new boolean array

ar = np.arange(10)
print( ar==3 )  # boolean array 1

#:> [False False False  True False False False False False False]

print( ar==6 )  # boolean array 2

#:> [False False False False False False  True False False False]

print( np.logical_or(ar==3,ar==6 ) ) # resulting boolean

#:> [False False False  True False False  True False False False]

numpy.logical_and
Performs an element-wise AND on two boolean arrays and returns a new boolean array

ar = np.arange(10)
print( ar==3 ) # boolean array 1

#:> [False False False  True False False False False False False]

print( ar==6 ) # boolean array 2

#:> [False False False False False False  True False False False]

print( np.logical_and(ar==3,ar==6 ) )  # resulting boolean

#:> [False False False False False False False False False False]
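The resulting boolean array is typically used as a mask to select elements. A minimal sketch; it also shows that the `|` and `&` operators give the same result on boolean arrays (the parentheses are required because of operator precedence).

```python
import numpy as np

ar = np.arange(10)

mask = np.logical_or(ar == 3, ar == 6)
same_mask = (ar == 3) | (ar == 6)
print(np.array_equal(mask, same_mask))   # True

print(ar[mask])                          # [3 6]
print(ar[(ar > 2) & (ar < 6)])           # [3 4 5]
print(ar[np.logical_not(mask)])          # [0 1 2 4 5 7 8 9]
```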

12.4.5 Instance Method

12.4.5.1 `astype()` conversion

Convert from datetime64 to datetime

ar1 = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
print( type(ar1) )  ## a numpy array

#:> <class 'numpy.ndarray'>

print( ar1.dtype )  ## dtype is a numpy data type

#:> datetime64[D]

After converting to datetime (a non-NumPy object), the dtype becomes the generic ‘object’.

ar2 = ar1.astype(datetime)
print( type(ar2) )  ## still a numpy array

#:> <class 'numpy.ndarray'>

print( ar2.dtype )  ## dtype becomes generic 'object'

#:> object

12.4.5.2 `reshape()`

reshape ( row numbers, col numbers )

Sample Data

a = np.array([range(5), range(10,15), range(20,25), range(30,35)])
a

#:> array([[ 0,  1,  2,  3,  4],
#:>        [10, 11, 12, 13, 14],
#:>        [20, 21, 22, 23, 24],
#:>        [30, 31, 32, 33, 34]])

Reshape 1-Dim to 2-Dim

np.arange(12) # 1-D Array

#:> array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

np.arange(12).reshape(3,4)  # 2-D Array

#:> array([[ 0,  1,  2,  3],
#:>        [ 4,  5,  6,  7],
#:>        [ 8,  9, 10, 11]])

Reshape 2-Dim to 2-Dim

np.array([range(5), range(10,15)])  # 2-D Array

#:> array([[ 0,  1,  2,  3,  4],
#:>        [10, 11, 12, 13, 14]])

np.array([range(5), range(10,15)]).reshape(5,2) # 2-D Array

#:> array([[ 0,  1],
#:>        [ 2,  3],
#:>        [ 4, 10],
#:>        [11, 12],
#:>        [13, 14]])

Reshape 2-Dim to 2-Dim (of single row)
- Change 2x5 to 1x10
- Observe [[ ]]: the number of dimensions is still 2, don’t be fooled

np.array( [range(0,5), range(5,10)])  # 2-D Array

#:> array([[0, 1, 2, 3, 4],
#:>        [5, 6, 7, 8, 9]])

np.array( [range(0,5), range(5,10)]).reshape(1,10) # 2-D Array

#:> array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

Reshape 1-Dim Array to 2-Dim Array (single column)

np.arange(8)

#:> array([0, 1, 2, 3, 4, 5, 6, 7])

np.arange(8).reshape(8,1)

#:> array([[0],
#:>        [1],
#:>        [2],
#:>        [3],
#:>        [4],
#:>        [5],
#:>        [6],
#:>        [7]])

A better method is to use newaxis: it is easier because the number of rows need not be passed as a parameter

np.arange(8)[:,np.newaxis]

#:> array([[0],
#:>        [1],
#:>        [2],
#:>        [3],
#:>        [4],
#:>        [5],
#:>        [6],
#:>        [7]])

Reshape 1-Dim Array to 2-Dim Array (single row)

np.arange(8)

#:> array([0, 1, 2, 3, 4, 5, 6, 7])

np.arange(8)[np.newaxis,:]

#:> array([[0, 1, 2, 3, 4, 5, 6, 7]])
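reshape() can also work out one dimension by itself when that dimension is given as -1. A minimal sketch with illustrative names:

```python
import numpy as np

a = np.arange(12)

print(a.reshape(3, -1).shape)    # (3, 4), column count inferred
print(a.reshape(-1, 1).shape)    # (12, 1), single column
print(a.reshape(1, -1).shape)    # (1, 12), single row
print(a.reshape(3, 4).reshape(-1).shape)   # (12,), back to 1-D

# The total number of items must still fit the requested shape
try:
    a.reshape(5, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)
```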

12.4.6 Element Selection

12.4.6.1 Sample Data

x1 = np.array( (0,1,2,3,4,5,6,7,8))
x2 = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
print(x1)

#:> [0 1 2 3 4 5 6 7 8]

print(x2)

#:> [[ 1  2  3  4  5]
#:>  [11 12 13 14 15]
#:>  [21 22 23 24 25]]

12.4.6.2 1-Dimension

All indexing starts from 0 (not 1)

Selecting a single element returns a scalar, not an array

print( x1[0]   )  ## first element

#:> 0

print( x1[-1]  )  ## last element

#:> 8

print( x1[3]   )  ## fourth element (index 3)

#:> 3

print( x1[-3]  )  ## third element from end

#:> 6

Selecting multiple elements returns an ndarray

print( x1[:3]  )  ## first 3 elements

#:> [0 1 2]

print( x1[-3:])   ## last 3 elements

#:> [6 7 8]

print( x1[3:]  )  ## all except first 3 elements

#:> [3 4 5 6 7 8]

print( x1[:-3] )  ## all except last 3 elements

#:> [0 1 2 3 4 5]

print( x1[1:4] )  ## elements at index 1 to 4 (not including 4)

#:> [1 2 3]

12.4.6.3 2-Dimension

Indexing with [ row_positions, column_positions ], index starts with 0

x2[1:3, 1:4] # rows 1 to 2, columns 1 to 3

#:> array([[12, 13, 14],
#:>        [22, 23, 24]])
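A few more 2-D selections, sketched with the x2 sample data (redefined here so the block runs on its own):

```python
import numpy as np

x2 = np.array(((1, 2, 3, 4, 5),
               (11, 12, 13, 14, 15),
               (21, 22, 23, 24, 25)))

print(x2[1, 2])       # 13, single element at row 1, column 2
print(x2[0])          # [1 2 3 4 5], whole first row
print(x2[:, 0])       # [ 1 11 21], whole first column
print(x2[-1, -2:])    # [24 25], last row, last two columns
```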

12.4.7 Attributes

12.4.7.1 `dtype`

An ndarray has an attribute called dtype, which tells us the type of the underlying items

a = np.array( (1,2,3,4,5), dtype='float' )
a.dtype

#:> dtype('float64')

print(a.dtype)

#:> float64

print( type(a[1]))

#:> <class 'numpy.float64'>

12.4.7.2 `ndim`

ndim returns the number of dimensions of the NumPy array. The example below shows a 2-D array

x = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
x.ndim

#:> 2

12.4.7.3 `shape`

shape returns a tuple of (rows, cols)

x = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
x.shape

#:> (3, 5)

np.identity(5)   # 5x5 identity matrix: ones on the diagonal, zeros elsewhere

#:> array([[1., 0., 0., 0., 0.],
#:>        [0., 1., 0., 0., 0.],
#:>        [0., 0., 1., 0., 0.],
#:>        [0., 0., 0., 1., 0.],
#:>        [0., 0., 0., 0., 1.]])

12.4.8 Operations

12.4.8.1 Arithmetic

Sample Data

ar = np.arange(10)
print( ar )

#:> [0 1 2 3 4 5 6 7 8 9]

*

ar = np.arange(10)
print (ar)

#:> [0 1 2 3 4 5 6 7 8 9]

print (ar*2)

#:> [ 0  2  4  6  8 10 12 14 16 18]

+ and -

ar = np.arange(10)
print (ar+2)

#:> [ 2  3  4  5  6  7  8  9 10 11]

print (ar-2)

#:> [-2 -1  0  1  2  3  4  5  6  7]
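The same operators also work between two arrays of the same shape, element by element. A minimal sketch with illustrative arrays `a` and `b`:

```python
import numpy as np

a = np.arange(1, 5)                 # [1 2 3 4]
b = np.array([10, 20, 30, 40])

print(a + b)     # [11 22 33 44]
print(a * b)     # [ 10  40  90 160]
print(b / a)     # [10. 10. 10. 10.], true division always gives float
print(b // a)    # [10 10 10 10], floor division keeps integers
print(b % 3)     # [1 2 0 1]
print(a ** 2)    # [ 1  4  9 16]
```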

12.4.8.2 Comparison

Sample Data

ar = np.arange(10)
print( ar )

#:> [0 1 2 3 4 5 6 7 8 9]

==

print( ar==3 )

#:> [False False False  True False False False False False False]

>, >=, <, <=

print( ar>3 )

#:> [False False False False  True  True  True  True  True  True]

print( ar<=3 )

#:> [ True  True  True  True False False False False False False]

12.5 Random Numbers

12.5.1 Uniform Distribution

12.5.1.1 Random Integer (with Replacement)

randint() Return random integers from low (inclusive) to high (exclusive)

np.random.randint( low )                  # generate an integer i where     0 <= i < low
np.random.randint( low, high )            # generate an integer i where   low <= i < high
np.random.randint( low, high, size=n)     # generate a 1-D ndarray of n integers
np.random.randint( low, high, size=(r,c)) # generate a 2-D ndarray of integers, r rows by c columns

np.random.randint( 10 )

#:> 6

np.random.randint( 10, 20 )

#:> 16

np.random.randint( 10, high=20, size=5)   # single dimension

#:> array([15, 18, 14, 11, 13])

np.random.randint( 10, 20, (3,5) )        # two dimensions

#:> array([[18, 19, 14, 17, 11],
#:>        [15, 11, 11, 19, 10],
#:>        [12, 11, 16, 19, 10]])

12.5.1.2 Random Integer (with or without replacement)

numpy.random.choice( a, size, replace=True)
 # sampling from a,
 #   if a is an integer, sampling is from arange(a)
 #   if a is a 1-D array, sampling is from this array

np.random.choice(10,5, replace=False) # take 5 samples from 0 to 9, without replacement

#:> array([6, 0, 4, 1, 2])

np.random.choice( np.arange(10,20), 5, replace=False)

#:> array([11, 13, 10, 14, 15])
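Without replacement every sampled value is unique, and the sample cannot be larger than the population. A minimal sketch; the seed value is arbitrary and only makes the run repeatable.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, for repeatability only

sample = np.random.choice(10, 5, replace=False)
print(len(set(sample.tolist())))        # 5, no duplicates

with_repl = np.random.choice(3, 10)     # fine: with replacement, duplicates allowed
print(with_repl.shape)                  # (10,)

# Asking for more unique items than exist raises ValueError
try:
    np.random.choice(3, 10, replace=False)
    too_large_failed = False
except ValueError as err:
    too_large_failed = True
    print(err)
```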

12.5.1.3 Random Float

ranf() generates floats in the half-open interval [0.0, 1.0)

np.random.ranf(size=None)

np.random.ranf(4)

#:> array([0.34719156, 0.35147161, 0.59755853, 0.10528617])

uniform() Return random float from low (inclusive) to high (exclusive)

np.random.uniform( low )                  # high keeps its default of 1.0, so the float f lies between 1.0 and low
np.random.uniform( low, high )            # generate a float f where  low <= f < high
np.random.uniform( low, high, size=n)     # generate a 1-D array of n floats
np.random.uniform( low, high, size=(r,c)) # generate a 2-D array of floats, r rows by c columns

np.random.uniform( 2 )

#:> 1.633967952019189

np.random.uniform( 2,5, size=(4,4) )

#:> array([[2.06434886, 3.66304024, 3.52751507, 4.08096456],
#:>        [4.19814857, 2.95277079, 3.63566489, 4.69076522],
#:>        [2.34947052, 4.17895391, 4.49808652, 3.51828276],
#:>        [3.67805721, 3.22648964, 3.2674474 , 2.8441559 ]])

12.5.2 Normal Distribution

numpy.random.randn(n_items)                      # 1-D standard normal (mean=0, stdev=1)
numpy.random.randn(nrows, ncols)                 # 2-D standard normal (mean=0, stdev=1)
numpy.random.standard_normal(size=None)          # mean = 0, stdev = 1, non-configurable
numpy.random.normal(loc=0, scale=1, size=None)   # loc = mean, scale = stdev, size = dimension

12.5.2.1 Standard Normal Distribution

Generate random numbers following the standard normal (Gaussian) distribution (mean=0, stdev=1)

One Dimension

np.random.standard_normal(3)

#:> array([-0.29832127, -1.52835978, -1.69015261])

np.random.randn(3)

#:> array([-1.36143442, -1.03616391,  0.30469669])

Two Dimensions

np.random.randn(2,4)

#:> array([[ 0.18301414, -0.81780387,  2.33753414,  1.35667554],
#:>        [ 1.04592906,  0.14818631,  2.3902418 , -2.07301317]])

np.random.standard_normal((2,4))

#:> array([[-0.83193773,  0.67788051, -0.96400219, -0.12383149],
#:>        [ 0.95843138, -1.02865802, -0.95976146, -1.81295684]])

Observe: randn(), standard_normal() and normal() are able to generate standard normal numbers

np.random.seed(15)
print (np.random.randn(5))

#:> [-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]

np.random.seed(15)
print (np.random.normal ( size = 5 )) # stdev and mean not specified, default to standard normal

#:> [-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]

np.random.seed(15)
print (np.random.standard_normal (size=5))

#:> [-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]

12.5.2.2 Normal Distribution (Non-Standard)

np.random.seed(125)
np.random.normal( loc = 12, scale=1.25, size=(3,3))

#:> array([[11.12645382, 12.01327885, 10.81651695],
#:>        [12.41091248, 12.39383072, 11.49647195],
#:>        [ 8.70837035, 12.25246312, 11.49084235]])
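With a large enough sample, the sample mean and standard deviation should land close to loc and scale. A minimal sketch; the sample size of 100,000 is an arbitrary choice for illustration.

```python
import numpy as np

np.random.seed(125)
samples = np.random.normal(loc=12, scale=1.25, size=100_000)

sample_mean = samples.mean()
sample_std = samples.std()
print(sample_mean)   # close to 12
print(sample_std)    # close to 1.25
```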

12.5.2.3 Linear Spacing

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
# endpoint: If True, stop is the last sample, otherwise it is not included

Include Endpoint
Step = gap divided by (number of elements minus 1): 2/(10-1)

np.linspace(1,3,10) # default endpoint=True

#:> array([1.        , 1.22222222, 1.44444444, 1.66666667, 1.88888889,
#:>        2.11111111, 2.33333333, 2.55555556, 2.77777778, 3.        ])

Does Not Include Endpoint
Step = gap divided by the number of elements: 2/10

np.linspace(1,3,10,endpoint=False)

#:> array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8])
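The step used can be checked directly: with retstep=True, linspace returns the step alongside the array. A minimal sketch with illustrative names:

```python
import numpy as np

with_end, step_with = np.linspace(1, 3, 10, retstep=True)
without_end, step_without = np.linspace(1, 3, 10, endpoint=False, retstep=True)

print(step_with)       # 2/9, about 0.2222
print(step_without)    # 2/10 = 0.2
print(with_end[-1])    # 3.0, stop is included
print(without_end[-1]) # about 2.8, stop is excluded
```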

12.6 Sampling (Integer)

random.choice( a, size=None, replace=True, p=None)  # a=integer, return <size> integers drawn from arange(a)
random.choice( a, size=None, replace=True, p=None)  # a=array-like, return <size> items picked from a

np.random.choice (100, size=10)

#:> array([58,  0, 84, 50, 89, 32, 87, 30, 66, 92])

np.random.choice( [1,3,5,7,9,11,13,15,17,19,21,23], size=10, replace=False)

#:> array([ 5,  1, 23, 17,  3, 13, 15,  9, 21,  7])
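The p parameter assigns a probability to each item of a (the probabilities must sum to 1). A minimal sketch; the values, weights and seed are made up for illustration.

```python
import numpy as np

np.random.seed(7)   # arbitrary seed, for repeatability only
values = [10, 20, 30]

# Probability 1 on the last item: every draw returns 30
always_last = np.random.choice(values, size=5, p=[0.0, 0.0, 1.0])
print(always_last)

# Weighted sampling: 10 should appear in roughly 80% of the draws
weighted = np.random.choice(values, size=10_000, p=[0.8, 0.1, 0.1])
share_of_10 = (weighted == 10).mean()
print(share_of_10)
```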

12.7 NaN : Missing Numerical Data

You should be aware that NaN is a bit like a data virus: it infects any other object it touches

t = np.array([1, np.nan, 3, 4]) 
t.dtype

#:> dtype('float64')

Regardless of the operation, the result of arithmetic with NaN will be another NaN

1 + np.nan

#:> nan

t.sum(), t.mean(), t.max()

#:> (nan, nan, nan)

np.nansum(t), np.nanmean(t), np.nanmax(t)

#:> (8.0, 2.6666666666666665, 4.0)
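Besides the nan-aware aggregates, missing values can be located with a boolean mask and then dropped or replaced. A minimal sketch; the fill values 0.0 and -1.0 are arbitrary.

```python
import numpy as np

t = np.array([1, np.nan, 3, 4])

mask = np.isnan(t)
print(mask.sum())                    # 1, number of missing values
print(t[~mask])                      # [1. 3. 4.], drop the NaN

filled = np.where(mask, 0.0, t)      # replace NaN with 0.0
print(filled)                        # [1. 0. 3. 4.]

print(np.nan_to_num(t, nan=-1.0))    # [ 1. -1.  3.  4.]
```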
