12 numpy

  • Fast, array-based data manipulation
  • Unlike a Python list, a numpy array holds a single data type
  • Supports matrix operations
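The points above can be sketched in a few lines. This is a minimal illustration only; the variable names (mixed_list, arr, m) are made up for the example.

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a list keeps each item's own type
arr = np.array(mixed_list)          # the array upcasts everything to one dtype

print([type(v).__name__ for v in mixed_list])
#:> ['int', 'float', 'int']
print(arr.dtype)
#:> float64

m = np.array([[1, 2], [3, 4]])
print(m @ m)                        # matrix product, not element-wise
#:> [[ 7 10]
#:>  [15 22]]
```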

12.1 Environment Setup

import pandas as pd
import matplotlib.pyplot as plt
import math
pd.set_option( 'display.notebook_repr_html', False)  # render Series and DataFrame as text, not HTML
pd.set_option( 'display.max_columns', 10)   # number of columns
pd.set_option( 'display.max_rows', 10)     # number of rows
pd.set_option( 'display.width', 90)        # number of characters per row

12.2 Module Import

import numpy as np
np.__version__
#:> '1.19.1'

## other modules
from datetime import datetime
from datetime import date
from datetime import time

12.3 Data Types

12.3.1 NumPy Data Types

NumPy supports a much greater variety of numerical types than Python does, which makes it more flexible for numerical work. See https://www.numpy.org/devdocs/user/basics.types.html

Integer: np.int8, np.int16, np.int32, np.uint8, np.uint16, np.uint32
Float: np.float32, np.float64
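To see what the different widths mean in practice, the value range of each integer type can be inspected with np.iinfo, and the bit width of each float type with np.finfo. A small sketch:

```python
import numpy as np

for t in (np.int8, np.int16, np.int32, np.uint8):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)
#:> int8 -128 127
#:> int16 -32768 32767
#:> int32 -2147483648 2147483647
#:> uint8 0 255

print(np.finfo(np.float32).bits, np.finfo(np.float64).bits)
#:> 32 64
```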

12.3.2 int32/64

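np.int is simply an alias for the standard Python int. This alias, like np.float, np.bool and np.str used in the following sections, was deprecated in NumPy 1.20 and removed in NumPy 1.24, so these examples need an older release such as the 1.19.1 shown above.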

x = np.int(13)
y = int(13)
print( type(x) )
#:> <class 'int'>
print( type(y) )
#:> <class 'int'>

np.int32/64 are NumPy specific

x = np.int32(13)
y = np.int64(13)
print( type(x) )
#:> <class 'numpy.int32'>
print( type(y) )
#:> <class 'numpy.int64'>

12.3.3 float32/64

x = np.float(13)
y = float(13)
print( type(x) )
#:> <class 'float'>
print( type(y) )
#:> <class 'float'>
x = np.float32(13)
y = np.float64(13)
print( type(x) )
#:> <class 'numpy.float32'>
print( type(y) )
#:> <class 'numpy.float64'>

12.3.4 bool

np.bool is actually python standard bool

x = np.bool(True)
print( type(x) )
#:> <class 'bool'>
print( type(True) )
#:> <class 'bool'>

12.3.5 str

np.str is actually python standard str

x = np.str("ali")
print( type(x) )
#:> <class 'str'>
x = np.str_("ali")
print( type(x) )
#:> <class 'numpy.str_'>

12.3.6 datetime64

Unlike the standard Python datetime library, NumPy has no separate date, datetime and time types.
There is no equivalent of the time object.
NumPy has a single type: datetime64.

12.3.6.1 Constructor

From String
Note that datetime64 is timezone-naive, so the input string should not carry time zone information. A trailing designator such as Z or +8 is not handled cleanly and, depending on the NumPy version, triggers a warning or an error.

np.datetime64('2005-02')
#:> numpy.datetime64('2005-02')
np.datetime64('2005-02-25')
#:> numpy.datetime64('2005-02-25')
np.datetime64('2005-02-25T03:30')
#:> numpy.datetime64('2005-02-25T03:30')
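The precision of the input string decides the unit stored in the datetime64 value. A brief sketch (variable names are illustrative) showing the unit and a simple date difference:

```python
import numpy as np

month = np.datetime64('2005-02')
day = np.datetime64('2005-02-25')
minute = np.datetime64('2005-02-25T03:30')

print(month.dtype, day.dtype, minute.dtype)
#:> datetime64[M] datetime64[D] datetime64[m]

# subtracting two dates gives a timedelta64 in the same unit
print(day - np.datetime64('2005-02-20'))
#:> 5 days
```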

From datetime

np.datetime64( date.today() )
#:> numpy.datetime64('2020-11-20')
np.datetime64( datetime.now() )
#:> numpy.datetime64('2020-11-20T14:28:29.271833')

12.3.6.2 Instance Method

Convert to datetime using astype()

dt64 = np.datetime64("2019-01-31" )
dt64.astype(datetime)
#:> datetime.date(2019, 1, 31)
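Note that the result above is a date, not a datetime. As a rough sketch, the Python type returned by astype(datetime) appears to follow the unit of the datetime64 value: a day-unit value gives a date, a second-unit value gives a datetime.

```python
from datetime import date, datetime
import numpy as np

d = np.datetime64('2019-01-31').astype(datetime)           # day unit
s = np.datetime64('2019-01-31T10:30:00').astype(datetime)  # second unit

print(type(d).__name__, type(s).__name__)
#:> date datetime
```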

12.3.7 nan

12.3.7.1 Creating NaN

NaN is not a separate data type. It means "not a number" and is a special floating-point value: both float('NaN') and np.nan are standard Python float objects. It can be created using the two methods below.

import numpy as np
import pandas as pd
import math

kosong1 = float('NaN')
kosong2 = np.nan

print('Type: ', type(kosong1), '\n',
       'Value: ', kosong1)
#:> Type:  <class 'float'> 
#:>  Value:  nan
print('Type: ', type(kosong2), '\n',
       'Value: ', kosong2)
#:> Type:  <class 'float'> 
#:>  Value:  nan

12.3.7.2 Detecting NaN

Detect nan using functions from pandas, numpy and math.

print(pd.isna(kosong1), '\n',
      pd.isna(kosong2), '\n',
      np.isnan(kosong1),'\n',
      math.isnan(kosong2))
#:> True 
#:>  True 
#:>  True 
#:>  True
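The same test works element-wise on arrays. A minimal sketch, with illustrative names, that uses np.isnan to build a mask and drop the NaN items:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan])
mask = np.isnan(values)          # element-wise test, returns a boolean array

print(mask)
#:> [False  True False  True]
print(values[~mask])             # keep only the non-NaN items
#:> [1. 3.]
```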

12.3.7.3 Operation

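12.3.7.3.1 Logical Operator

Python's and / or return one of their operands instead of a strict boolean, and nan is truthy, so each result below is either nan or the boolean operand.

print( True and kosong1,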
       kosong1 and True)
#:> nan True
print( True or kosong1,
       False or kosong1)
#:> True nan

12.3.7.3.2 Comparing

Comparing nan with anything using ==, < or > returns False, even when nan is compared with itself.

print(kosong1 > 0, kosong1==0, kosong1<0,
      kosong1 ==1, kosong1==kosong1, kosong1==False, kosong1==True)
#:> False False False False False False False
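The one exception is the != operator, which returns True for nan. A short sketch (x is an illustrative name):

```python
import math

x = float('nan')

print(x != x, x != 0)            # != is the one comparison that returns True
#:> True True
print(x == x, math.isnan(x))     # prefer isnan() over == for detection
#:> False True
```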

12.3.7.3.3 Casting

nan is a floating-point value that is not zero, so casting it to boolean returns True.

bool(kosong1)
#:> True

12.4 Numpy Array

12.4.1 Concept

Structure
  • NumPy provides an N-dimensional array type, the ndarray
  • ndarray is homogeneous: every item takes up the same size block of memory, and all blocks are interpreted in exactly the same way
  • For each ndarray, there is a separate dtype object, which describes the ndarray's data type
  • An item extracted from an array, e.g., by indexing, is represented by a Python object whose type is one of the array scalar types built into NumPy. The array scalars also allow easy manipulation of more complicated arrangements of data.
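These ideas can be checked directly on a small array. The sketch below uses illustrative names and fixes the dtype to int32 so the item size is predictable:

```python
import numpy as np

arr = np.array([10, 20, 30], dtype=np.int32)

print(arr.dtype, arr.itemsize, arr.nbytes)   # every item uses the same 4 bytes
#:> int32 4 12
item = arr[0]                                # indexing returns an array scalar
print(type(item).__name__, isinstance(item, np.generic))
#:> int32 True
```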

12.4.2 Constructor

By default, numpy.array infers its data type from the input, choosing a common type that can represent every item

12.4.2.1 dType: int, float

Notice that the example below is auto-detected as an integer type (int64 here; the default integer size depends on the platform)

x = np.array( (1,2,3,4,5) )
print(x)
#:> [1 2 3 4 5]
print('Type: ', type(x))
#:> Type:  <class 'numpy.ndarray'>
print('dType:', x.dtype)
#:> dType: int64

Notice that the example below is auto-detected as float64, because one of the items is a float

x = np.array( (1,2,3,4.5,5) )
print(x)
# print('Type: ', type(x))
# print('dType:', x.dtype)
#:> [1.  2.  3.  4.5 5. ]

You can pass dtype to request a specific data type.
NumPy converts the data into the specified type. Observe below that the float 4.5 is truncated to the integer 4

x = np.array( (1,2,3,4.5,5), dtype='int' )
print(x)
#:> [1 2 3 4 5]
print('Type: ', type(x))
#:> Type:  <class 'numpy.ndarray'>
print('dType:', x.dtype)
#:> dType: int64
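Converting floats to integers truncates toward zero rather than rounding. If rounding is wanted, one option is to round first and then convert; the sketch below uses np.rint for that (names are illustrative):

```python
import numpy as np

f = np.array([1.5, 2.7, -1.5, 4.5])

print(f.astype(int))             # truncates toward zero
#:> [ 1  2 -1  4]
print(np.rint(f).astype(int))    # rounds to nearest first (ties go to the even value)
#:> [ 2  3 -2  4]
```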

12.4.2.2 dType: datetime64

Specifying dtype is necessary to ensure the output is a datetime type. Without it, strings produce a string (Unicode) array and Python datetime objects produce a generic object array.

From str

x = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
print(x)
#:> ['2007-07-13' '2006-01-13' '2010-08-13']
print('Type: ', type(x))
#:> Type:  <class 'numpy.ndarray'>
print('dType:', x.dtype)
#:> dType: datetime64[D]

From datetime

x = np.array([datetime(2019,1,12), datetime(2019,1,14),datetime(2019,3,3)], dtype='datetime64')
print(x)
#:> ['2019-01-12T00:00:00.000000' '2019-01-14T00:00:00.000000'
#:>  '2019-03-03T00:00:00.000000']
print('Type: ', type(x))
#:> Type:  <class 'numpy.ndarray'>
print('dType:', x.dtype)
#:> dType: datetime64[us]
print('\nElement Type:',type(x[1]))
#:> 
#:> Element Type: <class 'numpy.datetime64'>
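For comparison, here is a small sketch of what happens when dtype is left out (from_str and from_dt are illustrative names). The string array can still be converted afterwards with astype():

```python
from datetime import datetime
import numpy as np

from_str = np.array(['2007-07-13', '2006-01-13'])
from_dt = np.array([datetime(2019, 1, 12), datetime(2019, 1, 14)])

print(from_str.dtype.kind, from_dt.dtype)   # 'U' means Unicode string
#:> U object
print(from_str.astype('datetime64[D]').dtype)
#:> datetime64[D]
```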

12.4.2.3 2D Array

x = np.array([range(10),np.arange(10)])
x
#:> array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
#:>        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

12.4.3 Dimensions

12.4.3.1 Differentiating Dimensions

A 1-D array is built from a single list
A 2-D array is built from a list of lists (each row is a list)
A 2-D single-row array is built from a list containing just one list

12.4.3.2 1-D Array

Observe that the shape of the array is (5,). It may look like an array with 5 rows and no columns.
What it really means is a single dimension holding 5 items.

arr = np.array(range(5))
print (arr)
#:> [0 1 2 3 4]
print (arr.shape)
#:> (5,)
print (arr.ndim)
#:> 1

12.4.3.3 2-D Array

arr = np.array([range(5),range(5,10),range(10,15)])
print (arr)
#:> [[ 0  1  2  3  4]
#:>  [ 5  6  7  8  9]
#:>  [10 11 12 13 14]]
print (arr.shape)
#:> (3, 5)
print (arr.ndim)
#:> 2

12.4.3.4 2-D Array - Single Row

arr = np.array([range(5)])
print (arr)
#:> [[0 1 2 3 4]]
print (arr.shape)
#:> (1, 5)
print (arr.ndim)
#:> 2

12.4.3.5 2-D Array : Single Column

Slicing with newaxis in the COLUMN position turns a 1-D array into a 2-D array with a single column

arr = np.arange(5)[:, np.newaxis]
print (arr)
#:> [[0]
#:>  [1]
#:>  [2]
#:>  [3]
#:>  [4]]
print (arr.shape)
#:> (5, 1)
print (arr.ndim)
#:> 2

Slicing with newaxis in the ROW position turns a 1-D array into a 2-D array with a single row

arr = np.arange(5)[np.newaxis,:]
print (arr)
#:> [[0 1 2 3 4]]
print (arr.shape)
#:> (1, 5)
print (arr.ndim)
#:> 2

12.4.4 Class Method

12.4.4.1 arange()

Generate array with a sequence of numbers

np.arange(10)
#:> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
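arange() also accepts a start, a stop (exclusive) and a step, much like Python's range(). A few illustrative calls:

```python
import numpy as np

print(np.arange(2, 10))         # start (inclusive) to stop (exclusive)
#:> [2 3 4 5 6 7 8 9]
print(np.arange(2, 10, 3))      # with a step
#:> [2 5 8]
print(np.arange(0, 1, 0.25))    # float steps are allowed
#:> [0.   0.25 0.5  0.75]
```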

12.4.4.2 ones()

np.ones(10)  # One dimension, default is float
#:> array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
np.ones((2,5),'int')  #Two dimensions
#:> array([[1, 1, 1, 1, 1],
#:>        [1, 1, 1, 1, 1]])

12.4.4.3 zeros()

np.zeros( 10 )    # One dimension, default is float
#:> array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
np.zeros((2,5),'int')   # 2 rows, 5 columns of ZERO
#:> array([[0, 0, 0, 0, 0],
#:>        [0, 0, 0, 0, 0]])

12.4.4.4 where()

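On a 1-D array, numpy.where() returns a tuple holding an array of the indices of the items matching the criteria. In this example the indices happen to equal the values.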

ar1 = np.array(range(10))
print( ar1 )
#:> [0 1 2 3 4 5 6 7 8 9]
print( np.where(ar1>5) )
#:> (array([6, 7, 8, 9]),)

On a 2-D array, where() returns an array of row indices and an array of column indices for the matching elements

ar = np.array([(1,2,3,4,5),(11,12,13,14,15),(21,22,23,24,25)])
print ('Data : \n', ar)
#:> Data : 
#:>  [[ 1  2  3  4  5]
#:>  [11 12 13 14 15]
#:>  [21 22 23 24 25]]
np.where(ar>13)
#:> (array([1, 1, 2, 2, 2, 2, 2]), array([3, 4, 0, 1, 2, 3, 4]))
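Because where() returns positions, the result can be fed back into the array to fetch the matching values. The sketch below (illustrative data) also shows the three-argument form, which picks between two values per item:

```python
import numpy as np

ar = np.array([10, 25, 3, 40])

idx = np.where(ar > 20)            # tuple of index arrays
print(idx[0], ar[idx])             # positions, then the values at those positions
#:> [1 3] [25 40]
print(np.where(ar > 20, ar, 0))    # three-argument form: pick ar or 0 per item
#:> [ 0 25  0 40]
```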

12.4.4.5 Logical Methods

numpy.logical_or
Performs an element-wise OR on two boolean arrays and returns a new boolean array

ar = np.arange(10)
print( ar==3 )  # boolean array 1
#:> [False False False  True False False False False False False]
print( ar==6 )  # boolean array 2
#:> [False False False False False False  True False False False]
print( np.logical_or(ar==3,ar==6 ) ) # resulting boolean
#:> [False False False  True False False  True False False False]

numpy.logical_and
Performs an element-wise AND on two boolean arrays and returns a new boolean array

ar = np.arange(10)
print( ar==3 ) # boolean array 1
#:> [False False False  True False False False False False False]
print( ar==6 ) # boolean array 2
#:> [False False False False False False  True False False False]
print( np.logical_and(ar==3,ar==6 ) )  # resulting boolean
#:> [False False False False False False False False False False]
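The | and & operators give the same element-wise results on boolean arrays; each comparison needs its own parentheses because of operator precedence. A minimal sketch with illustrative names:

```python
import numpy as np

ar = np.arange(10)

either = (ar == 3) | (ar == 6)     # same result as np.logical_or
both = (ar > 2) & (ar < 6)         # same result as np.logical_and

print(ar[either], ar[both])        # use the masks to select items
#:> [3 6] [3 4 5]
```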

12.4.5 Instance Method

12.4.5.1 astype() conversion

Convert from datetime64 to datetime

ar1 = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
print( type(ar1) )  ## a numpy array
#:> <class 'numpy.ndarray'>
print( ar1.dtype )  ## dtype is a numpy data type
#:> datetime64[D]

After converting to datetime (a non-NumPy object), the dtype becomes the generic ‘object’.

ar2 = ar1.astype(datetime)
print( type(ar2) )  ## still a numpy array
#:> <class 'numpy.ndarray'>
print( ar2.dtype )  ## dtype becomes generic 'object'
#:> object

12.4.5.2 reshape()

reshape ( row numbers, col numbers )

Sample Data

a = np.array([range(5), range(10,15), range(20,25), range(30,35)])
a
#:> array([[ 0,  1,  2,  3,  4],
#:>        [10, 11, 12, 13, 14],
#:>        [20, 21, 22, 23, 24],
#:>        [30, 31, 32, 33, 34]])

Reshape 1-Dim to 2-Dim

np.arange(12) # 1-D Array
#:> array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
np.arange(12).reshape(3,4)  # 2-D Array
#:> array([[ 0,  1,  2,  3],
#:>        [ 4,  5,  6,  7],
#:>        [ 8,  9, 10, 11]])

Reshape 2-Dim to 2-Dim

np.array([range(5), range(10,15)])  # 2-D Array
#:> array([[ 0,  1,  2,  3,  4],
#:>        [10, 11, 12, 13, 14]])
np.array([range(5), range(10,15)]).reshape(5,2) # 2-D Array
#:> array([[ 0,  1],
#:>        [ 2,  3],
#:>        [ 4, 10],
#:>        [11, 12],
#:>        [13, 14]])

Reshape 2-Dim to 2-Dim (of single row) - Change 2x5 to 1x10
- Observe the [[ ]]: the number of dimensions is still 2, don’t be fooled

np.array( [range(0,5), range(5,10)])  # 2-D Array
#:> array([[0, 1, 2, 3, 4],
#:>        [5, 6, 7, 8, 9]])
np.array( [range(0,5), range(5,10)]).reshape(1,10) # 2-D Array
#:> array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

Reshape 1-Dim Array to 2-Dim Array (single column)

np.arange(8)
#:> array([0, 1, 2, 3, 4, 5, 6, 7])
np.arange(8).reshape(8,1)
#:> array([[0],
#:>        [1],
#:>        [2],
#:>        [3],
#:>        [4],
#:>        [5],
#:>        [6],
#:>        [7]])

A better method is newaxis, which is easier because the number of rows need not be passed as a parameter

np.arange(8)[:,np.newaxis]
#:> array([[0],
#:>        [1],
#:>        [2],
#:>        [3],
#:>        [4],
#:>        [5],
#:>        [6],
#:>        [7]])

Reshape 1-Dim Array to 2-Dim Array (single row)

np.arange(8)
#:> array([0, 1, 2, 3, 4, 5, 6, 7])
np.arange(8)[np.newaxis,:]
#:> array([[0, 1, 2, 3, 4, 5, 6, 7]])
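reshape() can also infer one dimension when it is given as -1, which likewise avoids passing the item count by hand. A brief sketch:

```python
import numpy as np

a = np.arange(8)

col = a.reshape(-1, 1)     # -1 asks NumPy to infer the number of rows
row = a.reshape(1, -1)     # -1 asks NumPy to infer the number of columns

print(col.shape, row.shape)
#:> (8, 1) (1, 8)
print(np.array_equal(col, a[:, np.newaxis]))
#:> True
```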

12.4.6 Element Selection

12.4.6.1 Sample Data

x1 = np.array( (0,1,2,3,4,5,6,7,8))
x2 = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
print(x1)
#:> [0 1 2 3 4 5 6 7 8]
print(x2)
#:> [[ 1  2  3  4  5]
#:>  [11 12 13 14 15]
#:>  [21 22 23 24 25]]

12.4.6.2 1-Dimension

All indexing starts from 0 (not 1)

Choosing a single element does not return an array

print( x1[0]   )  ## first element
#:> 0
print( x1[-1]  )  ## last element
#:> 8
print( x1[3]   )  ## fourth element (index 3)
#:> 3
print( x1[-3]  )  ## third element from end
#:> 6

Selecting multiple elements returns an ndarray

print( x1[:3]  )  ## first 3 elements
#:> [0 1 2]
print( x1[-3:])   ## last 3 elements
#:> [6 7 8]
print( x1[3:]  )  ## all except first 3 elements
#:> [3 4 5 6 7 8]
print( x1[:-3] )  ## all except last 3 elements
#:> [0 1 2 3 4 5]
print( x1[1:4] )  ## elements 1 to 3 (index 4 not included)
#:> [1 2 3]

12.4.6.3 2-Dimension

Indexing with [ row_positions, column_positions ], index starts with 0

x2[1:3, 1:4] # row 1 to 2 column 1 to 3
#:> array([[12, 13, 14],
#:>        [22, 23, 24]])
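A few more selection patterns on the same 2-D sample data, as a sketch. The array is rebuilt here so the block runs on its own:

```python
import numpy as np

x2 = np.array(((1, 2, 3, 4, 5),
               (11, 12, 13, 14, 15),
               (21, 22, 23, 24, 25)))

print(x2[1, 2])      # single element: row 1, column 2
#:> 13
print(x2[0])         # whole first row
#:> [1 2 3 4 5]
print(x2[:, 0])      # whole first column
#:> [ 1 11 21]
print(x2[-1, -2:])   # last row, last two columns
#:> [24 25]
```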

12.4.7 Attributes

12.4.7.1 dtype

ndarray has a property called dtype, which tells us the type of the underlying items

a = np.array( (1,2,3,4,5), dtype='float' )
a.dtype
#:> dtype('float64')
print(a.dtype)
#:> float64
print( type(a[1]))
#:> <class 'numpy.float64'>

12.4.7.2 ndim

ndim returns the number of dimensions of the NumPy array. The example below shows a 2-D array

x = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
x.ndim  
#:> 2

12.4.7.3 shape

shape returns a tuple of (rows, cols)

x = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
x.shape  
#:> (3, 5)
np.identity(5)
#:> array([[1., 0., 0., 0., 0.],
#:>        [0., 1., 0., 0., 0.],
#:>        [0., 0., 1., 0., 0.],
#:>        [0., 0., 0., 1., 0.],
#:>        [0., 0., 0., 0., 1.]])

12.4.8 Operations

12.4.8.1 Arithmetic

Sample Data

ar = np.arange(10)
print( ar )
#:> [0 1 2 3 4 5 6 7 8 9]

*

ar = np.arange(10)
print (ar)
#:> [0 1 2 3 4 5 6 7 8 9]
print (ar*2)
#:> [ 0  2  4  6  8 10 12 14 16 18]

+ and -

ar = np.arange(10)
print (ar+2)
#:> [ 2  3  4  5  6  7  8  9 10 11]
print (ar-2)
#:> [-2 -1  0  1  2  3  4  5  6  7]
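The same operators also work between two arrays of matching shape, item by item. A small sketch with illustrative arrays a and b:

```python
import numpy as np

a = np.arange(1, 5)
b = np.array([10, 20, 30, 40])

print(a + b)          # element-wise addition
#:> [11 22 33 44]
print(b / a)          # true division returns floats
#:> [10. 10. 10. 10.]
print(a ** 2)         # element-wise power
#:> [ 1  4  9 16]
```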

12.4.8.2 Comparison

Sample Data

ar = np.arange(10)
print( ar )
#:> [0 1 2 3 4 5 6 7 8 9]

==

print( ar==3 )
#:> [False False False  True False False False False False False]

>, >=, <, <=

print( ar>3 )
#:> [False False False False  True  True  True  True  True  True]
print( ar<=3 )
#:> [ True  True  True  True False False False False False False]
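The boolean array produced by a comparison is typically used as a mask, either to filter the array or to count matches. A minimal sketch:

```python
import numpy as np

ar = np.arange(10)
mask = ar > 3

print(ar[mask])                             # keep the items where the mask is True
#:> [4 5 6 7 8 9]
print(mask.sum(), mask.any(), mask.all())   # True counts as 1
#:> 6 True False
```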

12.5 Random Numbers

12.5.1 Uniform Distribution

12.5.1.1 Random Integer (with Replacement)

randint() returns random integers from low (inclusive) to high (exclusive)

np.random.randint( low )                  # generate an integer, i, where    0 <= i < low
np.random.randint( low, high )            # generate an integer, i, where  low <= i < high
np.random.randint( low, high, size=1)     # generate an ndarray of integers, single dimension
np.random.randint( low, high, size=(r,c)) # generate an ndarray of integers, two dimensions

np.random.randint( 10 )
#:> 6
np.random.randint( 10, 20 )
#:> 16
np.random.randint( 10, high=20, size=5)   # single dimension
#:> array([15, 18, 14, 11, 13])
np.random.randint( 10, 20, (3,5) )        # two dimensions
#:> array([[18, 19, 14, 17, 11],
#:>        [15, 11, 11, 19, 10],
#:>        [12, 11, 16, 19, 10]])
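The outputs above change on every run. As a sketch, fixing the seed makes the draws repeatable, and a larger sample confirms the inclusive/exclusive bounds (the seed value 42 is arbitrary):

```python
import numpy as np

np.random.seed(42)                           # fix the seed for repeatable draws
first = np.random.randint(10, 20, size=1000)
np.random.seed(42)
second = np.random.randint(10, 20, size=1000)

print(np.array_equal(first, second))         # same seed, same numbers
#:> True
print(first.min() >= 10, first.max() < 20)   # low inclusive, high exclusive
#:> True True
```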

12.5.1.2 Random Integer (with or without replacement)

numpy.random.choice( a, size, replace=True)
 # sampling from a, 
 #   if a is an integer, then samples are drawn from arange(a)
 #   if a is a 1-D array, then samples are drawn from this array
np.random.choice(10,5, replace=False) # take 5 samples from 0..9, without replacement
#:> array([6, 0, 4, 1, 2])
np.random.choice( np.arange(10,20), 5, replace=False)
#:> array([11, 13, 10, 14, 15])
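Sampling without replacement never repeats a value, and it fails when more items are requested than the population holds. A short sketch (the seed is arbitrary):

```python
import numpy as np

np.random.seed(0)
sample = np.random.choice(10, 5, replace=False)
print(len(np.unique(sample)) == 5)           # no value is drawn twice
#:> True

try:
    np.random.choice(5, 10, replace=False)   # asks for more items than exist
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```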

12.5.1.3 Random Float

ranf() generates float numbers between 0.0 (inclusive) and 1.0 (exclusive)

np.random.ranf(size=None)
np.random.ranf(4)
#:> array([0.34719156, 0.35147161, 0.59755853, 0.10528617])

uniform() returns random floats from low (inclusive) to high (exclusive). The defaults are low=0.0 and high=1.0, so a single positional argument sets low only and the result falls between 1.0 and that value.

np.random.uniform( low )                  # generate a float, f, between 1.0 (the default high) and low
np.random.uniform( low, high )            # generate a float, f, where  low <= f < high
np.random.uniform( low, high, size=1)     # generate an array of floats, single dimension
np.random.uniform( low, high, size=(r,c)) # generate an array of floats, two dimensions

np.random.uniform( 2 )
#:> 1.633967952019189
np.random.uniform( 2,5, size=(4,4) )
#:> array([[2.06434886, 3.66304024, 3.52751507, 4.08096456],
#:>        [4.19814857, 2.95277079, 3.63566489, 4.69076522],
#:>        [2.34947052, 4.17895391, 4.49808652, 3.51828276],
#:>        [3.67805721, 3.22648964, 3.2674474 , 2.8441559 ]])
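A quick empirical check of the ranges, as a sketch. The single-argument call leaves high at its default of 1.0, which is why the value 1.63 appears above; calling uniform() with low greater than high is best avoided in real code, since the NumPy documentation does not guarantee that behaviour.

```python
import numpy as np

np.random.seed(1)
one_arg = np.random.uniform(2, size=10000)       # low=2, high keeps its default 1.0
two_args = np.random.uniform(2, 5, size=10000)   # low=2, high=5

print(one_arg.min() >= 1, one_arg.max() <= 2)
#:> True True
print(two_args.min() >= 2, two_args.max() < 5)
#:> True True
```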

12.5.2 Normal Distribution

numpy.random.randn(n_items)        # 1-D standard normal (mean=0, stdev=1)
numpy.random.randn(nrows, ncols)   # 2-D standard normal (mean=0, stdev=1)
numpy.random.standard_normal(size=None)            # mean = 0, stdev = 1, non-configurable
numpy.random.normal(loc=0, scale=1, size=None)     # loc = mean, scale = stdev, size = dimension

12.5.2.1 Standard Normal Distribution

Generate random numbers from the Gaussian distribution (mean=0, stdev=1)

One Dimension

np.random.standard_normal(3)
#:> array([-0.29832127, -1.52835978, -1.69015261])
np.random.randn(3)
#:> array([-1.36143442, -1.03616391,  0.30469669])

Two Dimensions

np.random.randn(2,4)
#:> array([[ 0.18301414, -0.81780387,  2.33753414,  1.35667554],
#:>        [ 1.04592906,  0.14818631,  2.3902418 , -2.07301317]])
np.random.standard_normal((2,4))
#:> array([[-0.83193773,  0.67788051, -0.96400219, -0.12383149],
#:>        [ 0.95843138, -1.02865802, -0.95976146, -1.81295684]])

Observe: randn(), standard_normal() and normal() can all generate standard normal numbers, and with the same seed they return identical values

np.random.seed(15)
print (np.random.randn(5))
#:> [-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]
np.random.seed(15)
print (np.random.normal ( size = 5 )) # stdev and mean not specified, default to standard normal
#:> [-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]
np.random.seed(15)
print (np.random.standard_normal (size=5))
#:> [-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]

12.5.2.2 Normal Distribution (Non-Standard)

np.random.seed(125)
np.random.normal( loc = 12, scale=1.25, size=(3,3))
#:> array([[11.12645382, 12.01327885, 10.81651695],
#:>        [12.41091248, 12.39383072, 11.49647195],
#:>        [ 8.70837035, 12.25246312, 11.49084235]])
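A non-standard normal is a standard normal shifted by loc and stretched by scale. The sketch below checks this with the same seed, assuming the legacy np.random functions used throughout this chapter, and confirms that a large sample lands close to the requested mean and stdev:

```python
import numpy as np

np.random.seed(125)
direct = np.random.normal(loc=12, scale=1.25, size=5)
np.random.seed(125)
shifted = 12 + 1.25 * np.random.standard_normal(5)   # shift and scale by hand

print(np.allclose(direct, shifted))
#:> True

big = np.random.normal(loc=12, scale=1.25, size=100000)
print(round(big.mean(), 1), round(big.std(), 1))     # close to loc and scale
```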

12.5.2.3 Linear Spacing

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
# endpoint: If True, stop is the last sample, otherwise it is not included

Include Endpoint
Step = gap divided by (number of elements minus 1): 2/(10-1)

np.linspace(1,3,10) #default endpoint=True
#:> array([1.        , 1.22222222, 1.44444444, 1.66666667, 1.88888889,
#:>        2.11111111, 2.33333333, 2.55555556, 2.77777778, 3.        ])

Does Not Include Endpoint
Step = gap divided by the number of elements: 2/10

np.linspace(1,3,10,endpoint=False)
#:> array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8])
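The step size can be read back directly by passing retstep=True, which makes linspace() return the array together with the step. A minimal check of the two cases (variable names are illustrative):

```python
import numpy as np

_, step_incl = np.linspace(1, 3, 10, retstep=True)
_, step_excl = np.linspace(1, 3, 10, endpoint=False, retstep=True)

# with endpoint: gap / (n - 1); without endpoint: gap / n
print(np.isclose(step_incl, 2 / 9), np.isclose(step_excl, 2 / 10))
#:> True True
```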

12.6 Sampling (Integer)

random.choice( a, size=None, replace=True, p=None)  # a=integer, return <size> integers drawn from arange(a)
random.choice( a, size=None, replace=True, p=None)  # a=array-like, return <size> items picked from a
np.random.choice (100, size=10)
#:> array([58,  0, 84, 50, 89, 32, 87, 30, 66, 92])
np.random.choice( [1,3,5,7,9,11,13,15,17,19,21,23], size=10, replace=False)
#:> array([ 5,  1, 23, 17,  3, 13, 15,  9, 21,  7])
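The p parameter in the signature gives each item its own probability (the values must sum to 1). A sketch with made-up labels and weights; the exact counts depend on the seed:

```python
import numpy as np

np.random.seed(7)
draws = np.random.choice(['red', 'blue'], size=1000, p=[0.9, 0.1])

values, counts = np.unique(draws, return_counts=True)
print(dict(zip(values, counts)))     # 'red' should dominate, roughly 9 to 1
```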

12.7 NaN : Missing Numerical Data

  • You should be aware that NaN is a bit like a data virus: it infects any other object it touches

t = np.array([1, np.nan, 3, 4]) 
t.dtype
#:> dtype('float64')

Regardless of the operation, the result of arithmetic with NaN will be another NaN

1 + np.nan
#:> nan
t.sum(), t.mean(), t.max()
#:> (nan, nan, nan)
np.nansum(t), np.nanmean(t), np.nanmax(t)
#:> (8.0, 2.6666666666666665, 4.0)
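Besides the nan-aware aggregates, missing items can be counted or replaced before further work. The sketch below shows one possible approach, filling NaN with the mean of the remaining items or with zero; which replacement is appropriate depends on the data.

```python
import numpy as np

t = np.array([1, np.nan, 3, 4])

print(np.isnan(t).sum())                           # how many items are missing
#:> 1
filled = np.where(np.isnan(t), np.nanmean(t), t)   # replace NaN with the mean of the rest
zeroed = np.nan_to_num(t, nan=0.0)                 # replace NaN with zero
print(filled.sum(), zeroed.sum())
```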