12 numpy
- Fast, efficient manipulation of array data
- A NumPy array holds a single data type, unlike a Python list
- Supports matrix operations
12.1 Environment Setup
import pandas as pd
import matplotlib.pyplot as plt
import math
pd.set_option( 'display.notebook_repr_html', False) # render Series and DataFrame as text, not HTML
pd.set_option( 'display.max_columns', 10) # number of columns
pd.set_option( 'display.max_rows', 10) # number of rows
pd.set_option( 'display.width', 90) # number of characters per row
12.2 Module Import
import numpy as np
np.__version__
#:> '1.19.1'
## other modules
from datetime import datetime
from datetime import date
from datetime import time
12.3 Data Types
12.3.1 NumPy Data Types
NumPy supports a much greater variety of numerical types than Python does, including fixed-width integers and floats. This gives finer control over memory use and precision. See https://www.numpy.org/devdocs/user/basics.types.html
Integer: np.int8, np.int16, np.int32, np.uint8, np.uint16, np.uint32
Float: np.float32, np.float64
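As a rough illustration of what "fixed width" means for the types listed above, the sketch below queries their value ranges with np.iinfo and np.finfo, and shows that array arithmetic wraps around silently when a result no longer fits. The variable names are chosen for illustration only.

```python
import numpy as np

# Value range of a few fixed-width integer types
int_ranges = {}
for t in (np.int8, np.int16, np.uint8):
    info = np.iinfo(t)
    int_ranges[t.__name__] = (int(info.min), int(info.max))
print(int_ranges)

# Storage size in bits of the two float types
float_bits = {t.__name__: np.finfo(t).bits for t in (np.float32, np.float64)}
print(float_bits)

# Fixed width means overflow wraps around inside an array
small = np.array([127], dtype=np.int8)
wrapped = small + np.int8(1)
print(wrapped)   # the int8 maximum 127 wraps to -128
```

Choosing the smallest type that safely covers the data saves memory, but the wraparound shows why the range has to be checked first.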
12.3.2 int32/64
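np.int is just an alias for the Python built-in int. (The alias was deprecated in NumPy 1.20 and removed in 1.24; the output below is from 1.19.)
x = np.int(13)
y = int(13)
print( type(x) )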
#:> <class 'int'>
print( type(y) )
#:> <class 'int'>
np.int32 and np.int64 are NumPy-specific types
x = np.int32(13)
y = np.int64(13)
print( type(x) )
#:> <class 'numpy.int32'>
print( type(y) )
#:> <class 'numpy.int64'>
12.3.3 float32/64
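np.float is just an alias for the Python built-in float (deprecated in NumPy 1.20, removed in 1.24)
x = np.float(13)
y = float(13)
print( type(x) )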
#:> <class 'float'>
print( type(y) )
#:> <class 'float'>
x = np.float32(13)
y = np.float64(13)
print( type(x) )
#:> <class 'numpy.float32'>
print( type(y) )
#:> <class 'numpy.float64'>
12.3.4 bool
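np.bool is just an alias for the Python built-in bool in NumPy 1.19 (the alias was deprecated in 1.20 and removed in 1.24; NumPy 2.0 reintroduced the name for NumPy's own boolean type)
x = np.bool(True)
print( type(x) )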
#:> <class 'bool'>
print( type(True) )
#:> <class 'bool'>
12.3.5 str
np.str is just an alias for the Python built-in str (deprecated in NumPy 1.20, removed in 1.24)
x = np.str("ali")
print( type(x) )
#:> <class 'str'>
np.str_ is the NumPy-specific string type
x = np.str_("ali")
print( type(x) )
#:> <class 'numpy.str_'>
12.3.6 datetime64
Unlike the Python standard datetime library, NumPy does not separate date, datetime and time.
NumPy has a single object for all of them: datetime64.
There is no equivalent of a time-only object.
12.3.6.1 Constructor
From String
Note that the input string should not carry timezone information. datetime64 is timezone-naive, so a trailing Z or an offset such as +08:00 triggers a warning or an error, depending on the NumPy version.
np.datetime64('2005-02')
#:> numpy.datetime64('2005-02')
np.datetime64('2005-02-25')
#:> numpy.datetime64('2005-02-25')
np.datetime64('2005-02-25T03:30')
#:> numpy.datetime64('2005-02-25T03:30')
From datetime
np.datetime64( date.today() )
#:> numpy.datetime64('2020-11-20')
np.datetime64( datetime.now() )
#:> numpy.datetime64('2020-11-20T14:28:29.271833')
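Because one type covers months, days and minutes alike, each datetime64 value carries a unit. The sketch below, with illustrative variable names, shows how the unit is inferred from the precision of the input string and that subtracting two values gives a timedelta64.

```python
import numpy as np

month = np.datetime64('2005-02')
day = np.datetime64('2005-02-25')
minute = np.datetime64('2005-02-25T03:30')

# The unit is inferred from the precision of the input string
units = [str(v.dtype) for v in (month, day, minute)]
print(units)

# Subtracting two datetime64 values gives a timedelta64
gap = day - np.datetime64('2005-02-01')
print(gap)
```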
12.3.7 nan
12.3.7.1 Creating NaN
NaN is not a separate data type. It stands for "not a number" and is a special floating-point value: both float('NaN') and np.nan are ordinary Python float objects. It can be created using the two methods below.
import numpy as np
import pandas as pd
import math
kosong1 = float('NaN')
kosong2 = np.nan
print('Type: ', type(kosong1), '\n',
'Value: ', kosong1)
#:> Type: <class 'float'>
#:> Value: nan
print('Type: ', type(kosong2), '\n',
'Value: ', kosong2)
#:> Type: <class 'float'>
#:> Value: nan
12.3.7.2 Detecting NaN
Detect NaN using functions from pandas, numpy and math.
print(pd.isna(kosong1), '\n',
      pd.isna(kosong2), '\n',
      np.isnan(kosong1), '\n',
      math.isnan(kosong2))
#:> True
#:> True
#:> True
#:> True
12.3.7.3 Operation
12.3.7.3.1 Logical Operator
print( True and kosong1,
       kosong1 and True)
#:> nan True
print( True or kosong1,
False or kosong1)
#:> True nan
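The results above follow from two facts: Python's and/or return one of their operands rather than a strict boolean, and NaN counts as true in a boolean context. A minimal sketch in plain Python, with illustrative variable names:

```python
nan = float('nan')

is_truthy = bool(nan)          # NaN counts as true in a boolean context
equals_itself = (nan == nan)   # NaN never compares equal, not even to itself

# `and` returns the first falsy operand, otherwise the last operand
and_result = nan and True      # nan is truthy, so the last operand (True) is returned
# `or` returns the first truthy operand
or_result = nan or False       # nan is truthy, so nan itself is returned

print(is_truthy, equals_itself, and_result, or_result)
```

Since NaN never equals itself, use isnan-style functions rather than == to test for it.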
12.4 Numpy Array
12.4.1 Concept
Structure
- NumPy provides an N-dimensional array type, the ndarray
- ndarray is homogeneous: every item takes up the same size block of memory, and all blocks are interpreted in exactly the same way
- For each ndarray, there is a separate dtype object, which describes the ndarray's data type
- An item extracted from an array, e.g., by indexing, is represented by a Python object whose type is one of the array scalar types built in NumPy. The array scalars allow easy manipulation of also more complicated arrangements of data.
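The points above can be checked on a small array. This sketch uses an int16 array (an arbitrary choice for illustration) to show the per-item block size, the dtype object and the array scalar returned by indexing.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int16)

print(arr.dtype)      # the dtype object describing every element
print(arr.itemsize)   # bytes per element
print(arr.nbytes)     # total bytes = itemsize * number of elements

item = arr[0]         # indexing returns a NumPy array scalar
print(type(item))
```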
12.4.2 Constructor
By default, numpy.array auto-detects the dtype as the smallest common type that can represent every element
12.4.2.1 dType: int, float
The example below is auto-detected as an integer type (int64 here; the default integer width is platform dependent)
x = np.array( (1,2,3,4,5) )
print(x)
#:> [1 2 3 4 5]
print('Type: ', type(x))
#:> Type: <class 'numpy.ndarray'>
print('dType:', x.dtype)
#:> dType: int64
The example below is auto-detected as float64 because one element is a float
x = np.array( (1,2,3,4.5,5) )
print(x)
# print('Type: ', type(x))
# print('dType:', x.dtype)
#:> [1. 2. 3. 4.5 5. ]
You can pass dtype to request a specific data type.
NumPy converts the data to the specified type. Observe below that the float 4.5 is truncated to the integer 4
x = np.array( (1,2,3,4.5,5), dtype='int' )
print(x)
#:> [1 2 3 4 5]
print('Type: ', type(x))
#:> Type: <class 'numpy.ndarray'>
print('dType:', x.dtype)
#:> dType: int64
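Two details of the conversion are easy to miss: the detected type is the common type of all elements, and converting float to int drops the fractional part instead of rounding. The sketch below uses astype() for the explicit conversion and np.rint for rounding; the sample values are arbitrary.

```python
import numpy as np

mixed = np.array([1, 2, 3.5])                    # int and float together -> float64
common = np.result_type(np.int32, np.float32)    # common type able to hold both

values = np.array([4.6, -4.7, 2.9])
truncated = values.astype(int)           # drops the fractional part (toward zero)
rounded = np.rint(values).astype(int)    # rounds to the nearest integer first

print(mixed.dtype, common)
print(truncated, rounded)
```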
12.4.2.2 dType: datetime64
Specifying dtype is necessary to get a datetime64 array. Without it, strings produce a string dtype and datetime objects produce the generic object dtype.
From str
x = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
print(x)
#:> ['2007-07-13' '2006-01-13' '2010-08-13']
print('Type: ', type(x))
#:> Type: <class 'numpy.ndarray'>
print('dType:', x.dtype)
#:> dType: datetime64[D]
From datetime
x = np.array([datetime(2019,1,12), datetime(2019,1,14),datetime(2019,3,3)], dtype='datetime64')
print(x)
#:> ['2019-01-12T00:00:00.000000' '2019-01-14T00:00:00.000000'
#:> '2019-03-03T00:00:00.000000']
print('Type: ', type(x))
#:> Type: <class 'numpy.ndarray'>
print('dType:', x.dtype)
#:> dType: datetime64[us]
print('\nElement Type:',type(x[1]))
#:>
#:> Element Type: <class 'numpy.datetime64'>
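To see what happens when dtype is left out, the sketch below builds the same kind of arrays with and without it. The sample values are illustrative.

```python
import numpy as np
from datetime import datetime

strings = ['2007-07-13', '2006-01-13']
stamps = [datetime(2019, 1, 12), datetime(2019, 1, 14)]

no_dtype_str = np.array(strings)                     # stays a string array
no_dtype_dt = np.array(stamps)                       # generic object array
with_dtype = np.array(strings, dtype='datetime64')   # day-resolution datetime64

print(no_dtype_str.dtype.kind, no_dtype_dt.dtype, with_dtype.dtype)
```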
12.4.3 Dimensions
12.4.3.1 Differentiating Dimensions
A 1-D array is built from a single list
A 2-D array is built from a list of lists (each inner list is a row)
A 2-D single-row array is built from a list containing just one list
12.4.3.2 1-D Array
Observe that the shape of the array is (5,). It may look like an array with 5 rows and no columns.
What it really means is a single dimension holding 5 items.
arr = np.array(range(5))
print (arr)
#:> [0 1 2 3 4]
print (arr.shape)
#:> (5,)
print (arr.ndim)
#:> 1
12.4.3.3 2-D Array
arr = np.array([range(5),range(5,10),range(10,15)])
print (arr)
#:> [[ 0 1 2 3 4]
#:> [ 5 6 7 8 9]
#:> [10 11 12 13 14]]
print (arr.shape)
#:> (3, 5)
print (arr.ndim)
#:> 2
12.4.3.4 2-D Array - Single Row
arr = np.array([range(5)])
print (arr)
#:> [[0 1 2 3 4]]
print (arr.shape)
#:> (1, 5)
print (arr.ndim)
#:> 2
12.4.3.5 2-D Array : Single Column
Slicing with newaxis in the COLUMN position turns a 1-D array into a 2-D array with a single column
arr = np.arange(5)[:, np.newaxis]
print (arr)
#:> [[0]
#:> [1]
#:> [2]
#:> [3]
#:> [4]]
print (arr.shape)
#:> (5, 1)
print (arr.ndim)
#:> 2
Slicing with newaxis in the ROW position turns a 1-D array into a 2-D array with a single row
arr = np.arange(5)[np.newaxis,:]
print (arr)
#:> [[0 1 2 3 4]]
print (arr.shape)
#:> (1, 5)
print (arr.ndim)
#:> 2
12.4.4 Class Method
12.4.4.1 arange()
Generate array with a sequence of numbers
np.arange(10)
#:> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
12.4.4.2 ones()
np.ones(10) # One dimension, default is float
#:> array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
np.ones((2,5),'int') # Two dimensions
#:> array([[1, 1, 1, 1, 1],
#:> [1, 1, 1, 1, 1]])
12.4.4.3 zeros()
np.zeros( 10 ) # One dimension, default is float
#:> array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
np.zeros((2,5),'int') # 2 rows, 5 columns of ZERO
#:> array([[0, 0, 0, 0, 0],
#:> [0, 0, 0, 0, 0]])
12.4.4.4 where()
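On a 1-D array, numpy.where() returns a tuple holding the array of indices of the elements matching the criteria (in this example the indices happen to equal the values)
ar1 = np.array(range(10))
print( ar1 )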
#:> [0 1 2 3 4 5 6 7 8 9]
print( np.where(ar1>5) )
#:> (array([6, 7, 8, 9]),)
On a 2-D array, where() returns a tuple of two arrays: the row indices and the column indices of the matching elements
ar = np.array([(1,2,3,4,5),(11,12,13,14,15),(21,22,23,24,25)])
print ('Data : \n', ar)
#:> Data :
#:> [[ 1 2 3 4 5]
#:> [11 12 13 14 15]
#:> [21 22 23 24 25]]
np.where(ar>13)
#:> (array([1, 1, 2, 2, 2, 2, 2]), array([3, 4, 0, 1, 2, 3, 4]))
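Since where() returns positions rather than values, the result is normally fed back into the array. The sketch below uses values that differ from their indices to make this visible, and also shows the three-argument form of where(); the sample data is illustrative.

```python
import numpy as np

ar = np.array([10, 3, 25, 7, 40])

idx = np.where(ar > 9)             # tuple with one index array: positions 0, 2, 4
items = ar[idx]                    # values at those positions: 10, 25, 40
capped = np.where(ar > 9, 9, ar)   # three-argument form: 9 where True, else keep ar

print(idx)
print(items)
print(capped)
```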
12.4.4.5 Logical Methods
numpy.logical_or
Performs an element-wise OR on two boolean arrays and returns a new boolean array
ar = np.arange(10)
print( ar==3 ) # boolean array 1
#:> [False False False True False False False False False False]
print( ar==6 ) # boolean array 2
#:> [False False False False False False True False False False]
print( np.logical_or(ar==3,ar==6 ) ) # resulting boolean
#:> [False False False True False False True False False False]
numpy.logical_and
Performs an element-wise AND on two boolean arrays and returns a new boolean array
ar = np.arange(10)
print( ar==3 ) # boolean array 1
#:> [False False False True False False False False False False]
print( ar==6 ) # boolean array 2
#:> [False False False False False False True False False False]
print( np.logical_and(ar==3,ar==6 ) ) # resulting boolean
#:> [False False False False False False False False False False]
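For boolean arrays, the operators | and & give the same element-wise results as logical_or and logical_and. A short sketch, noting that the parentheses around each comparison are required because | and & bind more tightly than ==:

```python
import numpy as np

ar = np.arange(10)

either = (ar == 3) | (ar == 6)   # same result as np.logical_or
both = (ar > 2) & (ar < 6)       # same result as np.logical_and

picked = ar[either]              # use the boolean array to select elements
in_range = ar[both]

print(picked, in_range)
```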
12.4.5 Instance Method
12.4.5.1 astype()
Data type conversion
Convert from datetime64 to datetime
ar1 = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
print( type(ar1) ) ## a numpy array
#:> <class 'numpy.ndarray'>
print( ar1.dtype ) ## dtype is a numpy data type
#:> datetime64[D]
After converting to datetime (a non-NumPy object), the dtype becomes the generic 'object'.
ar2 = ar1.astype(datetime)
print( type(ar2) ) ## still a numpy array
#:> <class 'numpy.ndarray'>
print( ar2.dtype ) ## dtype becomes generic 'object'
#:> object
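The object array now holds plain Python objects. The sketch below inspects one element; for day-resolution data like this, the elements are expected to come back as datetime.date objects.

```python
import numpy as np
from datetime import date, datetime

ar1 = np.array(['2007-07-13', '2006-01-13'], dtype='datetime64')
ar2 = ar1.astype(datetime)

first = ar2[0]              # a Python object, no longer a numpy.datetime64
print(type(first), first)
```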
12.4.5.2 reshape()
reshape( number of rows, number of columns )
Sample Data
a = np.array([range(5), range(10,15), range(20,25), range(30,35)])
a
#:> array([[ 0, 1, 2, 3, 4],
#:> [10, 11, 12, 13, 14],
#:> [20, 21, 22, 23, 24],
#:> [30, 31, 32, 33, 34]])
Reshape 1-Dim to 2-Dim
np.arange(12) # 1-D Array
#:> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
np.arange(12).reshape(3,4) # 2-D Array
#:> array([[ 0, 1, 2, 3],
#:> [ 4, 5, 6, 7],
#:> [ 8, 9, 10, 11]])
Reshape 2-Dim to 2-Dim
np.array([range(5), range(10,15)]) # 2-D Array
#:> array([[ 0, 1, 2, 3, 4],
#:> [10, 11, 12, 13, 14]])
np.array([range(5), range(10,15)]).reshape(5,2) # 2-D Array
#:> array([[ 0, 1],
#:> [ 2, 3],
#:> [ 4, 10],
#:> [11, 12],
#:> [13, 14]])
Reshape 2-Dim to 2-Dim (of single row)
- Change 2x5 to 1x10
- Observe the [[ ]]: the number of dimensions is still 2, don't be fooled
np.array( [range(0,5), range(5,10)]) # 2-D Array
#:> array([[0, 1, 2, 3, 4],
#:> [5, 6, 7, 8, 9]])
np.array( [range(0,5), range(5,10)]).reshape(1,10) # 2-D Array
#:> array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
Reshape 1-Dim Array to 2-Dim Array (single column)
np.arange(8)
#:> array([0, 1, 2, 3, 4, 5, 6, 7])
np.arange(8).reshape(8,1)
#:> array([[0],
#:> [1],
#:> [2],
#:> [3],
#:> [4],
#:> [5],
#:> [6],
#:> [7]])
A better method is newaxis: it is easier because the number of rows does not need to be passed as a parameter
np.arange(8)[:,np.newaxis]
#:> array([[0],
#:> [1],
#:> [2],
#:> [3],
#:> [4],
#:> [5],
#:> [6],
#:> [7]])
Reshape 1-Dim Array to 2-Dim Array (single row)
np.arange(8)
#:> array([0, 1, 2, 3, 4, 5, 6, 7])
np.arange(8)[np.newaxis,:]
#:> array([[0, 1, 2, 3, 4, 5, 6, 7]])
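reshape() can also work out one dimension by itself when that dimension is given as -1, which avoids counting items by hand. The sketch below shows this, plus the error raised when the requested shape does not match the number of items.

```python
import numpy as np

flat = np.arange(12)

three_rows = flat.reshape(3, -1)   # -1 lets NumPy infer the 4 columns
one_column = flat.reshape(-1, 1)   # infer 12 rows for a single column

try:
    flat.reshape(5, 2)             # 10 slots cannot hold 12 items
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(three_rows.shape, one_column.shape, reshape_failed)
```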
12.4.6 Element Selection
12.4.6.1 Sample Data
x1 = np.array( (0,1,2,3,4,5,6,7,8))
x2 = np.array(( (1,2,3,4,5),
                (11,12,13,14,15),
                (21,22,23,24,25)))
print(x1)
#:> [0 1 2 3 4 5 6 7 8]
print(x2)
#:> [[ 1 2 3 4 5]
#:> [11 12 13 14 15]
#:> [21 22 23 24 25]]
12.4.6.2 1-Dimension
All indexing starts from 0 (not 1)
Selecting a single element returns a scalar, not an array
print( x1[0] ) ## first element
#:> 0
print( x1[-1] ) ## last element
#:> 8
print( x1[3] ) ## fourth element (index 3)
#:> 3
print( x1[-3] ) ## third element from end
#:> 6
Selecting multiple elements returns an ndarray
print( x1[:3] ) ## first 3 elements
#:> [0 1 2]
print( x1[-3:]) ## last 3 elements
#:> [6 7 8]
print( x1[3:] ) ## all except first 3 elements
#:> [3 4 5 6 7 8]
print( x1[:-3] ) ## all except last 3 elements
#:> [0 1 2 3 4 5]
print( x1[1:4] ) ## elements at index 1 to 3 (index 4 not included)
#:> [1 2 3]
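The same indexing rules extend to the 2-D sample array x2 by giving a row selector and a column selector separated by a comma. A minimal sketch, with x2 redefined so that the block is self-contained:

```python
import numpy as np

x2 = np.array(((1, 2, 3, 4, 5),
               (11, 12, 13, 14, 15),
               (21, 22, 23, 24, 25)))

single = x2[1, 2]      # row 1, column 2 -> a scalar
row = x2[1]            # the whole second row (1-D)
col = x2[:, 1]         # the whole second column (1-D)
block = x2[:2, 1:3]    # first 2 rows, columns 1 and 2 (2-D)

print(single)
print(row)
print(col)
print(block)
```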
12.4.7 Attributes
12.4.7.1 dtype
An ndarray has a property called dtype, which tells us the type of the underlying items
a = np.array( (1,2,3,4,5), dtype='float' )
a.dtype
#:> dtype('float64')
print(a.dtype)
#:> float64
print( type(a[1]))
#:> <class 'numpy.float64'>
12.4.8 Operations
12.4.8.1 Arithmetic
Sample Data
ar = np.arange(10)
print( ar )
#:> [0 1 2 3 4 5 6 7 8 9]
*
ar = np.arange(10)
print (ar)
#:> [0 1 2 3 4 5 6 7 8 9]
print (ar*2)
#:> [ 0 2 4 6 8 10 12 14 16 18]
+ and -
ar = np.arange(10)
print (ar+2)
#:> [ 2 3 4 5 6 7 8 9 10 11]
print (ar-2)
#:> [-2 -1 0 1 2 3 4 5 6 7]
12.4.8.2 Comparison
Sample Data
ar = np.arange(10)
print( ar )
#:> [0 1 2 3 4 5 6 7 8 9]
print( ar==3 )
#:> [False False False True False False False False False False]
>, >=, <, <=
print( ar>3 )
#:> [False False False False True True True True True True]
print( ar<=3 )
#:> [ True True True True False False False False False False]
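The boolean arrays produced by comparisons are mostly used as masks. The sketch below shows three common follow-ups: selecting the matching elements, counting them, and testing whether any or all elements match. Variable names are illustrative.

```python
import numpy as np

ar = np.arange(10)
mask = ar > 3

selected = ar[mask]                 # keep only elements where the mask is True
count = int(mask.sum())             # True counts as 1, so sum() counts the matches
any_big = bool((ar > 8).any())      # is at least one element above 8?
all_small = bool((ar < 5).all())    # are all elements below 5?

print(selected, count, any_big, all_small)
```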
12.5 Random Numbers
12.5.1 Uniform Distribution
12.5.1.1 Random Integer (with Replacement)
randint() returns random integers from low (inclusive) to high (exclusive)
np.random.randint( low ) # generate an integer i, where 0 <= i < low
np.random.randint( low, high ) # generate an integer i, where low <= i < high
np.random.randint( low, high, size=n) # generate a 1-D ndarray of n integers
np.random.randint( low, high, size=(r,c)) # generate a 2-D ndarray of integers, r rows by c columns
np.random.randint( 10 )
#:> 6
np.random.randint( 10, 20 )
#:> 16
np.random.randint( 10, high=20, size=5) # single dimension
#:> array([15, 18, 14, 11, 13])
np.random.randint( 10, 20, (3,5) ) # two dimensions
#:> array([[18, 19, 14, 17, 11],
#:> [15, 11, 11, 19, 10],
#:> [12, 11, 16, 19, 10]])
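Two properties of randint() are worth verifying: seeding makes the output repeatable, and high is never returned. The sketch below checks both; the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)
first = np.random.randint(10, 20, size=5)
np.random.seed(42)
second = np.random.randint(10, 20, size=5)   # same seed, same numbers

many = np.random.randint(10, 20, size=1000)
print(first, second)
print(many.min(), many.max())   # high is exclusive, so 20 never appears
```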
12.5.1.2 Random Integer (with or without replacement)
numpy.random.choice( a, size, replace=True)
# sampling from a,
# if a is an integer, sampling is from arange(a)
# if a is a 1-D array, sampling is from that array
np.random.choice(10, 5, replace=False) # take 5 samples from 0..9, without replacement
#:> array([6, 0, 4, 1, 2])
np.random.choice( np.arange(10,20), 5, replace=False)
#:> array([11, 13, 10, 14, 15])
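Sampling without replacement never repeats a value, so the sample cannot be larger than the population. A short sketch of both behaviours, with an arbitrary seed:

```python
import numpy as np

np.random.seed(0)
no_repeat = np.random.choice(10, 10, replace=False)   # a permutation of 0..9

try:
    np.random.choice(10, 11, replace=False)           # more samples than items
    too_many_failed = False
except ValueError:
    too_many_failed = True

print(no_repeat, too_many_failed)
```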
12.5.1.3 Random Float
ranf() generates floats in the half-open interval [0.0, 1.0)
np.random.ranf(size=None)
np.random.ranf(4)
#:> array([0.34719156, 0.35147161, 0.59755853, 0.10528617])
uniform() returns random floats from low (inclusive) to high (exclusive)
np.random.uniform( low ) # a single argument sets low while high stays at its default 1.0; the result lies between the two
np.random.uniform( low, high ) # generate a float f, where low <= f < high
np.random.uniform( low, high, size=n) # generate a 1-D array of n floats
np.random.uniform( low, high, size=(r,c)) # generate a 2-D array of floats, r rows by c columns
np.random.uniform( 2 )
#:> 1.633967952019189
np.random.uniform( 2,5, size=(4,4) )
#:> array([[2.06434886, 3.66304024, 3.52751507, 4.08096456],
#:> [4.19814857, 2.95277079, 3.63566489, 4.69076522],
#:> [2.34947052, 4.17895391, 4.49808652, 3.51828276],
#:> [3.67805721, 3.22648964, 3.2674474 , 2.8441559 ]])
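uniform(low, high) can be thought of as a rescaled draw from [0.0, 1.0). The sketch below checks the bounds of each function on a larger sample; it calls random_sample(), of which ranf() is an alias, and the seed is arbitrary.

```python
import numpy as np

np.random.seed(1)
unit = np.random.random_sample(1000)                    # floats in [0.0, 1.0)
scaled = np.random.uniform(2, 5, 1000)                  # floats in [2.0, 5.0)
manual = 2 + (5 - 2) * np.random.random_sample(1000)    # same range built by hand

print(unit.min(), unit.max())
print(scaled.min(), scaled.max())
print(manual.min(), manual.max())
```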
12.5.2 Normal Distribution
numpy.random.randn( n_items ) # 1-D standard normal (mean=0, stdev=1)
numpy.random.randn( nrows, ncols ) # 2-D standard normal (mean=0, stdev=1)
numpy.random.standard_normal( size=None ) # default to mean = 0, stdev = 1, non-configurable
numpy.random.normal( loc=0, scale=1, size=None ) # loc = mean, scale = stdev, size = dimension
12.5.2.1 Standard Normal Distribution
Generate random numbers from the standard Gaussian (normal) distribution (mean=0, stdev=1)
One Dimension
np.random.standard_normal(3)
#:> array([-0.29832127, -1.52835978, -1.69015261])
np.random.randn(3)
#:> array([-1.36143442, -1.03616391, 0.30469669])
Two Dimensions
np.random.randn(2,4)
#:> array([[ 0.18301414, -0.81780387, 2.33753414, 1.35667554],
#:> [ 1.04592906, 0.14818631, 2.3902418 , -2.07301317]])
np.random.standard_normal((2,4))
#:> array([[-0.83193773, 0.67788051, -0.96400219, -0.12383149],
#:> [ 0.95843138, -1.02865802, -0.95976146, -1.81295684]])
Observe: randn(), standard_normal() and normal() can all generate standard normal numbers; with the same seed they return identical values
np.random.seed(15)
print (np.random.randn(5))
#:> [-0.31232848 0.33928471 -0.15590853 -0.50178967 0.23556889]
np.random.seed(15)
print (np.random.normal ( size = 5 )) # stdev and mean not specified, default to standard normal
#:> [-0.31232848 0.33928471 -0.15590853 -0.50178967 0.23556889]
np.random.seed(15)
print (np.random.standard_normal (size=5))
#:> [-0.31232848 0.33928471 -0.15590853 -0.50178967 0.23556889]
12.5.2.2 Normal Distribution (Non-Standard)
np.random.seed(125)
np.random.normal( loc = 12, scale=1.25, size=(3,3))
#:> array([[11.12645382, 12.01327885, 10.81651695],
#:> [12.41091248, 12.39383072, 11.49647195],
#:> [ 8.70837035, 12.25246312, 11.49084235]])
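A non-standard normal draw is a standard normal draw that has been scaled and shifted: loc + scale * z. The sketch below compares the two routes under the same seed, where they are expected to agree with the seeded legacy generator used in this section, and then checks the sample mean and standard deviation on a larger draw.

```python
import numpy as np

np.random.seed(125)
direct = np.random.normal(loc=12, scale=1.25, size=(3, 3))

np.random.seed(125)
shifted = 12 + 1.25 * np.random.standard_normal((3, 3))   # scale, then shift

big = np.random.normal(loc=12, scale=1.25, size=100_000)
print(np.allclose(direct, shifted))
print(big.mean(), big.std())   # close to loc and scale for a large sample
```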
12.5.2.3 Linear Spacing
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
# endpoint: If True, stop is the last sample, otherwise it is not included
Include Endpoint
Step = gap divided by (number of elements minus 1): 2/(10-1)
np.linspace(1,3,10) # default endpoint=True
#:> array([1. , 1.22222222, 1.44444444, 1.66666667, 1.88888889,
#:> 2.11111111, 2.33333333, 2.55555556, 2.77777778, 3. ])
Does Not Include Endpoint
Step = gap divided by the number of elements: 2/10
np.linspace(1,3,10,endpoint=False)
#:> array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8])
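The step size does not have to be computed by hand: with retstep=True, linspace() returns it together with the array. A minimal sketch for both endpoint settings:

```python
import numpy as np

with_end, step_with = np.linspace(1, 3, 10, retstep=True)
without_end, step_without = np.linspace(1, 3, 10, endpoint=False, retstep=True)

print(step_with)      # (3 - 1) / (10 - 1)
print(step_without)   # (3 - 1) / 10
print(with_end[-1], without_end[-1])
```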
12.6 Sampling (Integer)
random.choice( a, size=None, replace=True, p=None) # a=integer, return <size> integers drawn from arange(a)
random.choice( a, size=None, replace=True, p=None) # a=array-like, return <size> items picked from a
np.random.choice (100, size=10)
#:> array([58, 0, 84, 50, 89, 32, 87, 30, 66, 92])
np.random.choice( [1,3,5,7,9,11,13,15,17,19,21,23], size=10, replace=False)
#:> array([ 5, 1, 23, 17, 3, 13, 15, 9, 21, 7])
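The p parameter in the signature above assigns a probability to each item, which turns choice() into weighted sampling. In the sketch below the values and weights are made up for illustration; the weights must sum to 1, and an item with weight 0 is never drawn.

```python
import numpy as np

np.random.seed(7)
values = [1, 3, 5]
weights = [0.7, 0.3, 0.0]   # one probability per item; 5 can never be drawn

draws = np.random.choice(values, size=1000, p=weights)
share_of_ones = float((draws == 1).mean())   # should be near 0.7

print(sorted(set(draws.tolist())), share_of_ones)
```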
12.7 NaN : Missing Numerical Data
- You should be aware that NaN is a bit like a data virus: it infects any other object it touches
t = np.array([1, np.nan, 3, 4])
t.dtype
#:> dtype('float64')
Regardless of the operation, the result of arithmetic with NaN will be another NaN
1 + np.nan
#:> nan
t.sum(), t.mean(), t.max()
#:> (nan, nan, nan)
np.nansum(t), np.nanmean(t), np.nanmax(t)
#:> (8.0, 2.6666666666666665, 4.0)
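Besides the nan-aware functions, NaN entries can be located with np.isnan() and then dropped or replaced. The sketch below shows both; the fill value 0.0 is an arbitrary choice for illustration.

```python
import numpy as np

t = np.array([1, np.nan, 3, 4])

missing = np.isnan(t)                # boolean mask of the NaN positions
n_missing = int(missing.sum())       # number of NaN entries
clean = t[~missing]                  # drop the NaN entries
filled = np.where(missing, 0.0, t)   # or replace them with a chosen value

print(n_missing, clean, filled)
```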