1 Fundamentals

1.1 Library Management

1.1.1 Built-In Libraries

import string
import datetime as dt
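
As a brief illustration of how the imports above are used, the sketch below reads a constant from string and builds a date through the dt alias. The date value and variable names are arbitrary examples.

```python
import string
import datetime as dt

# string exposes ready-made character constants
vowels = [ch for ch in string.ascii_lowercase if ch in "aeiou"]

# the alias dt stands in for the full module name datetime
d = dt.date(2020, 1, 31)
next_day = d + dt.timedelta(days=1)

print(vowels)
print(next_day.isoformat())
#:> ['a', 'e', 'i', 'o', 'u']
#:> 2020-02-01
```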

1.1.2 Common External Libraries

import numpy as np
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt

from plydata import define, query, select, group_by, summarize, arrange, head, rename
import plotnine
from plotnine import *

1.1.2.1 numpy

  • Large multi-dimensional arrays and matrices
  • High-level mathematical functions to operate on them
  • Efficient array computation, modeled after MATLAB
  • Supports vectorized array math functions (implemented in C, hence faster than Python for loops over lists)
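
A minimal sketch of what "vectorized" means in practice, assuming numpy is installed. The same arithmetic is written once as a Python loop and once as a single array expression; the input values are arbitrary.

```python
import numpy as np

values = list(range(5))

# plain Python: one interpreted iteration per element
looped = [v * 2 + 1 for v in values]

# numpy: the same arithmetic applied to the whole array at once
arr = np.array(values)
vectorized = arr * 2 + 1

print(looped)
print(vectorized)
#:> [1, 3, 5, 7, 9]
#:> [1 3 5 7 9]
```

The speed difference is only noticeable on large arrays; for a handful of elements, either form is fine.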

1.1.2.2 scipy

  • Collection of mathematical algorithms and convenience functions
  • Built upon numpy

1.1.2.3 pandas

  • Data manipulation and analysis
  • Offers data structures and operations for manipulating numerical tables and time series
  • Good for analyzing tabular data
  • Used for exploratory data analysis, data pre-processing, statistics and visualization
  • Built upon numpy

1.1.2.4 scikit-learn

  • Machine learning functions
  • Built on top of scipy

1.1.2.5 matplotlib

  • Data Visualization

1.1.3 Package Management

1.1.4 Conda

1.1.4.1 Conda Environment

system("conda info")

1.1.4.2 Package Version

system("conda list") 

1.1.4.3 Package Installation

Conda is the recommended package manager.

To install from the official conda channel:

conda install <package_name>  # installs the latest version
conda install <package_name>=<version_number>

## Example: Install From conda official channel
conda install numpy
conda install scipy
conda install pandas
conda install matplotlib
conda install scikit-learn
conda install seaborn
conda install pip

To install from the conda-forge community channel:

conda install -c conda-forge <package_name>
conda install -c conda-forge <package_name>=<version_number>

## Example: Install From conda community:
conda install -c conda-forge plotnine

1.1.5 PIP

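pip is Python's package installer. It installs packages from the Python Package Index (PyPI), an open repository that is separate from the conda channels. Use pip if the package is not available in conda.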

1.1.5.1 Package Version

system("pip list")
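
The version of a single package can also be checked from inside Python. The following is a minimal sketch using the standard-library importlib.metadata module (Python 3.8 or later); installed_version is a hypothetical helper name, and the package names in the loop are only examples.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# prints one line per package; the result depends on the current environment
for name in ["numpy", "pandas", "plydata"]:
    print(name, installed_version(name))
```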

1.1.5.2 Package Installation

pip install <package_name>
## Example: pip install plydata

1.2 Everything Is Object

  • Every variable in Python refers to an object
  • Every variable assignment is reference based; that is, each variable holds a reference to the memory block where the object's data lives

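In the example below, a, b and c refer to the same memory location:
- When one variable is assigned to another, both refer to the same memory location
- When two variables are assigned the same value, they may refer to the same memory location; CPython reuses objects for small integers (-5 to 256), but this is an implementation detail and does not hold for every value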

a = 123
b = 123  
c = a
print ('Data of a =',  a,
       '\nData of b =',b,
       '\nData of c =',c,
       '\nID of a = ', id(a),
       '\nID of b = ', id(b),
       '\nID of c = ', id(c)
)
#:> Data of a = 123 
#:> Data of b = 123 
#:> Data of c = 123 
#:> ID of a =  139904208751072 
#:> ID of b =  139904208751072 
#:> ID of c =  139904208751072
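
Equal values do not always share a memory location. A minimal sketch with lists, where == compares values and is compares identity (the variable names are illustrative):

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal value, but a separate list object
c = a           # another name for the object a refers to

print(a == b, a is b)
print(a == c, a is c)
#:> True False
#:> True True
```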

Reassigning a variable binds it to a different object. Other variables that referred to the old object are not affected.

a = 123
b = a
a = 456  # reassignment changes the object a refers to
         # the object b refers to is unchanged
print ('Data of a =',a,
     '\nData of b =',b,
     '\nID of a = ', id(a),
     '\nID of b = ', id(b)
)
#:> Data of a = 456 
#:> Data of b = 123 
#:> ID of a =  139903753613424 
#:> ID of b =  139904208751072
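
Reassignment differs from changing an object in place. As a small sketch with a mutable list, an in-place change is visible through every name that refers to the object, while reassignment only rebinds one name:

```python
a = [1, 2]
b = a            # a and b name the same list

a.append(3)      # in-place change: visible through b as well
print(b)
#:> [1, 2, 3]

a = [9, 9]       # reassignment: a now names a new list, b keeps the old one
print(a, b)
#:> [9, 9] [1, 2, 3]
```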

1.3 Assignment

1.3.1 Multiple Assignment

Assign the same value to multiple variables at the same time. Note that all variables created this way refer to the same memory location.

x = y = 'same mem loc'
print ('x = ', x,
     '\ny = ', y,
     '\nid(x) = ', id(x), 
     '\nid(y) = ', id(y)
)
#:> x =  same mem loc 
#:> y =  same mem loc 
#:> id(x) =  139903753599600 
#:> id(y) =  139903753599600
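
Sharing one memory location matters most when the value is mutable. A minimal sketch, using illustrative names, that contrasts chained assignment of a list with creating two separate lists:

```python
x = y = []       # one list, two names
x.append(1)
print(y)
#:> [1]

p, q = [], []    # two separate lists
p.append(1)
print(q)
#:> []
```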

1.3.2 Augmented Assignment

x = 1
y = x + 1
y += 1
print ('y = ', y)
#:> y =  3
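
The same pattern applies to the other arithmetic operators. The sketch below runs through a few of them and also shows that, for a list, += extends the existing object in place rather than creating a new one. The values are arbitrary.

```python
n = 10
n -= 4     # n = n - 4   -> 6
n *= 3     # n = n * 3   -> 18
n //= 4    # n = n // 4  -> 4
n **= 2    # n = n ** 2  -> 16
print(n)
#:> 16

# On a list, += extends the existing object in place,
# so the identity stays the same.
items = [1, 2]
before = id(items)
items += [3]
print(items, id(items) == before)
#:> [1, 2, 3] True
```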

1.3.3 Unpacking Assignment

Assign multiple values to multiple variables at the same time.

x,y = 1,3
print (x,y)
#:> 1 3
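
Unpacking also covers a few common idioms. A short sketch with illustrative values: swapping two variables, collecting the remainder with a starred name, and unpacking a nested structure.

```python
a, b = 1, 3
a, b = b, a                       # swap without a temporary variable
print(a, b)
#:> 3 1

first, *rest = [10, 20, 30, 40]   # starred name collects the remainder as a list
print(first, rest)
#:> 10 [20, 30, 40]

(x, y), z = (1, 2), 3             # nested unpacking mirrors the nested structure
print(x, y, z)
#:> 1 2 3
```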