• SciPy
    • Scientific Computing Tools for Python
    • SciPy is a collection of mathematical algorithms and convenience functions built on NumPy. It adds significant power to Python by providing the user with high-level commands and classes for manipulating and visualizing data.

NumPy

Base N-dimensional array package

The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

Creating Arrays

import numpy as np
a = np.array([1,2,3])
b = np.array([(1.5,2,3), (4,5,6)], dtype=float)
c = np.array([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]], dtype=float)
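
As a quick check of what the three calls above produce, the following minimal sketch prints the shape of each array. The name `shapes` is introduced here for illustration only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
c = np.array([[(1.5, 2, 3), (4, 5, 6)], [(3, 2, 1), (4, 5, 6)]], dtype=float)

# Each extra level of nesting in the input adds one array dimension.
shapes = [a.shape, b.shape, c.shape]  # hypothetical helper name
print(shapes)  # [(3,), (2, 3), (2, 2, 3)]
```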

Initial Placeholders

np.zeros((3,4))                    Create an array of zeros
np.ones((2,3,4),dtype=np.int16)    Create an array of ones
d = np.arange(10,25,5)             Create an array of evenly spaced values (step value)
np.linspace(0,2,9)                 Create an array of evenly spaced values (number of samples)
e = np.full((2,2),7)               Create a constant array
f = np.eye(2)                      Create a 2x2 identity matrix
np.random.random((2,2))            Create an array with random values
np.empty((3,2))                    Create an uninitialized array (arbitrary contents)
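
The difference between the two "evenly spaced" constructors is easy to mix up, so here is a small sketch comparing them. The variable `g` is a name assumed for this example; it does not appear in the list above.

```python
import numpy as np

# arange: start, stop (exclusive), step
d = np.arange(10, 25, 5)
print(d)  # [10 15 20]

# linspace: start, stop (inclusive), number of samples
g = np.linspace(0, 2, 9)  # hypothetical name
print(g[0], g[1], g[-1])  # 0.0 0.25 2.0

e = np.full((2, 2), 7)  # every element is 7
f = np.eye(2)           # ones on the diagonal, zeros elsewhere
```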
Data Types

np.int64         Signed 64-bit integer type
np.float32       Single-precision (32-bit) floating point
np.complex128    Complex numbers represented by two 64-bit floats
np.bool_         Boolean type storing True and False values
np.object_       Python object type
np.bytes_        Fixed-length byte-string type (formerly np.string_)
np.str_          Fixed-length Unicode string type (formerly np.unicode_)

Inspecting Your Array

a.shape          Array dimensions
len(a)           Length of array (size of the first dimension)
b.ndim           Number of array dimensions
e.size           Number of array elements
b.dtype          Data type of array elements
b.dtype.name     Name of data type
b.astype(int)    Convert an array to a different type
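
A short sketch of the inspection attributes applied to the 2x3 array `b` from "Creating Arrays". The name `b_int` is assumed for this example. The exact integer width produced by `astype(int)` depends on the platform, so it is not shown.

```python
import numpy as np

b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

print(b.shape, b.ndim, b.size, len(b))  # (2, 3) 2 6 2
print(b.dtype.name)                     # float64

# astype returns a new array; float-to-int conversion truncates toward zero.
b_int = b.astype(int)  # hypothetical name
print(b_int.tolist())  # [[1, 2, 3], [4, 5, 6]]
```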

Array Mathematics

Arithmetic Operations

np.subtract(a,b)    Subtraction
np.add(b,a)         Addition
np.divide(a,b)      Division
np.multiply(a,b)    Multiplication
np.exp(b)           Exponentiation
np.sqrt(b)          Square root
np.sin(a)           Element-wise sine
np.cos(b)           Element-wise cosine
np.log(a)           Element-wise natural logarithm
e.dot(f)            Dot product

Aggregate Functions

a.sum()             Array-wise sum
a.min()             Array-wise minimum value
b.max(axis=0)       Maximum value of each column (reduces along axis 0)
b.cumsum(axis=1)    Cumulative sum of the elements along each row
a.mean()            Mean
np.median(b)        Median
np.corrcoef(b)      Correlation coefficients (rows treated as variables)
np.std(b)           Standard deviation
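
Because `a` has shape (3,) and `b` has shape (2, 3), the arithmetic functions rely on broadcasting, and the `axis` argument decides which dimension an aggregate collapses. The sketch below illustrates both; the result names (`diff`, `col_max`, `row_cumsum`, `med`) are assumed for this example. Note that the median is a module-level function, `np.median`, not an array method.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

# a is broadcast across both rows of b before subtracting.
diff = np.subtract(a, b)
print(diff.tolist())        # [[-0.5, 0.0, 0.0], [-3.0, -3.0, -3.0]]

# axis=0 collapses the rows, leaving one value per column.
col_max = b.max(axis=0)
print(col_max.tolist())     # [4.0, 5.0, 6.0]

# axis=1 accumulates left to right within each row.
row_cumsum = b.cumsum(axis=1)
print(row_cumsum.tolist())  # [[1.5, 3.5, 6.5], [4.0, 9.0, 15.0]]

med = np.median(b)          # median over all six elements
print(med)                  # 3.5
```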

Pandas

The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Pandas Data Structures

Series

A one-dimensional labeled array capable of holding any data type

import pandas as pd
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame

A two-dimensional labeled data structure with columns of potentially different types

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
 
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

Getting

s['b']    Get one element
df[1:]    Get subset of a DataFrame
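
Putting the Series and DataFrame definitions together with the two "Getting" expressions gives the following runnable sketch. The name `tail` is assumed for this example.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

print(s['b'])  # -5

# Slicing with [1:] selects rows by position and keeps their original labels.
tail = df[1:]  # hypothetical name
print(tail.index.tolist())       # [1, 2]
print(tail['Country'].tolist())  # ['India', 'Brazil']
```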

Selecting

By Position

Select single value by row & column

df.iloc[0, 0]
df.iat[0, 0]

By Label

Select single value by row & column labels

df.loc[0, 'Country']
df.at[0, 'Country']    # 'Belgium'

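By Label/Position

The .ix indexer was removed in pandas 1.0; use .loc (label) or .iloc (position) instead.

Select a single row from a subset of rows

df.iloc[2]

Select a single column from a subset of columns

df.loc[:, 'Capital']

Select rows and columns

df.loc[1, 'Capital']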
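
The selection patterns in this section can be tried on the example DataFrame as sketched below. It uses only `.iloc`/`.iat` (position) and `.loc`/`.at` (label), since `.ix` is no longer available in pandas 1.0 and later. The names `row` and `caps` are assumed for this example.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

# Single value by integer position (row 0, column 0)
print(df.iloc[0, 0])         # Belgium
print(df.iat[0, 0])          # Belgium

# Single value by label (row label 0, column 'Country')
print(df.loc[0, 'Country'])  # Belgium
print(df.at[0, 'Country'])   # Belgium

row = df.iloc[2]             # one row, returned as a Series
caps = df.loc[:, 'Capital']  # one column, returned as a Series
print(row['Country'])        # Brazil
print(df.loc[1, 'Capital'])  # New Delhi
```

The scalar accessors `.iat` and `.at` take square brackets, like the other indexers, not call parentheses.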

I/O

CSV/Excel

pd.read_csv('file.csv', header=None, nrows=5)
df.to_csv('myDataFrame.csv')
pd.read_excel('file.xlsx')
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
xlsx = pd.ExcelFile('file.xls')
df = pd.read_excel(xlsx, 'Sheet1')

SQL Query/Database Table

from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
pd.read_sql("SELECT * FROM my_table;", engine)
pd.read_sql_table('my_table', engine)
pd.read_sql_query("SELECT * FROM my_table;", engine)
df.to_sql('myDf', engine)
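
The following is a minimal round-trip sketch that avoids touching the file system. It assumes an in-memory text buffer in place of 'file.csv' and the standard-library `sqlite3` connection in place of a SQLAlchemy engine (pandas accepts either for `to_sql` and `read_sql_query`; `read_sql_table` does require SQLAlchemy). The names `buffer`, `conn`, `from_csv`, and `from_sql` are hypothetical.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India'],
                   'Capital': ['Brussels', 'New Delhi']})

# CSV round trip through an in-memory buffer (stands in for a file path).
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False: do not write the row labels
buffer.seek(0)
from_csv = pd.read_csv(buffer)

# SQL round trip through an in-memory SQLite database.
conn = sqlite3.connect(':memory:')
df.to_sql('my_table', conn, index=False)
from_sql = pd.read_sql_query("SELECT * FROM my_table;", conn)
conn.close()

print(from_csv['Capital'].tolist())  # ['Brussels', 'New Delhi']
print(from_sql['Country'].tolist())  # ['Belgium', 'India']
```

Note that writing is a DataFrame method (`df.to_csv`, `df.to_excel`, `df.to_sql`), while reading is a module-level function (`pd.read_csv`, `pd.read_excel`, `pd.read_sql`).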

Retrieving Series/DataFrame Information

Basic Information

df.shape       (rows, columns)
df.index       Describe index
df.columns     Describe DataFrame columns
df.info()      Info on DataFrame
df.count()     Number of non-NA values

Summary

df.sum()                   Sum of values
df.cumsum()                Cumulative sum of values
df.min()/df.max()          Minimum/maximum values
df.idxmin()/df.idxmax()    Index label of the minimum/maximum value
df.describe()              Summary statistics
df.mean()                  Mean of values
df.median()                Median of values
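
As an illustration, the sketch below applies a few of these calls to the numeric 'Population' column of the example data. Restricting the statistics to one numeric column is a choice made here, because recent pandas versions may raise an error when asked for the mean of text columns. The name `pop` is assumed for this example.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

print(df.shape)             # (3, 2)
print(df.columns.tolist())  # ['Country', 'Population']
print(df.count().tolist())  # [3, 3]

pop = df['Population']      # hypothetical name for the numeric column
print(pop.min(), pop.max())
print(pop.idxmin(), pop.idxmax())  # 0 1  (index labels, not values)
print(pop.median())
```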

matplotlib

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

Plot Anatomy

Workflow

1. Prepare The Data

See NumPy & Pandas

2. Create Plot

Figure
import matplotlib.pyplot as plt
fig = plt.figure()
fig2 = plt.figure(figsize=plt.figaspect(2.0))
Axes

All plotting is done with respect to an Axes. In most cases, a subplot will fit your needs. A subplot is an axes on a grid system.

fig.add_axes([0.1, 0.1, 0.8, 0.8]) # example rect: [left, bottom, width, height] as figure fractions
ax1 = fig.add_subplot(221) # row-col-num
ax3 = fig.add_subplot(212)
fig3, axes = plt.subplots(nrows=2,ncols=2)
fig4, axes2 = plt.subplots(ncols=3)
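
To see what the subplot calls return, here is a small sketch. It assumes the non-interactive "Agg" backend so that it runs without a display; that choice is not part of the original workflow.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(221)  # 2x2 grid, position 1 (top-left)
ax3 = fig.add_subplot(212)  # 2x1 grid, position 2 (bottom half)
print(len(fig.axes))        # 2

# plt.subplots creates the figure and a grid of Axes in one call.
fig3, axes = plt.subplots(nrows=2, ncols=2)
fig4, axes2 = plt.subplots(ncols=3)
print(axes.shape, axes2.shape)  # (2, 2) (3,)

plt.close("all")  # release the figures
```

With more than one row and column, `plt.subplots` returns a 2-D NumPy array of Axes, so individual plots are reached with `axes[row, col]`.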

3. Plotting Routines

  1. 1D Data
  2. 2D Data or Images
  3. Vector Fields
  4. Data Distributions

4. Customize Plot

  • Colors, Color Bars & Color Maps: im = ax.imshow(img, cmap='seismic')
  • Markers: ax.plot(x,y,marker="o")
  • Linestyles: plt.setp(lines,color='r',linewidth=4.0)
  • Text & Annotations: ax.text(1, -2.1, 'Example Graph', style='italic')
  • Limits, Legends & Layouts

5. Save Plot

plt.savefig('foo.png')
plt.savefig('foo.png', transparent=True)

6. Show Plot

plt.show()
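
The six workflow steps can be strung together roughly as follows. This is a sketch under stated assumptions: the sine data, the text position, the "Agg" backend, and the in-memory buffer used in place of 'foo.png' are all choices made for this example, and `plt.show()` is omitted because the backend is non-interactive.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

# 1. Prepare the data (illustrative sine curve)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

# 2. Create the plot
fig, ax = plt.subplots()

# 3. Plotting routine for 1D data
lines = ax.plot(x, y, marker="o")

# 4. Customize: line style, annotation, axis limits
plt.setp(lines, color='r', linewidth=4.0)
ax.text(1, -0.5, 'Example Graph', style='italic')
ax.set_xlim(0, 2 * np.pi)

# 5. Save (to an in-memory buffer here instead of 'foo.png')
buf = io.BytesIO()
fig.savefig(buf, format='png', transparent=True)
png_bytes = buf.getvalue()

plt.close(fig)
```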

Other Important Python Libraries

PySpark        Python API for Spark
Ansible        Configuration Management Tool
TensorFlow     Machine Learning Framework
PyTorch        Machine Learning Library
Caffe          Deep Learning Framework
Keras          High-level Neural Networks API
Django         The Web framework for perfectionists
Plotly Dash    Framework for building analytical web applications
NLTK           Natural Language Processing Toolkit
Scrapy         Web-Crawling Framework
Pillow         Python Imaging Library
Pygame         Cross-platform Game Engine