- SciPy -
- Scientific Computing Tools for Python
- SciPy is a collection of mathematical algorithms and convenience functions built on NumPy. It adds significant power to Python by providing the user with high-level commands and classes for manipulating and visualizing data.
NumPy
Base N-dimensional array package
The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.
Creating Arrays
import numpy as np
a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
c = np.array([[(1.5, 2, 3), (4, 5, 6)], [(3, 2, 1), (4, 5, 6)]], dtype=float)
Initial Placeholders
np.zeros((3,4)) | Create an array of zeros |
np.ones((2,3,4),dtype=np.int16) | Create an array of ones |
d = np.arange(10,25,5) | Create an array of evenly spaced values (step value) |
np.linspace(0,2,9) | Create an array of evenly spaced values (number of samples) |
e = np.full((2,2),7) | Create a constant array |
f = np.eye(2) | Create a 2x2 identity matrix |
np.random.random((2,2)) | Create an array with random values |
np.empty((3,2)) | Create an uninitialized array (values are arbitrary) |
Data Types | Description | Inspecting Your Array | Description |
---|---|---|---|
np.int64 | Signed 64-bit integer type | a.shape | Array dimensions |
np.float64 | Standard double-precision floating point | len(a) | Length of array |
np.complex128 | Complex number represented by two 64-bit floats | b.ndim | Number of array dimensions |
np.bool_ | Boolean type storing True and False values | e.size | Number of array elements |
np.object_ | Python object type | b.dtype | Data type of array elements |
np.bytes_ | Fixed-length byte string type | b.dtype.name | Name of data type |
np.str_ | Fixed-length Unicode string type | b.astype(int) | Convert an array to a different type |
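Here is a minimal sketch that checks the inspection attributes on the arrays a, b and e from the examples above. It uses only standard NumPy calls. The extra arrays z, flags and w are illustrative additions that request or infer specific data types.

```python
import numpy as np

# Arrays from the examples above, recreated so this sketch runs on its own
a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
e = np.full((2, 2), 7)

# Illustrative arrays: request or infer specific data types
z = np.zeros(3, dtype=np.float32)
flags = np.array([True, False], dtype=np.bool_)
w = np.array([1 + 2j])            # complex literal, inferred as complex128

print(a.shape)        # (3,)
print(len(a))         # 3
print(b.ndim)         # 2
print(e.size)         # 4
print(b.dtype.name)   # float64
print(z.dtype)        # float32
print(w.dtype)        # complex128

# astype(int) truncates fractional parts, so 1.5 becomes 1
b_int = b.astype(int)
print(b_int)
```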
Array Mathematics
Arithmetic Operations | Description | Aggregate Functions | Description |
---|---|---|---|
np.subtract(a,b) | Subtraction | a.sum() | Array-wise sum |
np.add(b,a) | Addition | a.min() | Array-wise minimum value |
np.divide(a,b) | Division | b.max(axis=0) | Maximum value of each column (along axis 0) |
np.multiply(a,b) | Multiplication | b.cumsum(axis=1) | Cumulative sum of the elements along each row |
np.exp(b) | Element-wise exponential | a.mean() | Mean |
np.sqrt(b) | Element-wise square root | np.median(b) | Median |
np.sin(a) | Element-wise sine | np.corrcoef(a) | Correlation coefficient |
np.cos(b) | Element-wise cosine | np.std(b) | Standard deviation |
np.log(a) | Element-wise natural logarithm | | |
e.dot(f) | Dot product | | |
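The axis arguments and broadcasting in the table above are easy to misread. The sketch below uses the same a and b to show that an operation between shapes (3,) and (2, 3) applies a to each row of b. It also shows that axis=0 collapses rows and that median and corrcoef are module-level functions. The comparison array a[::-1] is an illustrative choice.

```python
import numpy as np

a = np.array([1, 2, 3])                               # shape (3,)
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)   # shape (2, 3)

# Broadcasting: a is subtracted from each row of b's shape, giving shape (2, 3)
diff = np.subtract(a, b)        # rows: [-0.5, 0, 0] and [-3, -3, -3]

# axis=0 collapses the rows, leaving one maximum per column
col_max = b.max(axis=0)         # [4, 5, 6]

# axis=1 accumulates along each row
row_cumsum = b.cumsum(axis=1)   # rows: [1.5, 3.5, 6.5] and [4, 9, 15]

# Median and correlation are functions in the numpy namespace, not ndarray methods
med = np.median(b)              # (3 + 4) / 2 = 3.5
corr = np.corrcoef(a, a[::-1])  # perfectly anti-correlated: off-diagonal is -1
```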
Pandas
The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.
Pandas Data Structures
Series
A one-dimensional labeled array capable of holding any data type
DataFrame
A two-dimensional labeled data structure with columns of potentially different types
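Here is a minimal sketch of building both structures with the pandas constructors. The Series values, index labels and DataFrame columns are hypothetical example data, chosen so that s and df exist for the lookups that follow.

```python
import pandas as pd

# Series: values paired with an explicit index of labels (hypothetical values)
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# DataFrame: each column can hold a different type (hypothetical data)
data = {'name': ['x', 'y', 'z'],
        'score': [1.0, 2.5, 4.0],
        'count': [10, 20, 30]}
df = pd.DataFrame(data, columns=['name', 'score', 'count'])

print(s['b'])     # -5
print(df.dtypes)  # one dtype per column
print(df[1:])     # rows with index 1 and 2
```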
Getting
s['b'] | Get one element |
df[1:] | Get a subset of rows of a DataFrame |
Selecting
By Position
Select a single value by row & column
By Label
Select a single value by row & column labels
By Label/Position
Select a single row or subset of rows
Select a single column or subset of columns
Select rows and columns
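Here is a minimal sketch of these selection modes on a small hypothetical DataFrame. It uses pandas' positional indexers (.iloc, .iat) and label-based indexers (.loc, .at). These are one common way to make each kind of selection, not the only one.

```python
import pandas as pd

# Hypothetical data with string row labels so position and label differ
df = pd.DataFrame({'name': ['x', 'y', 'z'],
                   'score': [1.0, 2.5, 4.0],
                   'count': [10, 20, 30]},
                  index=['r1', 'r2', 'r3'])

# By position: row 0, column 1 ('score')
v_iloc = df.iloc[0, 1]
v_iat = df.iat[0, 1]              # scalar-only accessor

# By label: row 'r1', column 'score'
v_loc = df.loc['r1', 'score']
v_at = df.at['r1', 'score']       # scalar-only accessor

# A single row, or a subset of rows
row = df.loc['r2']                # Series for row 'r2'
rows = df.iloc[1:]                # rows 'r2' and 'r3'

# A single column, or a subset of columns
col = df['score']
cols = df[['name', 'count']]

# Rows and columns together: boolean row filter plus a column list
sub = df.loc[df['score'] > 2, ['name', 'score']]
```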
I/O
CSV/Excel | SQL Query/Database Table |
---|---|
pd.read_csv('file.csv', header=None, nrows=5) | from sqlalchemy import create_engine |
df.to_csv('myDataFrame.csv') | engine = create_engine('sqlite:///:memory:') |
pd.read_excel('file.xlsx') | pd.read_sql("SELECT * FROM my_table;", engine) |
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1') | pd.read_sql_table('my_table', engine) |
xlsx = pd.ExcelFile('file.xls') | pd.read_sql_query("SELECT * FROM my_table;", engine) |
df = pd.read_excel(xlsx, 'Sheet1') | df.to_sql('myDf', engine) |
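Here is a minimal round-trip sketch for CSV and SQL that runs without files on disk. As an assumption, it uses io.StringIO in place of a CSV file. It also swaps the SQLAlchemy engine from the table for the standard-library sqlite3 driver, which pandas accepts for to_sql and read_sql_query. pd.read_sql_table still requires SQLAlchemy. The table name my_table and the data are hypothetical.

```python
import io
import sqlite3
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'value': [0.5, 1.5, 2.5]})

# CSV round trip through an in-memory text buffer instead of a file
buf = io.StringIO()
df.to_csv(buf, index=False)       # index=False avoids writing an extra column
buf.seek(0)                       # rewind before reading back
df_csv = pd.read_csv(buf)

# SQL round trip using an in-memory SQLite database
conn = sqlite3.connect(':memory:')
df.to_sql('my_table', conn, index=False)
df_sql = pd.read_sql_query('SELECT * FROM my_table;', conn)
conn.close()
```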
Retrieving Series/DataFrame Information
Basic Information | Description | Summary | Description |
---|---|---|---|
df.shape | (rows, columns) | df.sum() | Sum of values |
df.index | Describe index | df.cumsum() | Cumulative sum of values |
df.columns | Describe DataFrame columns | df.min()/df.max() | Minimum/maximum values |
df.info() | Info on DataFrame | df.idxmin()/df.idxmax() | Index label of the minimum/maximum value |
df.count() | Number of non-NA values | df.describe() | Summary statistics |
| | | df.mean() | Mean of values |
| | | df.median() | Median of values |
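Here is a minimal sketch of how these methods treat missing values and labels. It uses a hypothetical DataFrame with one NaN. By default, count() ignores NaN, sum() and mean() skip it, and idxmax() returns index labels rather than positions.

```python
import pandas as pd

# Hypothetical data with one missing value in column 'b'
df = pd.DataFrame({'a': [3, 1, 2], 'b': [10.0, None, 30.0]},
                  index=['x', 'y', 'z'])

shape = df.shape          # (3, 2)
counts = df.count()       # a: 3, b: 2 (the missing value is not counted)
totals = df.sum()         # a: 6, b: 40.0 (NaN is skipped by default)
means = df.mean()         # a: 2.0, b: 20.0
max_labels = df.idxmax()  # a: 'x', b: 'z' (index labels, not positions)
min_labels = df.idxmin()  # a: 'y', b: 'x'
summary = df.describe()   # count, mean, std, min, quartiles and max per column
df.info()                 # prints the index, column dtypes and non-null counts
```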
matplotlib
Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
Plot Anatomy
Workflow
1. Prepare The Data
See NumPy & Pandas
2. Create Plot
Figure
Axes
All plotting is done with respect to an Axes. In most cases, a subplot will fit your needs. A subplot is an axes on a grid system.
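Here is a minimal sketch of the Figure/Axes relationship described above. A Figure is created first and Axes are added on a subplot grid, or plt.subplots creates both in one call. The figure size and grid shapes are illustrative choices.

```python
import matplotlib.pyplot as plt

# A Figure is the top-level container; Axes are placed inside it
fig = plt.figure(figsize=(6, 4))
ax1 = fig.add_subplot(2, 2, 1)   # first cell of a 2x2 subplot grid
ax2 = fig.add_subplot(2, 2, 4)   # fourth (bottom-right) cell of the same grid

# plt.subplots creates a Figure and a 2x2 array of Axes in one call
fig2, axes = plt.subplots(nrows=2, ncols=2)
axes[0, 0].set_title('top left')
```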
3. Plotting Routines
- 1D Data
- 2D Data or Images
- Vector Fields
- Data Distributions
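Here is a minimal sketch with one standard Axes method for each category in the list: plot for 1D data, imshow for images, quiver for vector fields and hist for distributions. All sample data is hypothetical, generated with NumPy's seeded random generator so the sketch is reproducible.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
img = np.random.default_rng(0).random((10, 10))
X, Y = np.meshgrid(np.arange(-2, 3), np.arange(-2, 3))
U, V = -Y, X                                  # a simple rotational vector field
samples = np.random.default_rng(1).normal(size=200)

fig, axes = plt.subplots(2, 2)

# 1D data: a line plot
lines = axes[0, 0].plot(x, y)
# 2D data or images: values drawn as colors
im = axes[0, 1].imshow(img, cmap='viridis')
# Vector fields: arrows at each (X, Y) point with components (U, V)
q = axes[1, 0].quiver(X, Y, U, V)
# Data distributions: histogram with 20 bins
counts, edges, patches = axes[1, 1].hist(samples, bins=20)
```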
4. Customize Plot
- Colors, Color Bars & Color Maps
im = ax.imshow(img, cmap='seismic')
- Markers
ax.plot(x,y,marker="o")
- Linestyles
plt.setp(lines,color='r',linewidth=4.0)
- Text & Annotations
ax.text(1, -2.1, 'Example Graph', style='italic')
- Limits, Legends & Layouts
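The customization calls listed above assume that ax, img, lines, x and y already exist. Here is a minimal self-contained sketch that defines them with hypothetical data and applies one call from each category. The annotation, axis limits and titles are illustrative additions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data
x = np.linspace(0, 10, 20)
y = np.cos(x)
img = np.random.default_rng(0).random((8, 8))

fig, (ax1, ax2) = plt.subplots(1, 2)

# Colors, color bars and color maps
im = ax2.imshow(img, cmap='seismic')
fig.colorbar(im, ax=ax2)

# Markers and line styles, then restyle the returned lines with setp
lines = ax1.plot(x, y, marker='o', linestyle='--', label='cos(x)')
plt.setp(lines, color='r', linewidth=4.0)

# Text and annotations
ax1.text(1, -0.9, 'Example Graph', style='italic')
ax1.annotate('start', xy=(0, 1), xytext=(2, 0.5),
             arrowprops=dict(arrowstyle='->'))

# Limits, legends and layouts
ax1.set_xlim(0, 10)
ax1.set_ylim(-1.5, 1.5)
ax1.set(title='Customized plot', xlabel='x', ylabel='cos(x)')
ax1.legend(loc='best')
fig.tight_layout()
```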
5. Save Plot
6. Show Plot
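Here is a minimal sketch of the last two steps. Figure.savefig writes the figure to disk and infers the format from the file extension. As an assumption, the output goes to a temporary directory with the hypothetical file names myplot.png and myplot_transparent.png so that no files are left in the working directory.

```python
import tempfile
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 10)
fig, ax = plt.subplots()
ax.plot(x, x ** 2)

# 5. Save the plot; the .png extension selects the PNG format
out_dir = Path(tempfile.mkdtemp())
out = out_dir / 'myplot.png'
fig.savefig(out)
out_transparent = out_dir / 'myplot_transparent.png'
fig.savefig(out_transparent, transparent=True)

# 6. In an interactive session, plt.show() would display the figure at this point.
# Here the figure is only closed to free its memory.
plt.close(fig)
```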
Other Important Python Libraries
PySpark | Python API for Spark |
Ansible | Configuration Management Tool |
TensorFlow | Machine Learning Framework |
PyTorch | Machine Learning Library |
Caffe | Deep Learning Framework |
Keras | High-level Neural Networks API |
Django | The web framework for perfectionists with deadlines |
Plotly Dash | Framework for building analytical web applications |
NLTK | Natural Language Processing Toolkit |
Scrapy | Web-Crawling Framework |
Pillow | Fork of the Python Imaging Library (PIL) |
Pygame | Cross-platform set of modules for writing games |