Numerical Python aka Numpy#

Unless specified differently, these slides are copyright CINECA 2019 and are released under the Attribution--NonCommercial--NoDerivs (CC BY-NC-ND) Creative Commons license, version 3.0. Uses not allowed by the above license need explicit, written permission from the copyright owner. For more information see: http://creativecommons.org/licenses/by-nc-nd/3.0/
Slides were authored by Sergio Orlandini.
NumPy library ecosystem#
“NumPy is the fundamental package for scientific computing with Python”
(source https://numpy.org)
Numpy (https://numpy.org) is the core library for scientific computing and data science in python.
The library name is a contraction of “Numerical Python”.
Numpy library supplies:
a high performance multi-dimensional array data structure more efficient, more convenient and smarter than python lists, with faster access in reading and writing memory
routines for fast operations on arrays performed in compiled code for performance
useful linear algebra, Fourier transform, and random number capabilities
tools for integrating C/C++ and Fortran code
Numpy is faster than pure python:
all operations are implicitly element-wise
avoid inefficiencies of loops in pure Python
all overhead in explicit looping, indexing and interpreting Python code are performed in pre-compiled and vectorized code
combine the benefits and semplicity from coding in Python and the efficiency of compiled code
vectorized code is more readable and concise
def inverse(x):
y = []
for i in range(len(x)):
y.append(1.0 / x[i])
return y
n = 10000000
%time y = inverse(range(1, n))
CPU times: user 3.34 s, sys: 228 ms, total: 3.57 s
Wall time: 3.56 s
import numpy as np
%time y = 1.0 / np.arange(1, n)
CPU times: user 188 ms, sys: 109 ms, total: 297 ms
Wall time: 297 ms
How to import Numpy module#
import numpy as np
Importing NumPy using np as an alias is a standard de-facto that data science community has adopted after discussion on public mailing lists.
This import is not required but it is highly recommended.
Main useful Numpy features are:
Multidimensional Array (
ndarray)homogeneous multidimensional container
Universal Function (
ufunc) that operates on arraysin an element-by-element fashion
vectorized, no need of loops over elements
what is a Numpy N-dimensional array (aka nd-array)#
Multi-dimensinal array container is the principal data structure of the numpy library.
homogeneous container of items of the same type and size
fixed size at creation
dimension are called axes
number of axes is the rank of array
x = np.array([1, 2, 3, 4])
print(x)
[1 2 3 4]
type(x)
numpy.ndarray
Numpy arrays have some attributes that describe their properties
size: number of elements of the array
x.size
4
dtype: data type of each element
x.dtype
dtype('int64')
itemsize: size in bytes of each element
x.itemsize
8
array dimension#
ndim: the number of axes/dimensions is the rank of an array
x.ndim
1
shape: array dimensions. A tuple of integers with the size of the array in each dimension
NB: the length of the shape is the rank
x.shape
(4,)
type(x.shape)
tuple
Bidimensional example#
x = np.array([[1, 2, 3], [4, 5, 6]])
print(x)
[[1 2 3]
[4 5 6]]
x.shape
(2, 3)
x.ndim
2
numpy.array
array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
Create an array.
Array types#
list of all numpy types
np.sctypes
{'int': [numpy.int8, numpy.int16, numpy.int32, numpy.int64],
'uint': [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint64],
'float': [numpy.float16, numpy.float32, numpy.float64, numpy.float128],
'complex': [numpy.complex64, numpy.complex128, numpy.complex256],
'others': [bool, object, bytes, str, numpy.void]}
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=complex)
x.dtype
dtype('complex128')
x
array([[1.+0.j, 2.+0.j, 3.+0.j],
[4.+0.j, 5.+0.j, 6.+0.j]])
upcasting#
numpy array function upcasts implicitly to the higher precision elements
x = np.array([1, 2, 3.0, 4])
x.dtype
dtype('float64')
x
array([1., 2., 3., 4.])
Shape of an Array#
x = np.array([[1, 2, 3],
[4, 5, 6]])
print(x)
[[1 2 3]
[4 5 6]]
x.shape
(2, 3)
x = np.array([1, 2, 3, 4, 5, 6])
print(x)
[1 2 3 4 5 6]
x.shape
(6,)
Numpy array can be reshaped in-place by assigning a tuple to its shape attribute
x
array([1, 2, 3, 4, 5, 6])
x.shape
(6,)
x.shape = (2, 3)
print(x)
[[1 2 3]
[4 5 6]]
NB: the size of reshaped array must match the size of the original array otherwise an error is raised
numpy.reshape
reshape(a, newshape, order='C')
Gives a new shape to an array without changing its data.
x = np.array([1, 2, 3, 4, 5, 6])
np.reshape(x, (2, 3))
array([[1, 2, 3],
[4, 5, 6]])
np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
array([[1, 2, 3],
[4, 5, 6]])
Evenly spaced array intermezzo#
n = int(1e7)
%time x = [i for i in range(n)]
CPU times: user 365 ms, sys: 30.6 ms, total: 395 ms
Wall time: 394 ms
numpy.arange
arange([start,] stop[, step,], dtype=None)
Return evenly spaced values within a given interval.
Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a list.
%time x = np.arange(n)
CPU times: user 170 ms, sys: 99.9 ms, total: 270 ms
Wall time: 266 ms
Arrays Axes#
x = np.arange(10)
print(x)
[0 1 2 3 4 5 6 7 8 9]

x = np.arange(12).reshape(3, 4)
print(x)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]

x = np.arange(24).reshape(2, 3, 4)
print(x)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]

Numpy array print convention:
the last axis (fastest one) is printed from left to right
the second-to-last is printed from top to bottom
the others are printed from top to bottom, with each slice separated by an empty line
x = np.arange(24).reshape(2, 3, 4)
print(x)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
Memory data and Metadata#
A Numpy nd-array is composed by:
pointer to a contiguos memory block
data structure as metadata containing information about data type, shape, strides, …
Python view can be manipulated changing structure metadata without modify data in memory
figure source: enthought
Numpy Array vs Python List#

Universal Functions ufunc#
An universal function is a function that operates on Numpy ndarrays. Each universal function:
performs the core function element-wise on the inputs
is a “vectorized” wrapper for a core function
takes array inputs and produces array outputs
operates in element-by-element fashion
can also operate on input without same dimension. In this case sophisticated operations of broadcasting are defined
For sake of efficiency many of the built-in functions are implemented in compiled C code.
There are more than 60 ufunc in Numpy covering a wide variety of operations:
math operations
trigonometric functions
comparison methods
bit manipulations functions
statistical functions
all operatons are performed element-wise. From operation with scalar …
np.arange(5) + 42
array([42, 43, 44, 45, 46])
… to operations between arrays
even = np.arange(0, 10, 2)
odd = np.arange(1, 10, 2)
print(even, odd)
[0 2 4 6 8] [1 3 5 7 9]
even + odd
array([ 1, 5, 9, 13, 17])
np.add(even, odd)
array([ 1, 5, 9, 13, 17])
Some ufunc have the infix notation defined (+, -, *, /, …).
When the infix notation is used the corresponding ufunc is called internally (add, subtract, multiply, devide, …).
Also comparison operations perform element-wise
even + odd == np.add(even, odd)
array([ True, True, True, True, True])
For comparison operations there are also the array counterpart with naming start with array_ prefix
np.array_equal(even + odd, np.add(even, odd))
True
All arithmetic operates element-wise, even those between matrices, including multiplication
I = np.ones((3, 3))
I
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
I * I
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
Tips: use np.dot to perform matrix multiplication
I.dot(I)
array([[3., 3., 3.],
[3., 3., 3.],
[3., 3., 3.]])
Reductions#
np.sum
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Sum of array elements over a given axis.
x = np.arange(3, 10)
x.sum()
42
x = np.arange(6).reshape(2, 3)
print(x)
[[0 1 2]
[3 4 5]]
x.sum() # sums all values
15
x.sum(axis=0) # sums all columns
array([3, 5, 7])
x.sum(axis=1) # sums all rows
array([ 3, 12])
x[0,:].sum(), x[1,:].sum() # sums all rows
(3, 12)
other type of reductions are:
min,maxfor maximum or minimum of an arrayargmin,argmaxfor finding inde of maximum or minimum value in an arraylogical
all,anystatistical
mean,median,std
n = np.array([1729, 73, 42, 6174, 495])
print("min", n.min(), "at", n.argmin())
print("max", n.max(), "at", n.argmax())
min 42 at 2
max 6174 at 3
n.mean()
1702.6
n
array([1729, 73, 42, 6174, 495])
is_even = n%2 == 0
is_even
array([False, False, True, True, False])
print("Is there at least an even number?", np.any(n[is_even]))
Is there at least an even number? True
any(a, axis=None, out=None, keepdims=<no value>)
Test whether any array element along a given axis evaluates to True.
print("Are all even numbers?", np.all(n[is_even]))
Are all even numbers? True
all(a, axis=None, out=None, keepdims=<no value>)
Test whether all array elements along a given axis evaluate to True.
101 ways to create arrays#
There are a lots of methods to create ndarray with different behaviour:
evenly spaced values within a given interval
filled with given values and shape
pre-defined structured
randomly initialized with chosen distribution
Evenly spaced array#
To create evenly spaced array with a given step size use
np.arangeIn the case of non-integer step or if you want to specify the number of samples rather than the tstep size use
np.linspace
n = int(1e6)
%time x1 = [i for i in range(n)]
CPU times: user 42.6 ms, sys: 451 µs, total: 43 ms
Wall time: 41.5 ms
%time x2 = np.arange(n)
CPU times: user 3.27 ms, sys: 0 ns, total: 3.27 ms
Wall time: 1.46 ms
np.array_equal(x1, x2)
True
np.linspace
linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
Return evenly spaced numbers over a specified interval.
Returns `num` evenly spaced samples, calculated over the interval [`start`, `stop`].
The endpoint of the interval can optionally be excluded.
x = np.linspace(0., 10., 10)
x
array([ 0. , 1.11111111, 2.22222222, 3.33333333, 4.44444444,
5.55555556, 6.66666667, 7.77777778, 8.88888889, 10. ])
x = np.linspace(0, 10, 10, endpoint=False)
x
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
t = np.linspace(-np.pi * 2, np.pi * 2, 1000)
x = t + 3 * np.sin(2*t)
y = np.sin(t)
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()
Intrinsic array creation#
Numpy has built-in functions for creating arrays from scratch
filled with a value with given shape
np.zeros,np.ones,np.full,np.empty
filled with a value with same shape of another array
np.zeros_like,np.ones_like,np.full_like,np.empty_like
given a structure
np.identity,np.eye
randomly filled following a distribution
given polynomials from
polynomialmodulechebyshev,legendre,laguerre,hermite, …
np.zeros
zeros(shape, dtype=float, order='C')
Return a new array of given shape and type, filled with zeros.
np.zeros(42)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.])
np.zeros(5, dtype=complex)
array([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j])
np.ones
ones(shape, dtype=None, order='C')
Return a new array of given shape and type, filled with ones.
np.ones((3, 4))
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
np.full
full(shape, fill_value, dtype=None, order='C')
Return a new array of given shape and type, filled with `fill_value`.
np.full((2, 3), 42)
array([[42, 42, 42],
[42, 42, 42]])
np.empty
empty(shape, dtype=float, order='C')
Return a new array of given shape and type, without initializing entries.
np.empty((2, 6))
array([[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.]])
np.full_like
full_like(a, fill_value, dtype=None, order='K', subok=True, shape=None)
Return a full array with the same shape and type as a given array.
x = np.arange(12).reshape(3, 4)
np.full_like(x, 42)
array([[42, 42, 42, 42],
[42, 42, 42, 42],
[42, 42, 42, 42]])
# Return the identity array.
x = np.identity(3)
print(x)
print(np.eye(3, k=1))
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
[[0. 1. 0.]
[0. 0. 1.]
[0. 0. 0.]]
Hermite polynomials#
$$ \Large |
|
\begin{aligned} |
|
H_{n}(x) &= (-1)^{n} e^{x^{2}} {\frac {d^{n}}{dx^{n}}}e^{-x^{2}} \ |
|
&\ |
|
H_{0}(x)&=1,\ |
|
H_{1}(x)&=2x,\ |
|
H_{2}(x)&=4x^{2}-2,\ |
|
H_{3}(x)&=8x^{3}-12x,\ |
|
H_{4}(x)&=16x^{4}-48x^{2}+12,\ |
|
H_{5}(x)&=32x^{5}-160x^{3}+120x,\ |
|
H_{6}(x)&=64x^{6}-480x^{4}+720x^{2}-120,\ |
|
H_{7}(x)&=128x^{7}-1344x^{5}+3360x^{3}-1680x |
|
\end{aligned} |
|
$$ |
|
np.polynomial.hermite.hermval
hermval(x, c, tensor=True)
Evaluate an Hermite series at points x.
If `c` is of length `n + 1`, this function returns the value:
p(x) = c_0 * H_0(x) + c_1 * H_1(x) + ... + c_n * H_n(x)
x = np.linspace(-3, 3, 100)
H0 = np.polynomial.hermite.hermval(x, 1)
H0[:10]
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
H1 = np.polynomial.hermite.hermval(x, [0, 1])
H01 = np.polynomial.hermite.hermval(x, [1, 1])
np.identity
identity(n, dtype=None)
Return the identity array.
nmax = 5
coeff = np.identity(nmax)
coeff
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
H = np.polynomial.hermite.hermval(x, coeff)
print(type(H), H.shape)
<class 'numpy.ndarray'> (5, 100)
for n in np.arange(1, nmax):
plt.plot(x, H[n], label="$H_{}(x)$".format(n))
plt.xlim([-2, 2])
plt.ylim([-30, 20])
plt.title("Hermite polynomials")
plt.xlabel("$x$")
plt.ylabel("$H_n(x)$")
plt.legend()
plt.show()
Quantum Harmonic Oscillator#
The Schrödinger equation of a particle of mass $m$ in one dimensional harmonic potential $V(x) = \frac{1}{2} k x^2$, where $k$ is the force costant is
$$
\frac{ \hbar }{2 m} \frac{d^2 \psi}{d x^2} + \frac{1}{2} k x^2 \psi = E , \psi $$
Introducing the angular frequency $\omega = \sqrt{k / m} ;$ and changing the coordinate such that $q=(m\omega/\pi\hbar)^{1/4} x,$, the equation becames
$$ -\frac{1}{2} \frac{d^2 \psi}{d q^2} + \frac{1}{2} k q^2 = \frac{E}{\hbar \omega} \psi $$
The solution of Schrödinger equation is the wavefunction $$ \psi_n(q) = N_n ; H_n(q) ; e^{- q^2} $$
where $n$ is the vibrational quantum number, $H_n(q)$ the Hermite polynomials and $N_n = \pi^{1/4}/\sqrt{2^n n!}$ a normalization factor.
The corresponding energy levels are: $$ E_n = \hbar \omega ; (n+\frac{1}{2}) $$
xlim = 12 # domain boundary
x = np.linspace(-xlim, xlim, 1000)
n = 50 # maximum vibrational quantum number
# Harmonic Oscillator Wavefunctions
psi = np.polynomial.hermite.hermval(x, np.identity(n)) * np.exp(-x*x/2.)
# normalize wavefunctions
for i in np.arange(n):
psi[i] *= (1. / np.sqrt(2**i * np.sqrt(np.pi) * np.math.factorial(i)))
# Vibrational Energies
En = np.arange(n) + 0.5
En[0] # Zero-Point Energy
0.5
nmax = 7 # maximum vibrational state to plot
xrng = 5 # domain range
# initialize 2 horizontal plots
fig, axes = plt.subplots(1, 2,figsize=(12, 6) )
off = 0.5
for ax in axes:
# plot potential surface
ax.plot(x, x*x*0.5, linewidth=1.75)
# plot energy levels
for e in En[:nmax]:
ax.axhline(y=e, color='gray', alpha=0.5)
# make-up axis
ax.set_xlabel("$q$")
ax.set_ylim([0, nmax+off])
ax.set_xlim([-xrng, xrng])
ax.set_yticks([])
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
scale = 0.6
for i in np.arange(0, nmax):
# plot wavefunctions
axes[0].plot(x, En[i] + psi[i] * scale,
label="$\psi_{}$".format(i),
color='r', alpha=1./(i*0.7+1))
# plot square of wavefunctions
axes[1].plot(x, En[i] + psi[i]*psi[i] * scale*2.5)
axes[0].set_title("$\psi$ Wavefunction")
axes[1].set_title("$|\,\psi\,|^2$ Probability")
# vibrational state labels
for i in np.arange(nmax):
axes[1].text(s=r'$E_{} = \frac{{{}}}{{2}}\hbar\omega$'.format(i, 2*i+1),
x=-xrng-1, y=En[i], va='center', ha='center')
plt.show()
nmax = 23
xrng = 8
fig, ax = plt.subplots()
ax.plot(x, x*x*0.5, linewidth=1.75)
for e in En[nmax:nmax+1]:
plt.axhline(y=e, color='gray', alpha=0.5)
ax.set_ylim([nmax+off/2, nmax+off*2])
ax.set_xlim([-xrng, xrng])
ax.set_yticks([])
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel("$q$")
scale = 0.6
plt.plot(x, En[nmax] + psi[nmax]*psi[nmax] * scale*2.5)
ax.set_title("$|\,\psi\,|^2$ Probability")
ax.text(s=r'$E_{{{}}} = \frac{{{}}}{{2}}\hbar\omega$'.format(nmax, 2*nmax+1),
x=-xrng-2.5, y=En[nmax], va='center', ha='center')
plt.show()
Random Samples Arrays#
In np.random module there are a lots of functions to create array of a given shape populated with random samples from a given distribution.
Supported distributions are:
Beta, binomial, chi-square, Dirichlet, exponential, F, Gamma, geometric, Gumbel, Hypergeometric, Laplace, …
Samples from uniform distribution
np.rand
rand(d0, d1, ..., dn)
Random values in a given shape.
Create an array of the given shape and populate it with
random samples from a uniform distribution over [0, 1).
x = np.random.rand(3,2)
x
array([[0.52504865, 0.80573668],
[0.39471137, 0.49307503],
[0.87379397, 0.2160302 ]])
NB: if called without arguments a number is returned
np.random.rand()
0.8627107662808488
Samples from normal distribution
np.random.randn
randn(d0, d1, ..., dn)
Return a sample (or samples) from the "standard normal" distribution.
np.random.randn(2, 2)
array([[-0.78954781, 0.95415007],
[-1.29735578, -0.71004843]])
Distributions of Dice Rolls#
randint(low, high=None, size=None, dtype='l')
Return random integers from `low` (inclusive) to `high` (exclusive).
Return random integers from the "discrete uniform" distribution of
the specified dtype in the "half-open" interval [`low`, `high`).
If `high` is None (the default), then results are from [0, `low`).
dice1 = np.random.randint(1, 7, 10000)
dice2 = np.random.randint(1, 7, 10000)
dice1[:20]
array([5, 4, 4, 1, 1, 6, 4, 4, 3, 4, 3, 2, 5, 4, 6, 2, 1, 4, 5, 3])
dices_two = dice1 + dice2
hist, bins = np.histogram(dices_two, density=True)
for h, b in zip(hist, bins):
print("{:2d} {:4.1f}%".format( int(b), h*100 ))
2 2.8%
3 5.5%
4 8.2%
5 11.0%
6 14.1%
7 17.1%
8 13.6%
9 10.9%
10 8.2%
11 8.5%
import matplotlib.pyplot as plt
count, bins, ignored = plt.hist(dices_two, 11, density=True)
plt.show()
n = 3 # number of dices
dices = np.random.randint(1, 7, (n, 10000))
dices[0,:10]
array([4, 3, 3, 4, 6, 2, 5, 6, 3, 1])
dices_all = dices.sum(axis=0)
hist, bins = np.histogram(dices_all, bins=n*6-n, density=True)
for h, b in zip(hist, bins):
print("{:2d} {:4.1f}%".format( int(b), h*100 ))
3 0.6%
4 1.5%
5 2.9%
6 4.6%
7 7.1%
8 9.2%
9 11.6%
10 12.1%
11 12.5%
12 12.3%
13 9.8%
14 6.6%
15 4.5%
16 2.6%
17 2.0%
import matplotlib.pyplot as plt
count, bins, ignored = plt.hist(dices_all, bins=n*6-n, density=True)
plt.show()
dices_one = np.random.randint(1, 7, 10000)
plt.hist(dices_one, bins=6, density=True, alpha=0.5, label="1 Dice")
plt.hist(dices_two, bins=11, density=True, alpha=0.5, label="2 Dices")
plt.hist(dices_all, bins=n*6-n, density=True, alpha=0.5, label="3 Dices")
plt.legend()
plt.show()
Indexing#
Numpy array can be indexing with bracket notation ([]).
Different type of indexing are supported depending on the type of object used for indexing.
Three types of indexing are allowed:
basic indexing and slicing
standard accesses like Python list/tuple extended in multi-dimensions
advanced indexing also called fancy indexing
accesses with boolean or integr arrays
field access
accessing with string in structured arrays
Standard Pyton Syntax Indexing#
Numpy array follows standard python syntax for indexing.
0-based
accepts negative indices for indexing from the end of the array
out-of-bound accesses are not supported
x = np.arange(10)
print(x)
[0 1 2 3 4 5 6 7 8 9]
x[2]
2
x[-1]
9
out-of-bound accesses are not support and raise an IndexError exception
>>> x[42]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-10-e8949b1d16e1> in <module>()
----> 1 x[42]
IndexError: index 42 is out of bounds for axis 0 with size 10
Array slicing#
Numpy arrays can be sliced.
The extracted array has the same number of dimensions of original one but different size.
Slicing for Numpy array follows the same rules as for python lists and tuples
start:stop:stepnotation inside brackets
x = np.arange(10)
print(x)
[0 1 2 3 4 5 6 7 8 9]
print(x[1:])
[1 2 3 4 5 6 7 8 9]
print(x[:-5])
[0 1 2 3 4]
print(x[1::2])
[1 3 5 7 9]
multi-dimensional indexing#
Unlike Python lists and tuples Numpy arrays support multi-dimensional indexing.
Multi-dimensional index follows the same syntax like one dimensional one and each index is separated by commas.
x = np.arange(16).reshape(4, 4)
print(x)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
x[1,2]
6
x[1][2] # more inefficient
6
multi-dimensional slicing#
x = np.arange(16).reshape(4, 4)
print(x)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
print( x[1,:] ) # select 2nd row
[4 5 6 7]
print( x[1][:] ) # it seems to work
[4 5 6 7]
print( x[:,0] ) # select 1st column
[ 0 4 8 12]
print( x[:][0] ) # no, it doesn't
[0 1 2 3]
Numpy multi-dimensional indexing differs from Python list/tuple indexing
l = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
l[1][2]
6
l[1][:]
[4, 5, 6, 7]
l[:][0]
[0, 1, 2, 3]
multidimensional slicing is very useful to extract sub-arrays from an array
x = np.arange(16).reshape(4, 4)
print(x)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
print( x[2:4,1:3] ) # sub-array
[[ 9 10]
[13 14]]
Ellipsis syntax#
The ellipsis syntax (...) maybe used to select any unspecified dimensions
x = np.arange(60).reshape(3, 4, 5)
print(x)
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
[[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]]
[[40 41 42 43 44]
[45 46 47 48 49]
[50 51 52 53 54]
[55 56 57 58 59]]]
x[1,:,:] # select 2nd row in 1st axis
array([[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]])
x[1,...] # with ellipsis shortcut
array([[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]])
x
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
x[:,:,1]
array([[ 1, 6, 11, 16],
[21, 26, 31, 36],
[41, 46, 51, 56]])
x[...,1]
array([[ 1, 6, 11, 16],
[21, 26, 31, 36],
[41, 46, 51, 56]])
A View Point of Numpy Array#
In Numpy the result of a simple indexing or a slicing of an array is called a “view” of the original array.
It is worth to note that when we slice an array a new array is returned but this last one is
a view of the original one rather than a copy of the data.
x = np.arange(5)
x
array([0, 1, 2, 3, 4])
y = x[0:2]
y
array([0, 1])
The id of the two arrays are equal?
id(x) == id(y)
False
id(x), id(y)
(140735225169360, 140735225169936)
x is y
False
What do we really mean by a “view”?
y[0] = 42
y
array([42, 1])
what about x?
x
array([42, 1, 2, 3, 4])
The data of both array is shared between them.
Indeed different numpy arrays can share the same data, thus a change made in one array is also visible in another.
Thus by “view” we mean an array that does not own its data, but refers to data of another original array.
This feature is very useful and efficient in the case of very large arrays that can be inspected without useless data copies
Whereas for Python list the slicing returns a copy of the data.
l = [42, 73, 197, 3797, 6174]
l
[42, 73, 197, 3797, 6174]
c = l[0:2]
c
[42, 73]
l[0:2] == c
True
c[0] = 1729
c # two very interesting number
[1729, 73]
l
[42, 73, 197, 3797, 6174]
l[0:2] == c
False
How do you know if the data is original or is it just a copy?#
Check out the base attribute!
x = np.arange(5)
y = x[0:2]
print(x, y)
[0 1 2 3 4] [0 1]
print(y.base)
[0 1 2 3 4]
When an array is a “view” of another array the data it is referring is reported in base attribute
y.base is x
True
print(x.base)
None
The base of an array that owns its memory is None
Advanced Indexing aka Fancy Indexing#
Advanced indexing is performed when the object inside parentheses is a sequence, except for tuple, a tuple with at least a sequence element or a numpy nd-array of type integer or boolean.
There are two type of advanced indexing:
indexing with boolean types
indexing with integer types
Advanced indexing: boolean (aka mask)#
Numpy arrays can be indexed with boolean arrays.
Boolean index follows different rules than standard indexing
Boolean indexing returns a copy of the data not a view
Boolean arrays must be of the same shape of at least the first dimension of indexed arrays
The result of a boolean indexing is a 1D array containing all the true element of the boolean array except in the case of boolean array with fewer dimensions than indexed array
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[x<5]
array([0, 1, 2, 3, 4])
x<5
array([ True, True, True, True, True, False, False, False, False,
False])
x = np.arange(42, 54).reshape(3, 4)
y = np.arange(73, 85).reshape(3, 4)
x[x+y<130]
array([42, 43, 44, 45, 46, 47, 48, 49])
Boolean indexing returns a copy of the data not a view
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = x[x%2 == 0]
y
array([0, 2, 4, 6, 8])
y.base is x
False
print(x.base, y.base)
None None
When a boolean array (B) has lower dimensions than the indexed array (A) the last one is filled automatically to match up the dimension of boolean array.
It is equivalent to use A[B, ...]
The result is an array with one dimension containing the selection of True elements of the boolean array followed by the remaining dimensions of indexed array
x = np.arange(24).reshape(2, 3, 4)
x
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
# select only 2nd row along last axis
boolean_selection = np.array([False, True])
x[boolean_selection]
array([[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
# selecting row along 1st axis
boolean_selection = np.array([ [False, True, True], [False, True, False] ])
x[boolean_selection]
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[16, 17, 18, 19]])
Multidimension boolean indexing#
x = np.arange(24).reshape(2, 3, 4)
x
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
x[x>17]
array([18, 19, 20, 21, 22, 23])
Advanced indexing: integer#
Index arrays can be of integer type
Like boolean indexing in the case of integer arrays a copy of the data is returned not a view
Each value in the indexing array indicates the index of indexed array to take
Indexing array can also be of different shape and dimension than indexed array
x = np.arange(100, 110)
x
array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])
x[np.array([0, 1, 1, 2, 3])]
array([100, 101, 101, 102, 103])
x[np.arange(0, 8, 2)]
array([100, 102, 104, 106])
In the case of integer indexing an array with the same shape of the indexing array is returned filled with the values and types of the indexed array
x = np.arange(0, 10, 2)
x
array([0, 2, 4, 6, 8])
x[np.array([[2, 2, 2], [3, 3, 3]])]
array([[4, 4, 4],
[6, 6, 6]])
Indexing mania#
Various type of indexing can be mixed together without any limit to perversion

source “Scipy Lecture Notes”
Assignment#
A single index, a slice or an indexing array can be used to assign values to an array.
NB: The value to assign must be shape consistent with the subset selected.
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[::2] = 42
x
array([42, 1, 42, 3, 42, 5, 42, 7, 42, 9])
The assignment of an higher types to lower one involves in narrowing type
x = np.arange(10, dtype=np.int32)
x[0] = 3.1415
x
array([3, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
This assignment is a bit weird!#
Assignment is always performed to the original data in the array.
This sounds weird
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[[0, 1, 2, 0, 0]] += 42
x
array([42, 43, 44, 3, 4, 5, 6, 7, 8, 9])
What’s going on here?
A temporary array is indexed from the original one.
The temporary array contains the values [0, 1, 2, 0, 0] and the value 42 is added to the temporary array.
Finally the temporary array is assigned to the original one respecting the order given by the indexing array.
Thus the value corresponding tho the index 0 is assigned 3 times.
This is not equivalent to incrementing 3 times the value at the index 0.
