Numerical Python aka Numpy

Numerical Python aka Numpy#

Unless specified differently, these slides are copyright CINECA 2019 and are released under the Attribution--NonCommercial--NoDerivs (CC BY-NC-ND) Creative Commons license, version 3.0. Uses not allowed by the above license need explicit, written permission from the copyright owner. For more information see: http://creativecommons.org/licenses/by-nc-nd/3.0/

Slides were authored by Sergio Orlandini.

NumPy library ecosystem#

“NumPy is the fundamental package for scientific computing with Python”
(source https://numpy.org)

Numpy (https://numpy.org) is the core library for scientific computing and data science in python.
The library name is a contraction of “Numerical Python”.
Numpy library supplies:

a high performance multi-dimensional array data structure more efficient, more convenient and smarter than python lists, with faster access in reading and writing memory
routines for fast operations on arrays performed in compiled code for performance
useful linear algebra, Fourier transform, and random number capabilities
tools for integrating C/C++ and Fortran code

Numpy is faster than pure python:

all operations are implicitly element-wise
avoid inefficiencies of loops in pure Python
all overhead in explicit looping, indexing and interpreting Python code are performed in pre-compiled and vectorized code
combine the benefits and semplicity from coding in Python and the efficiency of compiled code
vectorized code is more readable and concise

def inverse(x):
    y = []
    for i in range(len(x)):
        y.append(1.0 / x[i])
    return y

n = 10000000

%time y = inverse(range(1, n))

CPU times: user 3.34 s, sys: 228 ms, total: 3.57 s
Wall time: 3.56 s

import numpy as np

%time y = 1.0 / np.arange(1, n)

CPU times: user 188 ms, sys: 109 ms, total: 297 ms
Wall time: 297 ms

How to import Numpy module#

import numpy as np

Importing NumPy using np as an alias is a standard de-facto that data science community has adopted after discussion on public mailing lists.

This import is not required but it is highly recommended.

Main useful Numpy features are:

Multidimensional Array (ndarray)
- homogeneous multidimensional container
Universal Function (ufunc) that operates on arrays
- in an element-by-element fashion
- vectorized, no need of loops over elements

what is a Numpy N-dimensional array (aka nd-array)#

Multi-dimensinal array container is the principal data structure of the numpy library.

homogeneous container of items of the same type and size
fixed size at creation
dimension are called axes
number of axes is the rank of array

x = np.array([1, 2, 3, 4])

print(x)

[1 2 3 4]

type(x)

numpy.ndarray

Numpy arrays have some attributes that describe their properties

size: number of elements of the array

x.size

dtype: data type of each element

x.dtype

dtype('int64')

itemsize: size in bytes of each element

x.itemsize

array dimension#

ndim: the number of axes/dimensions is the rank of an array

x.ndim

shape: array dimensions. A tuple of integers with the size of the array in each dimension
NB: the length of the shape is the rank

x.shape

(4,)

type(x.shape)

tuple

Bidimensional example#

x = np.array([[1, 2, 3], [4, 5, 6]])

print(x)

[[1 2 3]
 [4 5 6]]

x.shape

(2, 3)

x.ndim

numpy.array

array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
    
Create an array.

Array types#

list of all numpy types

np.sctypes

{'int': [numpy.int8, numpy.int16, numpy.int32, numpy.int64],
 'uint': [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint64],
 'float': [numpy.float16, numpy.float32, numpy.float64, numpy.float128],
 'complex': [numpy.complex64, numpy.complex128, numpy.complex256],
 'others': [bool, object, bytes, str, numpy.void]}

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=complex)

x.dtype

dtype('complex128')

array([[1.+0.j, 2.+0.j, 3.+0.j],
       [4.+0.j, 5.+0.j, 6.+0.j]])

upcasting#

numpy array function upcasts implicitly to the higher precision elements

x = np.array([1, 2, 3.0, 4])

x.dtype

dtype('float64')

array([1., 2., 3., 4.])

Shape of an Array#

x = np.array([[1, 2, 3],
              [4, 5, 6]])

print(x)

[[1 2 3]
 [4 5 6]]

x.shape

(2, 3)

x = np.array([1, 2, 3, 4, 5, 6])

print(x)

[1 2 3 4 5 6]

x.shape

(6,)

Numpy array can be reshaped in-place by assigning a tuple to its shape attribute

array([1, 2, 3, 4, 5, 6])

x.shape

(6,)

x.shape = (2, 3)

print(x)

[[1 2 3]
 [4 5 6]]

NB: the size of reshaped array must match the size of the original array otherwise an error is raised

numpy.reshape

reshape(a, newshape, order='C')

Gives a new shape to an array without changing its data.

x = np.array([1, 2, 3, 4, 5, 6])

np.reshape(x, (2, 3))

array([[1, 2, 3],
       [4, 5, 6]])

np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)

array([[1, 2, 3],
       [4, 5, 6]])

Evenly spaced array intermezzo#

n = int(1e7)
%time x = [i for i in range(n)]

CPU times: user 365 ms, sys: 30.6 ms, total: 395 ms
Wall time: 394 ms

numpy.arange

arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a list.

%time x = np.arange(n)

CPU times: user 170 ms, sys: 99.9 ms, total: 270 ms
Wall time: 266 ms

Arrays Axes#

x = np.arange(10)

print(x)

[0 1 2 3 4 5 6 7 8 9]

x = np.arange(12).reshape(3, 4)

print(x)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

x = np.arange(24).reshape(2, 3, 4)

print(x)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Numpy array print convention:

the last axis (fastest one) is printed from left to right
the second-to-last is printed from top to bottom
the others are printed from top to bottom, with each slice separated by an empty line

x = np.arange(24).reshape(2, 3, 4)

print(x)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Memory data and Metadata#

A Numpy nd-array is composed by:

pointer to a contiguos memory block
data structure as metadata containing information about data type, shape, strides, …

Python view can be manipulated changing structure metadata without modify data in memory

figure source: enthought

Numpy Array vs Python List#

Universal Functions `ufunc`#

An universal function is a function that operates on Numpy ndarrays. Each universal function:

performs the core function element-wise on the inputs
is a “vectorized” wrapper for a core function
takes array inputs and produces array outputs
operates in element-by-element fashion
can also operate on input without same dimension. In this case sophisticated operations of broadcasting are defined

For sake of efficiency many of the built-in functions are implemented in compiled C code.
There are more than 60 ufunc in Numpy covering a wide variety of operations:

math operations
trigonometric functions
comparison methods
bit manipulations functions
statistical functions

all operatons are performed element-wise. From operation with scalar …

np.arange(5) + 42

array([42, 43, 44, 45, 46])

… to operations between arrays

even = np.arange(0, 10, 2)
odd = np.arange(1, 10, 2)

print(even, odd)

[0 2 4 6 8] [1 3 5 7 9]

even + odd

array([ 1,  5,  9, 13, 17])

np.add(even, odd)

array([ 1,  5,  9, 13, 17])

Some ufunc have the infix notation defined (+, -, *, /, …).
When the infix notation is used the corresponding ufunc is called internally (add, subtract, multiply, devide, …).

Also comparison operations perform element-wise

even + odd == np.add(even, odd)

array([ True,  True,  True,  True,  True])

For comparison operations there are also the array counterpart with naming start with array_ prefix

np.array_equal(even + odd, np.add(even, odd))

True

All arithmetic operates element-wise, even those between matrices, including multiplication

I = np.ones((3, 3))
I

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

I * I

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

Tips: use np.dot to perform matrix multiplication

I.dot(I)

array([[3., 3., 3.],
       [3., 3., 3.],
       [3., 3., 3.]])

Reductions#

np.sum

sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Sum of array elements over a given axis.

x = np.arange(3, 10)

x.sum()

x = np.arange(6).reshape(2, 3)

print(x)

[[0 1 2]
 [3 4 5]]

x.sum() # sums all values

x.sum(axis=0) # sums all columns

array([3, 5, 7])

x.sum(axis=1) # sums all rows

array([ 3, 12])

x[0,:].sum(), x[1,:].sum() # sums all rows

(3, 12)

other type of reductions are:

min, max for maximum or minimum of an array
argmin, argmax for finding inde of maximum or minimum value in an array
logical all, any
statistical mean, median, std

n = np.array([1729, 73, 42, 6174, 495])

print("min", n.min(), "at", n.argmin())
print("max", n.max(), "at", n.argmax())

min 42 at 2
max 6174 at 3

n.mean()

1702.6

array([1729,   73,   42, 6174,  495])

is_even = n%2 == 0

is_even

array([False, False,  True,  True, False])

print("Is there at least an even number?", np.any(n[is_even]))

Is there at least an even number? True

any(a, axis=None, out=None, keepdims=<no value>)
    Test whether any array element along a given axis evaluates to True.

print("Are all even numbers?", np.all(n[is_even]))

Are all even numbers? True

all(a, axis=None, out=None, keepdims=<no value>)
    Test whether all array elements along a given axis evaluate to True.

101 ways to create arrays#

There are a lots of methods to create ndarray with different behaviour:

evenly spaced values within a given interval
filled with given values and shape
pre-defined structured
randomly initialized with chosen distribution

Evenly spaced array#

To create evenly spaced array with a given step size use np.arange
In the case of non-integer step or if you want to specify the number of samples rather than the tstep size use np.linspace

n = int(1e6)

%time x1 = [i for i in range(n)]

CPU times: user 42.6 ms, sys: 451 µs, total: 43 ms
Wall time: 41.5 ms

%time x2 = np.arange(n)

CPU times: user 3.27 ms, sys: 0 ns, total: 3.27 ms
Wall time: 1.46 ms

np.array_equal(x1, x2)

True

np.linspace

linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
    Return evenly spaced numbers over a specified interval.
    
    Returns `num` evenly spaced samples, calculated over the interval [`start`, `stop`].
    
    The endpoint of the interval can optionally be excluded.

x = np.linspace(0., 10., 10)
x

array([ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444,
        5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])

x = np.linspace(0, 10, 10, endpoint=False)
x

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

t = np.linspace(-np.pi * 2, np.pi * 2, 1000)

x = t + 3 * np.sin(2*t)
y = np.sin(t)

import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()

_images/ace274627e016c75771d3657e31efac75ecda7980065a7a929beeb126aacd783.png

Intrinsic array creation#

Numpy has built-in functions for creating arrays from scratch

filled with a value with given shape
- np.zeros, np.ones, np.full, np.empty
filled with a value with same shape of another array
- np.zeros_like, np.ones_like, np.full_like, np.empty_like
given a structure
- np.identity, np.eye
randomly filled following a distribution
given polynomials from polynomial module
- chebyshev, legendre, laguerre, hermite, …

np.zeros

    zeros(shape, dtype=float, order='C')   
        Return a new array of given shape and type, filled with zeros.

np.zeros(42)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0.])

np.zeros(5, dtype=complex)

array([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j])

np.ones

ones(shape, dtype=None, order='C')
    Return a new array of given shape and type, filled with ones.

np.ones((3, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

np.full

full(shape, fill_value, dtype=None, order='C')
    Return a new array of given shape and type, filled with `fill_value`.

np.full((2, 3), 42)

array([[42, 42, 42],
       [42, 42, 42]])

np.empty

empty(shape, dtype=float, order='C')    
    Return a new array of given shape and type, without initializing entries.

np.empty((2, 6))

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

np.full_like

full_like(a, fill_value, dtype=None, order='K', subok=True, shape=None)
    Return a full array with the same shape and type as a given array.

x = np.arange(12).reshape(3, 4)

np.full_like(x, 42)

array([[42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42]])

# Return the identity array.

x = np.identity(3)

print(x)

print(np.eye(3, k=1))

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 0.]]

Hermite polynomials#


$$ \Large
\begin{aligned}
H_{n}(x) &= (-1)^{n} e^{x^{2}} {\frac {d^{n}}{dx^{n}}}e^{-x^{2}} \
&\
H_{0}(x)&=1,\
H_{1}(x)&=2x,\
H_{2}(x)&=4x^{2}-2,\
H_{3}(x)&=8x^{3}-12x,\
H_{4}(x)&=16x^{4}-48x^{2}+12,\
H_{5}(x)&=32x^{5}-160x^{3}+120x,\
H_{6}(x)&=64x^{6}-480x^{4}+720x^{2}-120,\
H_{7}(x)&=128x^{7}-1344x^{5}+3360x^{3}-1680x
\end{aligned}
$$

np.polynomial.hermite.hermval

hermval(x, c, tensor=True)
    Evaluate an Hermite series at points x.
    If `c` is of length `n + 1`, this function returns the value:
    p(x) = c_0 * H_0(x) + c_1 * H_1(x) + ... + c_n * H_n(x)

x = np.linspace(-3, 3, 100)

H0 = np.polynomial.hermite.hermval(x, 1)

H0[:10]

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

H1 = np.polynomial.hermite.hermval(x, [0, 1])

H01 = np.polynomial.hermite.hermval(x, [1, 1])

np.identity

identity(n, dtype=None)
    Return the identity array.

nmax = 5

coeff = np.identity(nmax)

coeff

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

H = np.polynomial.hermite.hermval(x, coeff)

print(type(H), H.shape)

<class 'numpy.ndarray'> (5, 100)

for n in np.arange(1, nmax):
    plt.plot(x, H[n], label="$H_{}(x)$".format(n))

plt.xlim([-2, 2])
plt.ylim([-30, 20])

plt.title("Hermite polynomials")
plt.xlabel("$x$")
plt.ylabel("$H_n(x)$")
plt.legend()
plt.show()

_images/a0032b8a61faddcea78fadf91efc320a9470e5720b56dacc87391089ec94d129.png

Quantum Harmonic Oscillator#

The Schrödinger equation of a particle of mass $m$ in one dimensional harmonic potential $V(x) = \frac{1}{2} k x^2$, where $k$ is the force costant is

$$

\frac{ \hbar }{2 m} \frac{d^2 \psi}{d x^2} + \frac{1}{2} k x^2 \psi = E , \psi $$

Introducing the angular frequency $\omega = \sqrt{k / m} ;$ and changing the coordinate such that $q=(m\omega/\pi\hbar)^{1/4} x,$, the equation becames

$$ -\frac{1}{2} \frac{d^2 \psi}{d q^2} + \frac{1}{2} k q^2 = \frac{E}{\hbar \omega} \psi $$

The solution of Schrödinger equation is the wavefunction $$ \psi_n(q) = N_n ; H_n(q) ; e^{- q^2} $$

where $n$ is the vibrational quantum number, $H_n(q)$ the Hermite polynomials and $N_n = \pi^{1/4}/\sqrt{2^n n!}$ a normalization factor.

The corresponding energy levels are: $$ E_n = \hbar \omega ; (n+\frac{1}{2}) $$

xlim = 12 # domain boundary
x = np.linspace(-xlim, xlim, 1000)

n = 50 # maximum vibrational quantum number

# Harmonic Oscillator Wavefunctions
psi = np.polynomial.hermite.hermval(x, np.identity(n)) * np.exp(-x*x/2.)

# normalize wavefunctions
for i in np.arange(n):
    psi[i] *= (1. / np.sqrt(2**i * np.sqrt(np.pi) * np.math.factorial(i)))

# Vibrational Energies
En = np.arange(n) + 0.5

En[0] # Zero-Point Energy

0.5

nmax = 7 # maximum vibrational state to plot
xrng = 5 # domain range

# initialize 2 horizontal plots
fig, axes = plt.subplots(1, 2,figsize=(12, 6) )

off = 0.5
for ax in axes:
    
    # plot potential surface
    ax.plot(x, x*x*0.5, linewidth=1.75)
    
    # plot energy levels
    for e in En[:nmax]:
        ax.axhline(y=e, color='gray', alpha=0.5)

    # make-up axis
    ax.set_xlabel("$q$")

    ax.set_ylim([0, nmax+off])
    ax.set_xlim([-xrng, xrng])
    ax.set_yticks([])
    
    ax.spines['top'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['right'].set_visible(False)

scale = 0.6
for i in np.arange(0, nmax):
    # plot wavefunctions
    axes[0].plot(x, En[i] + psi[i] * scale,
                 label="$\psi_{}$".format(i),
                 color='r', alpha=1./(i*0.7+1))
    # plot square of wavefunctions
    axes[1].plot(x, En[i] + psi[i]*psi[i] * scale*2.5)

axes[0].set_title("$\psi$ Wavefunction")
axes[1].set_title("$|\,\psi\,|^2$ Probability")

# vibrational state labels
for i in np.arange(nmax):
    axes[1].text(s=r'$E_{} = \frac{{{}}}{{2}}\hbar\omega$'.format(i, 2*i+1),
                 x=-xrng-1, y=En[i], va='center', ha='center')

plt.show()

_images/447adca7bd6a2d60b7fac838ea9de0d18132bd889108e5a25cc78d5ca097ade9.png

nmax = 23
xrng = 8

fig, ax = plt.subplots()

ax.plot(x, x*x*0.5, linewidth=1.75)

for e in En[nmax:nmax+1]:
    plt.axhline(y=e, color='gray', alpha=0.5)

ax.set_ylim([nmax+off/2, nmax+off*2])
ax.set_xlim([-xrng, xrng])
ax.set_yticks([])

ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel("$q$")

scale = 0.6
plt.plot(x, En[nmax] + psi[nmax]*psi[nmax] * scale*2.5)
    
ax.set_title("$|\,\psi\,|^2$ Probability")

ax.text(s=r'$E_{{{}}} = \frac{{{}}}{{2}}\hbar\omega$'.format(nmax, 2*nmax+1),
        x=-xrng-2.5, y=En[nmax], va='center', ha='center')

plt.show()

_images/154d78d29ff875fea5ba81a8521bbe53c98bd9dde5f9b92edf31355fa97731f8.png

Random Samples Arrays#

In np.random module there are a lots of functions to create array of a given shape populated with random samples from a given distribution.
Supported distributions are:
Beta, binomial, chi-square, Dirichlet, exponential, F, Gamma, geometric, Gumbel, Hypergeometric, Laplace, …

Samples from uniform distribution

np.rand

rand(d0, d1, ..., dn)   
    Random values in a given shape.
    Create an array of the given shape and populate it with
    random samples from a uniform distribution over [0, 1).

x = np.random.rand(3,2)
x

array([[0.52504865, 0.80573668],
       [0.39471137, 0.49307503],
       [0.87379397, 0.2160302 ]])

NB: if called without arguments a number is returned

np.random.rand()

0.8627107662808488

Samples from normal distribution

np.random.randn

randn(d0, d1, ..., dn)
    Return a sample (or samples) from the "standard normal" distribution.

np.random.randn(2, 2)

array([[-0.78954781,  0.95415007],
       [-1.29735578, -0.71004843]])

Distributions of Dice Rolls#

randint(low, high=None, size=None, dtype='l')
    
    Return random integers from `low` (inclusive) to `high` (exclusive).
    Return random integers from the "discrete uniform" distribution of
    the specified dtype in the "half-open" interval [`low`, `high`).
    If `high` is None (the default), then results are from [0, `low`).

dice1 = np.random.randint(1, 7, 10000)
dice2 = np.random.randint(1, 7, 10000)

dice1[:20]

array([5, 4, 4, 1, 1, 6, 4, 4, 3, 4, 3, 2, 5, 4, 6, 2, 1, 4, 5, 3])

dices_two = dice1 + dice2

hist, bins = np.histogram(dices_two, density=True)

for h, b in zip(hist, bins):
    print("{:2d}  {:4.1f}%".format( int(b), h*100 ))

 2.8%
 5.5%
 8.2%
11.0%
14.1%
17.1%
13.6%
10.9%
 8.2%
 8.5%

import matplotlib.pyplot as plt
count, bins, ignored = plt.hist(dices_two, 11, density=True)
plt.show()

_images/e602698524037589f43dcf15599f944a901278c36f6bef05466fea41535abd9e.png

n = 3 # number of dices

dices = np.random.randint(1, 7, (n, 10000))

dices[0,:10]

array([4, 3, 3, 4, 6, 2, 5, 6, 3, 1])

dices_all = dices.sum(axis=0)

hist, bins = np.histogram(dices_all, bins=n*6-n, density=True)

for h, b in zip(hist, bins):
    print("{:2d}  {:4.1f}%".format( int(b), h*100 ))

 0.6%
 1.5%
 2.9%
 4.6%
 7.1%
 9.2%
11.6%
12.1%
12.5%
12.3%
 9.8%
 6.6%
 4.5%
 2.6%
 2.0%

import matplotlib.pyplot as plt
count, bins, ignored = plt.hist(dices_all, bins=n*6-n, density=True)
plt.show()

_images/3b863651fce8454577554d30a31fe2f36a32fe27cc5bde0ed1c571b90482da7f.png

dices_one = np.random.randint(1, 7, 10000)

plt.hist(dices_one, bins=6, density=True, alpha=0.5, label="1 Dice")
plt.hist(dices_two, bins=11, density=True, alpha=0.5, label="2 Dices")
plt.hist(dices_all, bins=n*6-n, density=True, alpha=0.5, label="3 Dices")

plt.legend()
plt.show()

_images/f8e6be760d5fe31f1ad6003f143976b0801abf3847ca467599826af91b059c32.png

Indexing#

Numpy array can be indexing with bracket notation ([]).
Different type of indexing are supported depending on the type of object used for indexing.

Three types of indexing are allowed:

basic indexing and slicing
- standard accesses like Python list/tuple extended in multi-dimensions
advanced indexing also called fancy indexing
- accesses with boolean or integr arrays
field access
- accessing with string in structured arrays

Standard Pyton Syntax Indexing#

Numpy array follows standard python syntax for indexing.

0-based
accepts negative indices for indexing from the end of the array
out-of-bound accesses are not supported

x = np.arange(10)

print(x)

[0 1 2 3 4 5 6 7 8 9]

x[2]

x[-1]

out-of-bound accesses are not support and raise an IndexError exception

>>> x[42]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-e8949b1d16e1> in <module>()
----> 1 x[42]

IndexError: index 42 is out of bounds for axis 0 with size 10

Array slicing#

Numpy arrays can be sliced.
The extracted array has the same number of dimensions of original one but different size.
Slicing for Numpy array follows the same rules as for python lists and tuples
start:stop:step notation inside brackets

x = np.arange(10)

print(x)

[0 1 2 3 4 5 6 7 8 9]

print(x[1:])

[1 2 3 4 5 6 7 8 9]

print(x[:-5])

[0 1 2 3 4]

print(x[1::2])

[1 3 5 7 9]

multi-dimensional indexing#

Unlike Python lists and tuples Numpy arrays support multi-dimensional indexing.
Multi-dimensional index follows the same syntax like one dimensional one and each index is separated by commas.

x = np.arange(16).reshape(4, 4)

print(x)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

x[1,2]

x[1][2] # more inefficient

multi-dimensional slicing#

x = np.arange(16).reshape(4, 4)

print(x)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

print( x[1,:] ) # select 2nd row

[4 5 6 7]

print( x[1][:] ) # it seems to work

[4 5 6 7]

print( x[:,0] ) # select 1st column

[ 0  4  8 12]

print( x[:][0] ) # no, it doesn't

[0 1 2 3]

Numpy multi-dimensional indexing differs from Python list/tuple indexing

l = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]

l[1][2]

l[1][:]

[4, 5, 6, 7]

l[:][0]

[0, 1, 2, 3]

multidimensional slicing is very useful to extract sub-arrays from an array

x = np.arange(16).reshape(4, 4)

print(x)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

print( x[2:4,1:3] ) # sub-array

[[ 9 10]
 [13 14]]

Ellipsis syntax#

The ellipsis syntax (...) maybe used to select any unspecified dimensions

x = np.arange(60).reshape(3, 4, 5)

print(x)

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]]

 [[40 41 42 43 44]
  [45 46 47 48 49]
  [50 51 52 53 54]
  [55 56 57 58 59]]]

x[1,:,:] # select 2nd row in 1st axis

array([[20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39]])

x[1,...] # with ellipsis shortcut

array([[20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39]])

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

x[:,:,1]

array([[ 1,  6, 11, 16],
       [21, 26, 31, 36],
       [41, 46, 51, 56]])

x[...,1]

array([[ 1,  6, 11, 16],
       [21, 26, 31, 36],
       [41, 46, 51, 56]])

A View Point of Numpy Array#

In Numpy the result of a simple indexing or a slicing of an array is called a “view” of the original array.
It is worth to note that when we slice an array a new array is returned but this last one is a view of the original one rather than a copy of the data.

x = np.arange(5)

x

array([0, 1, 2, 3, 4])

y = x[0:2]

y

array([0, 1])

The id of the two arrays are equal?

id(x) == id(y)

False

id(x), id(y)

(140735225169360, 140735225169936)

x is y

False

What do we really mean by a “view”?

y[0] = 42

array([42,  1])

what about x?

array([42,  1,  2,  3,  4])

The data of both array is shared between them.
Indeed different numpy arrays can share the same data, thus a change made in one array is also visible in another.
Thus by “view” we mean an array that does not own its data, but refers to data of another original array.

This feature is very useful and efficient in the case of very large arrays that can be inspected without useless data copies

Whereas for Python list the slicing returns a copy of the data.

l = [42, 73, 197, 3797, 6174]

l

[42, 73, 197, 3797, 6174]

c = l[0:2]

c

[42, 73]

l[0:2] == c

True

c[0] = 1729

c # two very interesting number

[1729, 73]

[42, 73, 197, 3797, 6174]

l[0:2] == c

False

How do you know if the data is original or is it just a copy?#

Check out the base attribute!

x = np.arange(5)

y = x[0:2]

print(x, y)

[0 1 2 3 4] [0 1]

print(y.base)

[0 1 2 3 4]

When an array is a “view” of another array the data it is referring is reported in base attribute

y.base is x

True

print(x.base)

None

The base of an array that owns its memory is None

Advanced Indexing aka Fancy Indexing#

Advanced indexing is performed when the object inside parentheses is a sequence, except for tuple, a tuple with at least a sequence element or a numpy nd-array of type integer or boolean.

There are two type of advanced indexing:

indexing with boolean types
indexing with integer types

Advanced indexing: boolean (aka mask)#

Numpy arrays can be indexed with boolean arrays.

Boolean index follows different rules than standard indexing
Boolean indexing returns a copy of the data not a view
Boolean arrays must be of the same shape of at least the first dimension of indexed arrays
The result of a boolean indexing is a 1D array containing all the true element of the boolean array except in the case of boolean array with fewer dimensions than indexed array

x = np.arange(10)

x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[x<5]

array([0, 1, 2, 3, 4])

x<5

array([ True,  True,  True,  True,  True, False, False, False, False,
       False])

x = np.arange(42, 54).reshape(3, 4)
y = np.arange(73, 85).reshape(3, 4)

x[x+y<130]

array([42, 43, 44, 45, 46, 47, 48, 49])

Boolean indexing returns a copy of the data not a view

x = np.arange(10)

x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

y = x[x%2 == 0]

y

array([0, 2, 4, 6, 8])

y.base is x

False

print(x.base, y.base)

None None

When a boolean array (B) has lower dimensions than the indexed array (A) the last one is filled automatically to match up the dimension of boolean array.
It is equivalent to use A[B, ...]
The result is an array with one dimension containing the selection of True elements of the boolean array followed by the remaining dimensions of indexed array

x = np.arange(24).reshape(2, 3, 4)

x

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

# select only 2nd row along last axis

boolean_selection = np.array([False, True])

x[boolean_selection]

array([[[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

# selecting row along 1st axis

boolean_selection = np.array([ [False, True, True], [False, True, False] ])

x[boolean_selection]

array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [16, 17, 18, 19]])

Multidimension boolean indexing#

x = np.arange(24).reshape(2, 3, 4)

x

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

x[x>17]

array([18, 19, 20, 21, 22, 23])

Advanced indexing: integer#

Index arrays can be of integer type
Like boolean indexing in the case of integer arrays a copy of the data is returned not a view
Each value in the indexing array indicates the index of indexed array to take
Indexing array can also be of different shape and dimension than indexed array

x = np.arange(100, 110)

x

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

x[np.array([0, 1, 1, 2, 3])]

array([100, 101, 101, 102, 103])

x[np.arange(0, 8, 2)]

array([100, 102, 104, 106])

In the case of integer indexing an array with the same shape of the indexing array is returned filled with the values and types of the indexed array

x = np.arange(0, 10, 2)

x

array([0, 2, 4, 6, 8])

x[np.array([[2, 2, 2], [3, 3, 3]])]

array([[4, 4, 4],
       [6, 6, 6]])

Indexing mania#

Various type of indexing can be mixed together without any limit to perversion

source “Scipy Lecture Notes”

Assignment#

A single index, a slice or an indexing array can be used to assign values to an array.
NB: The value to assign must be shape consistent with the subset selected.

x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[::2] = 42
x

array([42,  1, 42,  3, 42,  5, 42,  7, 42,  9])

The assignment of an higher types to lower one involves in narrowing type

x = np.arange(10, dtype=np.int32)

x[0] = 3.1415
x

array([3, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

This assignment is a bit weird!#

Assignment is always performed to the original data in the array.
This sounds weird

x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[[0, 1, 2, 0, 0]] += 42
x

array([42, 43, 44,  3,  4,  5,  6,  7,  8,  9])

What’s going on here?
A temporary array is indexed from the original one. The temporary array contains the values [0, 1, 2, 0, 0] and the value 42 is added to the temporary array. Finally the temporary array is assigned to the original one respecting the order given by the indexing array. Thus the value corresponding tho the index 0 is assigned 3 times. This is not equivalent to incrementing 3 times the value at the index 0.