Python for Scientific Programming#

Sergio Orlandini - Susana Bueno - Maria Montagna#

s.orlandini@cineca.it - s.buenominguez@cineca.it - m.montagna@cineca.it

CINECA - Roma#

All the material, slides and hands-on, is available at the repository:

https://gitlab.hpc.cineca.it/cineca-hpyc/python-scientific-2023

To download the material, open a terminal and run the following command:

git clone https://gitlab.hpc.cineca.it/cineca-hpyc/python-scientific-2023.git

Gentle Starting Poll#

http://etc.ch/xSYs#

Python Introduction#

Let us just start with basic Python prerequisites

  • Python is an interpreted high-level programming language for general-purpose programming.

  • Conceived in the late 1980s by Guido van Rossum in the Netherlands

  • First release in 1991

  • Python 2.0 released in 2000

  • Python 3.0 released in 2008

    • Not completely backward-compatible

    • In 2018, usage statistics finally reported 25%/75% for Python 2/3, respectively

    • Python 2.7 end-of-life was postponed from 2015 to 2020 because large portions of existing code were difficult to port forward

    • We will mainly focus on Python 3.x during this course

    • The 2to3 utility converts code from Python 2 to Python 3 (the conversion is not always complete)

  • Python evolves through so-called PEPs (Python Enhancement Proposals)

TIOBE Index#

From https://www.tiobe.com/tiobe-index

The TIOBE programming community index is a measure of popularity of programming languages.
The index is calculated from the number of search engine results for queries containing the name of the language.
https://www.tiobe.com/tiobe-index/programming-languages-definition/

Advantages#

Python is a programming language with many good features:

  • Excellent readability (there is only one way – or at least a pythonic way – to do anything)

  • Very easy to use

  • Easy to learn (looks like pseudo-coding)

  • Excellent portability (Linux, Windows, iOS,…)

    • the course will focus on the Linux platform, but moving to another OS is usually not too difficult in Python

    • on Windows, you may use WinPython or, better, Anaconda https://www.anaconda.com/download/#windows

Applications#

  • Do not call it a “scripting language”! Although it can be used as such, it is much more

    • web-servers

    • scientific computing

    • machine learning

    • database management

    • Graphical User Interfaces

    • scripting

  • It is a modern, complete, high-level language with which highly complex software can be built.

  • It is an interpreted language, but it would be more appropriate to call it a “dynamic language”.

  • A dynamic language is a high-level language in which many checks are performed at run time, whereas in other languages they are done at compile time.
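
The run-time nature of these checks can be illustrated with a minimal sketch. The function name `add_one` is hypothetical and chosen only for this example: nothing about the argument type is verified when the function is defined, and a mismatch surfaces only when the offending call is actually executed.

```python
def add_one(value):
    # No type is declared for `value`; nothing is checked at definition time.
    return value + 1


print(add_one(41))   # an int works
print(add_one(1.5))  # so does a float

try:
    add_one("41")    # the mismatch is detected only when this call runs
except TypeError as error:
    caught = error   # keep a reference; `error` is cleared after the block
    print("run-time check failed:", error)
```

In a statically compiled language the third call would typically be rejected before the program starts; here the first two calls run normally and the error appears only at the third.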

Python resources#

  • Start with the official Python page… https://www.python.org/

  • …which comes with a detailed documentation… https://docs.python.org/3/index.html

  • …and an extensive tutorial https://docs.python.org/3/tutorial/index.html

  • But there are plenty of resources across the web, books,…

  • And of course StackOverflow https://stackoverflow.com/questions/tagged/python

  • Be careful: aim to write pythonic code, not just code that works; it is the right choice

Python implementation#

  • CPython is the original Python implementation

    • CPython is also the first to implement new features: Python-the-language development uses CPython as the base

    • other implementations may lack some language features or compatibility

    • implemented in C (just an implementation detail)

    • it compiles your Python code into bytecode (transparently) and interprets that bytecode in an evaluation loop

    • bytecode is saved in subfolders of your code named __pycache__ (most probably you will never need to access that code)

  • Alternatives are Jython, PyPy, IronPython
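
The bytecode mentioned above can be inspected with the standard-library `dis` module. The following is only a sketch under the assumption that it runs on CPython; the function `add` is a made-up example, and the exact instruction names differ between Python versions.

```python
import dis


def add(a, b):
    return a + b


# The compiled bytecode is stored on the function's code object as raw bytes.
raw_bytecode = add.__code__.co_code
print(type(raw_bytecode), len(raw_bytecode))

# dis decodes those bytes into the instructions run by the evaluation loop.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Calling `dis.dis(add)` instead prints a formatted listing, which is usually more convenient in an interactive session.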

Performances#

  • Python is often considered a slow language. To a large extent this is true: it is slower than Java, for example, and can be much much slower than C or Fortran.

    • This clearly depends on the specific task to execute

    • And on how Python is used; note, however, that the Python philosophy rejects exuberant syntax in favor of a simpler, more readable grammar

  • But speed is not always the bottleneck.

    • Managing complexity can be more important than speed.

    • “The best performance improvement is the transition from the nonworking to the working state.” –John Ousterhout

    • “Premature optimization is the root of all evil.” –Donald Knuth

    • “You can always optimize it later.” – Unknown

Improving performances#

However, there are several ways to speed up the “critical” parts of a Python program.

When speed is important, a Python programmer can:

  • use a just-in-time compiler, such as PyPy (an alternative interpreter to CPython) or numba (a package that JIT-compiles selected functions under CPython)

  • move time-critical functions to extension modules written in languages such as C or Fortran

  • use the state-of-the-art Python packages which are implemented using low-level languages (numpy, scipy): keep your code pythonic but fast

  • use the Cython package, which translates a Python script into C and makes direct C-level API calls into the Python interpreter
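
The idea of pushing a hot loop into code implemented in a low-level language can be sketched with the standard library alone. On CPython the built-in `sum` runs its loop in C, while the hypothetical helper `python_loop_sum` below executes every iteration as interpreted bytecode. Packages such as numpy and scipy apply the same principle to whole arrays. Timings depend on the machine and the interpreter, so no particular speedup is claimed here.

```python
import timeit

data = list(range(100_000))


def python_loop_sum(values):
    # Every iteration of this loop is executed by the bytecode interpreter.
    total = 0
    for v in values:
        total += v
    return total


# Both approaches must give the same answer before timing is meaningful.
loop_result = python_loop_sum(data)
builtin_result = sum(data)

# Time 20 repetitions of each approach.
loop_time = timeit.timeit(lambda: python_loop_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)
print(f"explicit loop: {loop_time:.4f} s, built-in sum: {builtin_time:.4f} s")
```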

LU decomposition performance comparison#

Elaborated from https://www.ibm.com/developerworks/community/blogs/jfp/entry/A_Comparison_Of_C_Julia_Python_Numba_Cython_Scipy_and_BLAS_on_LU_Factorization?lang=en

| N    | pure python | numba      | numpy     | cython    | scipy    | fortran    |
|------|-------------|------------|-----------|-----------|----------|------------|
|5     | 0.000032    | 0.000002   | 0.000057  | 0.000002  | 0.000026 | 0.0000001  |
|10    | 0.000185    | 0.000001   | 0.000192  | 0.000003  | 0.000036 | 0.0000003  |
|30    | 0.004759    | 0.000018   | 0.001646  | 0.000016  | 0.000063 | 0.0000090  |
|100   | 0.178138    | 0.000809   | 0.023402  | 0.000700  | 0.000533 | 0.0002899  |
|200   | 1.437647    | 0.005188   | 0.091983  | 0.004781  | 0.001282 | 0.0026900  |
|300   | 4.724848    | 0.015308   | 0.209932  | 0.013939  | 0.003942 | 0.0141052  |
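
To read the table in relative terms, the N = 300 row can be turned into speedup factors over pure Python. This is a small sketch that only copies the numbers from the last row; the table does not state its units, but the ratios do not depend on them.

```python
# Timings for N = 300, copied from the last row of the table above.
timings_n300 = {
    "pure python": 4.724848,
    "numba": 0.015308,
    "numpy": 0.209932,
    "cython": 0.013939,
    "scipy": 0.003942,
    "fortran": 0.0141052,
}

# Speedup of each implementation relative to the pure Python timing.
baseline = timings_n300["pure python"]
speedups = {name: baseline / t for name, t in timings_n300.items()}

# Print from the smallest to the largest speedup.
for name, factor in sorted(speedups.items(), key=lambda item: item[1]):
    print(f"{name:12s} {factor:8.1f}x")
```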

A bit of history of Jupyter notebook#

  • IPython interpreter: born in 2001 as the work of a student, Fernando Perez.
    He based it on features he liked in Mathematica, aiming to create a system for everyday scientific computing.

  • IPython notebook is an HTML-based notebook environment for Python

    • Based on the IPython shell

    • Provides a cell-based interactive web environment powered by JavaScript

    • Comments and notes with HTML and markdown formats

    • Integrates embedded plots

  • Jupyter project

    • In 2014, Fernando Perez announced a spin-off project from IPython called Project Jupyter (JUlia, PYThon and R).

    • IPython continues to exist as a Python shell and a kernel for Jupyter, while the notebook and other language-agnostic parts of IPython have been moved under the Jupyter name.

    • Jupyter added support for Julia, R, Haskell and Ruby.

## Jupyter notebooks

Jupyter notebooks are based on Jupyter kernels.
Many kernels are currently available: Python, Julia, R, C++, Fortran, …

A notebook is composed of cells:

  • code: cells containing Python code which can be executed

  • markdown: cells containing text which can be formatted using the easy Markdown language (which also accepts inline HTML) https://guides.github.com/features/mastering-markdown/

  • raw: cells with unformatted text (rarely used)

You can start it from a terminal by running

jupyter notebook

A browser opens automatically with a browsable view of the directory where you ran the command (for example, your home)

## Jupyter notebooks

Notebooks are cool:

  • markdown and notes (consider learning the Markdown language)

  • code is executed inside the cell itself

  • can be downloaded as .ipynb but also as a Python or HTML file

  • support for slideshows, enhanced through the RISE package

  • in a code cell, in addition to Python code, you can execute

    • shell commands (e.g. ls), prefixed by !

    • so-called Jupyter magic commands, prefixed by % (line magic) or %% (cell magic)

    • inline plots using the Matplotlib library

Notebooks are great:

  • for didactic purpose

  • for prototyping

For realistic projects you may need something different; it depends.

Scientific Python#

To write a scientific application in Python you need different skills:

  1. Efficient coding

    • best practices, typical errors, patterns, timing and profiling

  2. Scientific stack

    • numpy, scipy, pandas, matplotlib, …

    • many many more

  3. Parallelism

    • threading, multiprocessing, mpi4py, joblib, …

    • many more

  4. Interoperability

    • Cython, C and Fortran, …

  5. JIT (just-in-time) compilers

    • numba, PyPy, …

We will deal with points 2, 3, 5, …
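
As a taste of the “timing and profiling” skill listed under point 1, here is a minimal sketch that uses the standard-library `cProfile` and `pstats` modules. The functions `slow_sum_of_squares` and `workload` are hypothetical stand-ins for real application code; the report shows where the time is spent, which is the information needed before deciding what to optimize.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately naive loop, used as the hot spot to be found.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum_of_squares(20_000) for _ in range(20)]


# Collect profiling data only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
results = workload()
profiler.disable()

# Format the collected statistics, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Inside a notebook, the `%timeit` and `%prun` magic commands give quicker access to similar measurements.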

Python Scientific stack#

This course will show examples from a subset of popular scientific packages