Parallelization and foreign function interfaces
This handout is tied to the following lecture
12. Parallelization and foreign function interfacesCode examples
MPI
Dependencies
- Arch:
pacman -S base-devel openmpi - Ubuntu:
apt install build-essential openmpi-bin openmpi-common libopenmpi-dev
Then install mpi4py with pip install mpi4py. If the
installation process fails, probably you are lacking some basic build dependency, have a look at the
docs to see what is missing.
The typical hello world example - but in parallel!
|
|
A more involved example with actual communication and processing logic
|
|
Threading
A threading/multiprocessing example highlighting the Global interpreter lock
|
|
Ctypes
A minimalist example of ctypes loading compiled C code and numpy arrays
main.py
|
|
main.c
|
|
Makefile
|
|
Python.h
A minimalist example of building a Python module in C/C++
bar.c
|
|
main.py
|
|
Makefile
|
|
Cython
An examples of using Cython to transpile Python to compiled C
py_calc.py
|
|
compiled_py_calc.py
|
|
cy_calc.py
|
|
main.py
|
|
setup.py
|
|
Makefile
|
|
Extensions in other languages
The most common way to build extensions to your Python package in other languages is trough a setuptools Extension. This system can e.g. compile C code as a part of the installation and building process of your package. However there are other methods, such as using meson.
There are also systems for automatically creating interfaces such as pybind11 or the numpy f2py.
Increasing performance
The The Computer Language Benchmarks Game is a quite fun website that micro-benchmarks languages. Remember, this is of course no-where near a real metric for if a language is more performant than another (for that you need a real domain-specific example with actual throughput), but it does show some things about its basic functionality and its compiler/interpreter.
High-performance computing system tools
Languages
So far I have only found one programming language that is specifically designed to be used for parallel or high-performance computing. That language is Chapel. So far it seems very interesting and I’m keeping my eye on it!
Hewlett Packard Enterprise - Chapel: Making parallel computing as easy as Py(thon), from laptops to supercomputers
Also, Chapel does support
interoperability with other
languages trough C, which means you could probably interface to a chapel program trough ctypes
from Python and similarly parallelize your existing C code by calling the functions from a shared
library in chapel.
One example of using Chapel to implement HPC calculations is arkouda, a Python package for executing data processing on a cluster where the cluster server is written in Chapel.
Schedulers
Data systems
Map-reduce systems
More stuff
There are more links to HPC tooling at awesome-high-performance-computing.
Application binary interface
An application binary interface is a interface for accessing in-process machine code, e.g. calling a function from a compiled binary from a process.
The blog post mentioned in the lecture can be found here. Again, if you are not much for reading, there is a reaction video for the post.
ThePrimeTime - C Is Not A Language Anymore
IRF and HPC2N
- IRF is a partner of HPC2N
- Loads of info on: hpc2n.umu.se
- HPC2N contact person at IRF: Mats Holmström
- Anyone can register an account at: supr.snic.se
Homework
Parallelization
Parallelize some piece of software you have written, either during the course or elsewhere. The important part is that this software should theoretically benefit from parallelization (you don’t have to fit your portion coefficients if you know most of the program you can trivially parallelize). You will then show the code before and after so that we can discuss it, as well as show benchmarks of before and after.
Python as glue
Choose some small function from a software you have written, either during the course or elsewhere. The function should be doing mostly “Python things”, not just be one line of numpy. Then implement this function in C or another low level language of your choice and create an extension for your code. You should be able to call the function via your Python code without using something like subprocess, rather it should use ctypes or something like the Python.h and f2py approach. Benchmark the performance before and after. We will review the compiled low level code and the Python code before and after.