4. Using numba to release the GIL¶

4.1. Timing python code¶

One easy way to tell whether you are utilizing multiple cores is to compare the wall clock time measured by time.perf_counter with the total cpu time used by all threads, measured with time.process_time. If the cpu time is noticeably larger than the wall time, more than one core was busy.
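To make the difference between the two clocks concrete, here is a minimal sketch that uses only the standard library. The helper names `busy_sum` and `measure` are made up for this illustration. Sleeping uses wall time but almost no cpu time, while a single-threaded busy loop uses roughly the same amount of both.

```python
import time

def busy_sum(n):
    """Hypothetical cpu-bound helper: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def measure(func, *args):
    """Call func(*args) and return (wall seconds, cpu seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

# Sleeping: the wall clock advances, the process does almost no work
sleep_wall, sleep_cpu = measure(time.sleep, 0.2)
# Busy loop on one thread: wall and cpu time should be close to each other
busy_wall, busy_cpu = measure(busy_sum, 500_000)

print(f"sleep: wall {sleep_wall:.3f} s, cpu {sleep_cpu:.3f} s")
print(f"busy:  wall {busy_wall:.3f} s, cpu {busy_cpu:.3f} s")
```

The exact numbers depend on the machine, so none are shown here. The pattern to look for is cpu time well below wall time for the sleep and roughly equal times for the busy loop.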

I’ll organize these two timers using the contexttimer module.

To install, in a shell window type:

 pip install contexttimer

4.1.1. Define a function that does a lot of computation¶

!conda install numba

import contexttimer
import time
import math
from numba import jit

def wait_loop(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out

4.1.2. Now time it with pure python¶

nloops=200
with contexttimer.Timer(time.perf_counter) as pure_wall:
    with contexttimer.Timer(time.process_time) as pure_cpu:
        result=wait_loop(nloops)
print(f'pure python wall time {pure_wall.elapsed} and cpu time {pure_cpu.elapsed}')

4.2. Now try this with numba¶

Numba is a just-in-time compiler that can turn a subset of Python into machine code using the LLVM compiler.

Reference: Numba documentation

4.3. Make two identical functions: one that releases and one that holds the GIL¶

@jit('float64(int64)', nopython=True, nogil=True)
def wait_loop_nogil(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out

@jit('float64(int64)', nopython=True, nogil=False)
def wait_loop_withgil(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out

4.4. Now time wait_loop_withgil¶

nloops=500
with contexttimer.Timer(time.perf_counter) as numba_wall:
    with contexttimer.Timer(time.process_time) as numba_cpu:
        result=wait_loop_withgil(nloops)
print(f'numba wall time {numba_wall.elapsed} and cpu time {numba_cpu.elapsed}')
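# speed-up factor = pure python time / numba time
# note: the pure python run used nloops=200 and this run uses nloops=500,
# so the numba call does more work and this ratio understates the speed-up
print(f"numba speed-up factor {pure_wall.elapsed/numba_wall.elapsed}")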
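The two timings above use different loop counts (nloops=200 for pure python, nloops=500 for numba), so comparing the raw times mixes two workloads. One way to put them on the same footing is to divide each time by the number of innermost-loop passes. The four nested loops visit every combination i < j < l < m < n once, which is "n choose 4". The sketch below assumes run time is proportional to that count and ignores overhead in the outer loops; the function names are made up for this illustration.

```python
import math

def inner_iterations(n):
    """Innermost-loop passes in wait_loop(n): one per i < j < l < m < n."""
    return math.comb(n, 4)

def normalized_speedup(pure_time, pure_n, numba_time, numba_n):
    """Ratio of time per innermost pass, pure python over numba."""
    pure_per_iter = pure_time / inner_iterations(pure_n)
    numba_per_iter = numba_time / inner_iterations(numba_n)
    return pure_per_iter / numba_per_iter

print(inner_iterations(200))   # 64684950
print(inner_iterations(500))   # 2573031125
```

With the timers from the cells above, this would be called as normalized_speedup(pure_wall.elapsed, 200, numba_wall.elapsed, 500). The nloops=500 run does about 40 times as many innermost passes as the nloops=200 run.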

4.5. Not bad, but we’re only using one core¶
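To check how many cores a function really uses, one option is to run several copies of it in threads and compare cpu time to wall time for the whole batch. The sketch below uses only the standard library. `busy` is a pure python stand-in that holds the GIL, and `time_threads` is a hypothetical helper written for this illustration; neither comes from numba or contexttimer.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    """Pure python stand-in for a cpu-bound function (holds the GIL)."""
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total

def time_threads(func, arg, nthreads):
    """Run func(arg) once in each of nthreads threads.

    Returns (results, wall seconds, cpu seconds) for the whole batch.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        results = list(pool.map(func, [arg] * nthreads))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return results, wall, cpu

results, wall, cpu = time_threads(busy, 200_000, 4)
print(f"wall time {wall:.3f} s, cpu time {cpu:.3f} s, cpu/wall = {cpu / wall:.2f}")
```

On a standard CPython build, a function that holds the GIL should give a cpu/wall ratio close to 1 no matter how many threads are used. Passing wait_loop_nogil from above instead, for example time_threads(wait_loop_nogil, 500, 4), should let the ratio rise toward the number of threads if that many cores are free.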
