4. Using numba to release the GIL

4.1. Timing python code

One easy way to tell whether you are utilizing multiple cores is to track the wall clock time measured by time.perf_counter against the total cpu time used by all threads meausred with time.process_time

I’ll organize these two timers using the contexttimer module.

To install, in a shell window type:

 pip install contexttimer

4.1.1. Define a function that does a lot of computation

import contexttimer
import time
import math
from numba import jit
from joblib import Parallel
import logging
!conda install numba
import contexttimer
import time
import math
import numba
def wait_loop(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out

4.1.2. now time it with pure python

nloops=200
with contexttimer.Timer(time.perf_counter) as pure_wall:
    with contexttimer.Timer(time.process_time) as pure_cpu:
        result=wait_loop(nloops)
print(f'pure python wall time {pure_wall.elapsed} and cpu time {pure_cpu.elapsed}')

4.2. Now try this with numba

Numba is a just in time compiler that can turn a subset of python into machine code using the llvm compiler.

Reference: Numba documentation

4.3. Make two identical functions: one that releases and one that holds the GIL

@jit('float64(int64)', nopython=True, nogil=True)
def wait_loop_nogil(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out
@jit('float64(int64)', nopython=True, nogil=False)
def wait_loop_withgil(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out

4.4. now time wait_loop_withgil

nloops=500
with contexttimer.Timer(time.perf_counter) as numba_wall:
    with contexttimer.Timer(time.process_time) as numba_cpu:
        result=wait_loop_withgil(nloops)
print(f'numba wall time {numba_wall.elapsed} and cpu time {numba_cpu.elapsed}')
print(f"numba speed-up factor {(pure_wall.elapsed - numba_wall.elapsed)/numba_wall.elapsed}")

4.5. not bad, but we’re only using one core