Vectorizing code matters

Thu 17 September 2020

I come from the world of MATLAB and numerical computing, where for loops are shorn and vectors are king. During my PhD at UVM, Professor Lakoba's Numerical Analysis class was one of the most challenging courses I took and the deep knowledge of numerical code still sticks with me. My favorite example of a vectorization is when a colleague shared his Lorenz 96 code with me, after writing a really cool paper about it that footnoted the massive amount of computation involved. Well, vectorizing the inner loop was about 4x faster, so now the footnote is a just a carbon footprint.

Fast numerical code is what makes machine learning even possible these days, though I'm not sure how many of the kids these days can write a QR decomposition in C. I'm kidding, because I haven't done it, but I sure as heck could write it in MATLAB (at one point) or in Numpy or Julia now (I'll stick to just magrittr and dplyr in R). A lot of the work I do at MassMutual is fundamentally numerical computation, and the difference of a pipeline that takes hours, even minutes from one that takes seconds is a big deal. Seconds means we can iterate, try more options, and move faster. Still, a lot of numerical code is written in pure Python (no Cython, no Numba), for the flexibility. I'm going to argue that this is a bad idea! Here's a paraphrased email from a colleague:

In pseudocode, this is the 'actuarial' coding dilemma I ran into months back:

EOM = 0
for months in years:
    PREM = 50
    BOM = EOM + PREM
    WIT = 5
    EOM = BOM – WIT

A simple example, but I think shows the BOM/EOM interdependence (there are a few other variables with a similar relationship.) You can’t vectorize BOM without knowing EOM, and you can’t vectorize EOM until you know BOM. Then you might have situations where IF WIT > 0 , PREM = 0. Basically a lot of inter-dependence emerges. Now a lot of the function is not appear easily vectorizable.

Well, I can vectorize this, and I did. Here's the non-vectorized version in Python:

import numpy as np
years = 10
bom = np.zeros(years*12)
eom = np.zeros(years*12)
for month in range(1, years*12):
    prem = 50
    bom[month] = eom[month-1] + prem
    wit = 5
    eom[month] = bom[month] - wit

And here's the vectorized version:

import numpy as np
years = 10
prem = 50
wit = 5

eom = np.arange(years*12)*prem - np.arange(years*12)*wit
# and if you still want bom as an array:
bom = eom + np.arange(years*12)*wit

I also wrote the for-loop even more flexibly (read: as slow as I could think to) by using a list of dicts:

years = 10
prem = 50
wit = 5
result = [{'bom': 0, 'eom': 0}]
for month in range(1, years*12):
    inner = {}
    inner.update({'bom': result[month-1]['eom'] + prem})
    inner.update({'eom': inner['bom'] - wit})

This one above returns a different type of thing, a list of dicts...not two arrays.

We can also import Pandas to stuff results into for all three of the above (so they're consistent outputs, we could save to excel, etc). If we have Pandas loaded, we could use an empty dataframe for iteration, so one more option:

import numpy as np
import pandas as pd
years = 10
prem = 50
wit = 5
df = pd.DataFrame(data={'bom': np.zeros(years*12), 'eom': np.zeros(years*12)})
for i, row in df.iterrows():
    if i > 0: = df.loc[i-1, 'eom']
        row.eom = - wit

With all of those types of iteration, and with the the option to return a dataframe as the result, this is what we get:

vectorized return_type iterate_type time slowdown
True numpy -- 0.607289 1
False numpy numpy 15.2983 25
False list(dict) dict 9.2112 15
True pandas -- 37.8838 62
False pandas numpy 47.0335 77
False pandas dict 1717.72 2828
False pandas pandas 77.5634 127
False list list 1.80763 2
False numpy numpy 14.6285 24
False list c array 0.663318 1


I also add a few Cython versions of the code, showing that you can get vectorized performance without numpy, by using C. This might indeed strike the best balance between readability (keep the for-loop!) and speed.

Numba may also retain the same speedups (it may be as fast as Cython/Vectorized Numpy). In both cases (Cython/Numba), you have to be careful about which datatypes you're using (no dicts or pandas!). I think that it would be possible to make the Cython + Numpy loop just as fast as vectorized numpy if you are smarter about how to integrate them.

All of the code, including the Cython, is available here: