andy reagan

Recent Posts

Column level encryption on a Vertica Database

Wed 08 January 2020

Vertica is a very powerful analytics database, and security is important! You might need to store sensitive data, like SSNs, but you don't want them available to anyone who can see the table on the database. To provide an additional layer of security, we can encrypt the SSN itself …
Analyzing Strava metadata

Mon 30 December 2019

I love running, and I love stroller running with my son even more. Strava is my go-to fitness app and I've tagged all of my stroller runs with a searchable tag so I can count the miles we've logged together, mostly while he has slept!

The search functionality on Strava's …
Presque Isle Marathon

Sun 08 September 2019

PR Baby! Boston bound for 2020.
Exploring the science behind the Yasso 800

Sat 17 August 2019

If you haven't heard of them, Yasso 800s are an infamous running workout that purports to predict your marathon time. The name was coined by Amby Burfoot, paying homage to Runner's World editor Bart Yasso.

While even Yasso himself has professed he had no idea why the math worked …
Developing Python on Vertica

Fri 05 July 2019

Vertica is a very powerful analytics database, and we can now easily extend its functionality by building in Python functions. This is great and all, but here I'll focus on setting up a development environment for building a simple UDx.

The documentation from Vertica is not super specific, so this may …
Boston Marathon 2019

Mon 15 April 2019

What a day! Congrats to all the finishers. I count myself very grateful to be at the start line of this one, with my wife at 39 weeks pregnant. Didn't have the run that I had hoped for out there, wrote some checks that my legs couldn't cash with a first …
Scoring arbitrarily large datasets with Pandas + Sklearn

Wed 10 April 2019

The workhorses of data analysis and modeling in the Python universe are undoubtedly Pandas and Sklearn. I won't extol their virtues here, but will focus on solving one limiting problem. One of the major limitations of these libraries is the size of data they can handle.

In Pandas, the rule of …
Writing LaTeX in Atom

Fri 20 July 2018

Atom is a code editor. The defaults try to "complete" words from your writing, and don't highlight spelling. After many months of using a code editor to write, it's clear that I've gone backwards and I should at least be using spell checking!

These two settings vastly improve the LaTeX …

Tricks for coercing Pandas into parquet

For coercing pandas date times (stored as numpy datetime):

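import numpy as np

# df is an existing pandas DataFrame; loop over its datetime64[ns] columns
for col in df.columns[df.dtypes == np.dtype('<M8[ns]')]:
    # https://stackoverflow.com/questions/32827169/python-reduce-precision-pandas-timestamp-dataframe
    # apply(lambda x: x.replace(microsecond=0))
    # cast the underlying numpy values down to whole-second precision
    df[col] = df[col].values.astype('datetime64[s]')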
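
To see what the loop above does, here is a minimal, self-contained sketch on made-up data. The helper name `truncate_datetimes_to_seconds` and the sample values are hypothetical, not from the original post. It just wraps the same cast in a function that works on a copy, so the input frame is left alone:

```python
import numpy as np
import pandas as pd


def truncate_datetimes_to_seconds(df):
    """Return a copy of df with each datetime64[ns] column cut to whole seconds.

    Hypothetical helper wrapping the loop shown above.
    """
    out = df.copy()
    for col in out.columns[out.dtypes == np.dtype("<M8[ns]")]:
        # Casting the numpy values to datetime64[s] drops sub-second precision.
        out[col] = out[col].values.astype("datetime64[s]")
    return out


# Made-up sample data; the explicit astype makes sure the column really is
# datetime64[ns], whatever resolution pandas would pick by default.
df = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2020-01-08 12:34:56.789123", "2019-12-30 01:02:03.000004"]
        ).astype("datetime64[ns]"),
        "n": [1, 2],
    }
)

truncated = truncate_datetimes_to_seconds(df)
print(truncated)
```

Non-datetime columns (here `n`) pass through untouched, and the original `df` keeps its sub-second values because the helper copies before assigning.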

For coercing python datetime (here, a datetime.date, there …

Some notes about running D3 inside Jupyter

Fri 27 April 2018

Many visualization packages rely on using D3 in the browser, and those include: Plotly, Vega, and mpld3 (links point to the code for how these projects get JS interacting with Jupyter, using IPython’s display module). Some people have no idea why it’s so hard, and I’ll count …
