andy reagan

Recent Posts

How to groupby in Pandas with a missing group

Wed 25 April 2018
This is a note intended for my future self. Here’s how to do it:
1. Have a list of the values that you expect for each group.
2. Iterate over that list, and look up the values using .loc.
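
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the original post: the DataFrame, the column names `month` and `sales`, and the fallback value of 0 are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: sales recorded in only some months (no rows for month 2).
df = pd.DataFrame({
    "month": [1, 1, 3, 3, 3],
    "sales": [10, 20, 5, 5, 5],
})

# Aggregate per group; months with no rows are simply absent from the result.
totals = df.groupby("month")["sales"].sum()

# Step 1: the values we expect for the grouping key (assumed here).
expected_months = [1, 2, 3]

# Step 2: iterate over that list and look each value up with .loc,
# falling back to 0 (an assumed default) when the group is missing.
filled = {
    month: int(totals.loc[month]) if month in totals.index else 0
    for month in expected_months
}

print(filled)
```

If a plain default is all that is needed, `totals.reindex(expected_months, fill_value=0)` gives the same result without the explicit loop. The loop is the more flexible choice when each missing group needs its own handling.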
Say I want to group by months, but not all of the …
Boston 2018

Mon 16 April 2018

This was an incredible experience.

The crowds were just amazing, and so was seeing my son Olson at mile 6 in the rain and cold. With the weather, Olson on my mind, and the huge crowds, parts of the race were very emotional.

At about mile 11-13, I sped up …
UVM Twitter data notes

Sat 25 November 2017
First, important dates:
- 2008-09-11: We have the deca-hose from here, with a higher % of the tweets at the beginning and down to 10% now (with the total volume increasing greatly over that time). Geo is (was) roughly 1% of all tweets, first with a "coordinates" and then with the "places …
Run rabbit run

Thu 23 November 2017

A brief moment of glory:
Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs

Tue 14 November 2017

As a grad student trying to understand the emotional content of some unreadably large collection of texts, a typical night in grad school can often go something like this: You’re up late at night planning a new research study, thinking about trying some of this fancy sentiment-based text analysis …
Linking files from GitHub in CodePen

Wed 08 November 2017

In the course I teach at UC Berkeley in the MIDS program, we use CodePen to build interactive web graphics. There are a host of reasons to use CodePen, but setting that aside for now, let's talk about how to host data files for CodePen. CodePen lacks a way for …
A grad’s view: Solving real problems

Fri 29 September 2017

This is a re-post of an article I wrote for the MassMutual blog: https://blog.massmutual.com/post/a-grads-view-solving-real-problems.

Just this past May, I graduated from the University of Vermont and the Computational Story Lab research group to work as a senior data scientist with MassMutual. While there are many …
Boston-bound for 2018

Sun 10 September 2017

Well, I never thought it would happen. Today I qualified for the Boston Marathon with a 3:00:04 showing at the Presque Isle Marathon.
Enabling Jupyter notebook dashboards

Thu 04 May 2017
If you perform EDA using Jupyter notebooks, it’s really easy to share those results with some moderate interaction via a Jupyter dashboard. Here are the basic steps:
1. Build the analysis, etc. Assuming this is done locally. Install the dashboard layout extension and lay out some sweet graphs. Optional: decorate …
Should I set metadata manually in pyspark?

Thu 04 May 2017
Well, let’s do a simple test and find out if it speeds up the process of one-hot encoding a variable in our data. There are other reasons to set it, and we’ll get to those. Starting with the very helpful code snippet from spark-gotchas:
```
import json

from pyspark …
```
