This is a note intended for my future self. Here’s how to do it:
- Have a list of the values that you expect for each group.
- Iterate over that list, and look up the values using `.loc`.
Say I want to group by months, but not all of the …
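As a minimal sketch of the two steps above, assuming pandas and a made-up DataFrame: the column names `month` and `value`, the data, and the fallback of 0 for a missing group are all illustrative assumptions, not taken from the original post.

```python
import pandas as pd

# Hypothetical data: observations in January and March only, none in February.
df = pd.DataFrame({
    "month": [1, 1, 3],
    "value": [10, 20, 5],
})

# Aggregate only the groups that actually occur in the data.
totals = df.groupby("month")["value"].sum()

# The values we expect for the grouping key, whether or not they appear.
expected_months = [1, 2, 3]

# Look up each expected month with .loc, falling back to 0 (an assumed
# default) when that month has no rows.
filled = {
    month: int(totals.loc[month]) if month in totals.index else 0
    for month in expected_months
}

print(filled)
```

The membership check matters because `.loc` raises a `KeyError` for a label that is not in the index. If a default fill is all that's needed, `totals.reindex(expected_months, fill_value=0)` is a possible alternative to the explicit loop.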
This was an incredible experience.
The crowds were just amazing, and seeing my son Olson at mile 6 in the rain and cold was amazing. With the weather, Olson on my mind, and the huge crowds, parts of the race were very emotional.
At about mile 11-13, I sped up …
First, important dates:
A brief moment of glory:
As a grad student trying to understand the emotional content of some unreadably large collection of texts, a typical night in grad school can often go something like this: You’re up late at night planning a new research study, thinking about trying some of this fancy sentiment-based text analysis …
In the course I teach at UC Berkeley in the MIDS program, we use CodePen to build interactive web graphics. There are a host of reasons to use CodePen, but setting that aside for now, let's talk about how to host data files for CodePen. CodePen lacks a way for …
This is a re-port of an article I wrote for the MassMutual blog: https://blog.massmutual.com/post/a-grads-view-solving-real-problems.
Just this past May, I graduated from the University of Vermont and the Computational Story Lab research group to work as a senior data scientist with MassMutual. While there are many …
Well, I never thought it would happen. Today I qualified for the Boston Marathon with a 3:00:04 showing at the Presque Isle Marathon.
If you perform EDA using Jupyter notebooks, it’s really easy to share those results with some moderate interaction via a Jupyter dashboard. Here are the basic steps:
- Build the analysis, etc. Assuming this is done locally.
- Install the dashboard layout extension and lay out some sweet graphs.
- Optional: decorate …
Well, let’s do a simple test and find out if it speeds up the process of one-hot encoding a variable in our data. There are other reasons to set it, and we’ll get to those. Starting with the very helpful code snippet from spark-gotchas:
import json
from pyspark …