November 25, 2019

analyzing the october primary debate, using tidytext

(This is a write-up of a talk I gave to the Ann Arbor R User Group earlier this month.) It seems like the longer one works with data, the closer the probability of being tasked to work with unstructured text gets to 1. No matter the setting (whether you’re working with survey responses, administrative data, whatever), one of the most common ways that humans record information is by writing things down. Something great about text-based data is that it’s often plentiful, and might have the advantage of being really descriptive of something you’re trying to study. Read more

October 23, 2019

hey, separation plots are kinda cool

Earlier this week, I was looking around for more ways to spot-check logistic regression models. I’m not sure I fully grok what Pseudo R-Squared statistics mean, and I haven’t seen/heard of an appropriate way to assess residuals from a logistic regression like you can with OLS. In my reading, I came across a plotting technique that’s described in this paper (Greenhill, Ward, & Sacks, 2011). They’re called “separation plots”, and they’re used to help assess the fit adequacy of a model that has a binary variable as its dependent variable. Read more

October 14, 2019

predicting my yearly top songs without listening/usage data (part 2)

This is a continuation of a previous post, which can be found here. Okay, picking up where we left off! In this post we’ll dive into building a set of models that can classify each of my playlist tracks as a “top-song” or not. While this is an exploration of some boutique data, it’s also a cursory look at many of the packages found in the tidymodels ecosystem. A few posts I found useful in terms of working with tidymodels can be found here, and here. Read more

September 17, 2019

predicting my yearly top songs without listening/usage data (part 1)

Question: how do tracks end up on my yearly top-100 songs playlist? Last year I dug into the spotifyr package to see if the monthly playlists I curate varied by different track audio features available from the API. This time, I’m back with some more specific questions. Maybe they’ve always done this, but Spotify creates yearly playlists for each user, meant to reflect the user’s top-100 songs. I look forward to getting one each year, but I wish I knew more about how it worked. Read more

August 11, 2018

comparing audio features from my monthly playlists, using spotifyr

The NYT has a fun interactive up this week, looking at audio features to see if popular summer songs have the same sort of “signature”. After attending a presentation earlier this year, I discovered that these same sorts of features are accessible through Spotify’s API! How people curate their collections and approach listening to music usually tells you something about them, and since seeing the presentation I’ve been wanting to take a dive into my own listening habits. Read more

July 15, 2018

how should I get started with R?

Here’s some evergreen advice from David Robinson (@drob, November 9, 2017): “When you’ve written the same code 3 times, write a function. When you’ve given the same in-person advice 3 times, write a blog post.” In a world overflowing with data science blogs, I’ve decided to write some notes about getting started in R. I recently crossed Robinson’s threshold, and want to write down my basic advice (so, hi! Read more

October 1, 2017

climbing into the crater: taking a second look at family net worth with the SCF

Matt Bruenig at the People’s Policy Project (PPP) published a post at the end of September, looking at 2016 data for family net worth as reported by the Survey of Consumer Finances (SCF). The post’s title was provocative (“Black Wealth Cratered Under Obama”), and the same goes for the findings being discussed. Using the 2007 and 2016 waves of the survey, Bruenig grouped family net worth into percentiles, and took the difference between each point. Read more

July 8, 2017

exploring NUFORC sightings

R code used for each of the graphics is available here. While flipping through an issue of the Economist a few years ago, I stumbled across an article summarizing UFO sightings reported across the US. It wasn’t a full feature, but the topic was playful and I lingered on it longer than I spent with the rest of the issue. Read more
