March 15, 2021

a primer for linear regression (part 3)

Our focus now shifts to multiple regression (i.e. linear regression with more than one predictor), as opposed to simple linear regression (linear regression with just one predictor). Simple linear regressions have the benefit of being easy to visualize, which makes it much easier to explain different concepts. However, real-world questions are often complex, and it’s frequently necessary to account for more than one relevant variable in an analysis. As with the last two posts, we’ll stick with the Palmer Penguins data, and I’ll be using functions from the {broom} package (such as tidy(), glance() and augment()) a bit more freely now that they’ve been introduced. Read more

March 12, 2021

a primer for linear regression (part 2)

In the previous post of this series, we went over the Ordinary Least Squares method for estimating the parameters of a linear regression model. While I didn’t give you a full tour of the mathematical guts underpinning the technique, I’ve hopefully given you a sense of the problem the model is attempting to solve, as well as some specific vocabulary that describes the contents of a linear regression. Read more

March 7, 2021

a primer for linear regression (part 1)

This year, my partner has been working to complete her Master’s in Natural Resources/Land Management, and several of her assignments have required some data analysis. One topic area we covered together was linear regression/multiple linear regression. Simple linear regression and multiple linear regression are well-known workhorses for answering statistical questions across many scientific fields. Given their ubiquity, the working knowledge needed to interpret and evaluate a regression analysis is highly valuable in virtually any professional field that involves the use or consumption of data. Read more

November 25, 2019

analyzing the october primary debate, using tidytext

(This is a write-up of a talk I gave to the Ann Arbor R User Group earlier this month.) It seems like the longer one works with data, the closer the probability of being tasked to work with unstructured text gets to 1. No matter the setting (whether you’re working with survey responses, administrative data, whatever), one of the most common ways that humans record information is by writing things down. Something great about text-based data is that it’s often plentiful, and it might have the advantage of being really descriptive of something you’re trying to study. Read more

October 14, 2019

predicting my yearly top songs without listening/usage data (part 2)

This is a continuation from a previous post, which can be found here. Okay, picking up where we left off! In this post we’ll dive into building a set of models that can classify each of my playlist tracks as a “top-song” or not. While this is an exploration of some boutique data, it’s also a cursory look at many of the packages found in the tidymodels ecosystem. A few posts I found useful in terms of working with tidymodels can be found here, and here. Read more

September 17, 2019

predicting my yearly top songs without listening/usage data (part 1)

Question: how do tracks end up on my yearly top-100 songs playlist? Last year I dug into the spotifyr package to see if the monthly playlists I curate varied by different track audio features available from the API. This time, I’m back with some more specific questions. Maybe they’ve always done this, but Spotify creates yearly playlists for each user, meant to reflect the user’s top-100 songs. I look forward to getting one each year, but I wish I knew more about how it worked. Read more

August 11, 2018

comparing audio features from my monthly playlists, using spotifyr

The NYT has a fun interactive up this week, looking at audio features to see if popular summer songs have the same sort of “signature”. After attending a presentation earlier this year, I discovered that these same sorts of features are accessible through Spotify’s API! How people curate their collections and approach listening to music usually tells you something about them, and since seeing the presentation I’ve been wanting to take a dive into my own listening habits. Read more

July 15, 2018

how should I get started with R?

Here’s some evergreen advice from David Robinson:
“When you’ve written the same code 3 times, write a function
When you’ve given the same in-person advice 3 times, write a blog post”
(David Robinson, @drob, November 9, 2017)
In a world overflowing with data science blogs, I’ve decided to write some notes about getting started in R. I recently crossed Robinson’s threshold, and want to write down my basic advice (so, hi! Read more

October 1, 2017

taking a second look at family net worth with the SCF

Matt Bruenig at the People’s Policy Project (PPP) published a post at the end of September, looking at 2016 data for family net worth as reported by the Survey of Consumer Finances (SCF). Using the 2007 and 2016 waves of the survey, Bruenig grouped family net worth into percentiles, and took the difference between the two waves at each percentile. Bruenig broke the results down by race/ethnicity, but generally speaking, aside from the wealthiest Americans, most families still haven’t recovered to their pre-recession level of household net worth. Read more

July 8, 2017

exploring NUFORC sightings

R code used for each of the graphics is available here. While flipping through an issue of the Economist a few years ago, I stumbled across an article summarizing UFO sightings reported across the US. It wasn’t a full feature, but the topic was playful and I lingered on it longer than I spent with the rest of the issue. Read more
