May 1, 2021

guessing how many streak-freezes I'll use over the next 4 months

So, this is a small thing, but something I’m proud of. Over the past year, I’ve been practicing German (🇩🇪 Ich habe Deutsch gelernt!) using Duolingo’s mobile app. I don’t have an especially romantic reason for why I settled on the language, but I had watched the first season of Dark on Netflix in 2018, and don’t really enjoy dubbed foreign film/tv. Listening to the language prompted me to try a few lessons initially, but I settled into routine practice while finishing Babylon Berlin last fall. Read more

March 15, 2021

a primer for linear regression (part 3)

Now our focus will shift to multiple regression (i.e. linear regression with more than one predictor), as opposed to simple linear regression (linear regression with just one predictor). Simple linear regressions have the benefit of being easy to visualize, and this makes it much easier to explain different concepts. However, real-world questions are often complex, and it’s frequently necessary to account for more than one relevant variable in an analysis. As with the last two posts, we’ll stick with the Palmer Penguins data, and now that they’ve been introduced, I’ll be using functions from the {broom} package (such as tidy(), glance() and augment()) a bit more freely. Read more

March 12, 2021

a primer for linear regression (part 2)

In the previous post of this series, we covered an overview of the Ordinary Least Squares method for estimating the parameters of a linear regression model. While I didn’t give you a full tour of the mathematical guts underpinning the technique, I’ve hopefully given you a sense of the problem the model is attempting to solve, as well as some specific vocabulary that describes the contents of a linear regression. Read more

March 7, 2021

a primer for linear regression (part 1)

This year, my partner has been working to complete her Master’s in Natural Resources/Land Management, and several of her assignments have required some data analysis. One topic area we covered together was linear regression/multiple linear regression. As techniques, simple linear regression and multiple linear regression are well-known workhorses for answering statistical questions across many scientific fields. Given their ubiquity, the working knowledge needed to interpret and evaluate a regression analysis is highly valuable in virtually any professional field that involves the use or consumption of data. Read more

July 7, 2020

lagged OpenTable reservations vs. change in daily case counts

Having broken my personal promise to not work with COVID data, I decided to revisit the subject after seeing a recent project from Nathan Yau. Yau was looking at OpenTable reservations, and made a clean & well-annotated plot with panels for each state. The OpenTable dataset tracks the difference in seated dining on a given day between 2019 and 2020. Plotting these differences over time gives a view into how quickly people are returning to more regular consumption patterns. Read more

July 1, 2020

a few thoughts on the gap between information and action

I’ve been thinking about this piece by Mimi Onuoha that was published today, particularly this passage: By nearly every statistical measurement possible, from housing to incarceration to wealth to land ownership, Black Americans are disproportionately disadvantaged. But the grand ritual of collecting and reporting this data has not improved the situation. American history is lined with innumerable instances of what scholar Saidiya Hartman bemoans as “the demand that this suffering be materialized and evidenced by the display of the tortured body or endless recitations of the ghastly and the terrible,” only for very little to change. Read more

June 26, 2020

please don't use a basic linear model to predict cumulative case counts in your state

So, this is my first post in a while. I changed jobs in January, and moved back across the country to my hometown of Boise, ID. I was hoping that my first post-move update would be more uplifting, but by mid-March, I didn’t want to write anything, for a variety of reasons. As a person whose job involves cleaning and analyzing data, the pandemic has been surreal: public health, statistical methods, and data visualizations are now daily topics, for basically everyone I talk to. Read more

November 25, 2019

analyzing the october primary debate, using tidytext

(This is a write-up of a talk I gave to the Ann Arbor R User Group, earlier this month.) It seems like the longer one works with data, the closer the probability of being tasked to work with unstructured text gets to 1. No matter the setting (whether you’re working with survey responses, administrative data, whatever), one of the most common ways that humans record information is by writing things down. Something great about text-based data is that it’s often plentiful, and might have the advantage of being really descriptive of something you’re trying to study. Read more

October 23, 2019

hey, separation plots are kinda cool

Ever since I first started learning about regression analysis, I’ve found myself wishing I could do something for logistic regressions equivalent to inspecting residuals with OLS. Earlier this week, I was looking around for more ways to spot-check logistic regression models, and I came across a plotting technique that’s described in this paper (Greenhill, Ward, & Sacks, 2011). They’re called “separation plots”, and they’re used to help assess the fit of a model that has a binary dependent variable. Read more

October 14, 2019

predicting my yearly top songs without listening/usage data (part 2)

This is a continuation from a previous post, which can be found here. Okay, picking up where we left off! In this post we’ll dive into building a set of models that can classify each of my playlist tracks as a “top-song” or not. While this is an exploration of some boutique data, it’s also a cursory look at many of the packages found in the tidymodels ecosystem. A few posts I found useful in terms of working with tidymodels can be found here, and here. Read more
