This week, I got to see some really cool stuff.
Not only did
put out some more material on his NASCAR model (a godsend in an analytics-poor sport), several other authors put out grassroots analytics pieces featuring their own models and analyses. I'm linking some of them at the conclusion of this article.

The volume of this material made me realize: the result of a sports community's conversation matters, but the conversation itself is worthy of pursuit. We can't get to a place where we get to make statements like "running backs don't matter"1 or "shot quality is worth prioritizing" without first engaging in dialogue about those things. We have to hypothesize, to question, to refute, and to revise. Through that journey, we arrive at a better understanding of the games we love. Perfection, and waiting on it, is an impediment to that process.
In other areas of this publication, I’ve written about a work-in-progress Formula One model. When I pushed out the predictions, I was quite anxious about sharing the output. The batch of results made several problems with the model evident.
And yet, it gave me a chance to write about the assumptions the model makes, and how to think critically about its outputs—important steps on the journey towards better understanding.
In that instance, stepping forward with imperfection was far more helpful than waiting in the wings.
Perhaps a bit over the top, as far as a hobby-related realization goes. Eh. Sue me.
So, this week, I want to share the outputs of another model I’ve been working on for some time—my NASCAR model. It operates on many of the same principles as the Formula One model I describe here.
The methods employed are better suited to the dynamics of a Cup Series event than to the highly contingent nature of Formula One. Consequently, the output from this model is a little more "reasonable" than its cousin's.
The high-level features are:
The model uses past driver and team performance at a certain category of track to create a distribution of performances for each driver.
The biggest factors in “performance” are speed and track position.
The window of past performances is ten races (from no earlier than 2022).
The model randomly samples from each driver’s distribution, ranks the set of samples, and returns the ranked set as the finishing order.
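The sample-and-rank step above can be sketched in a few lines. This is a minimal illustration, not the actual model: it assumes a normal distribution for each driver's performance (the post doesn't specify a distribution family), and the driver names and numbers are made up.

```python
import random

# Hypothetical per-driver (mean, spread) pairs, standing in for the
# speed + track-position blend built from each driver's last ten races
# at this track category. Lower scores are better. Illustrative only.
driver_stats = {
    "Driver A": (5.0, 4.0),
    "Driver B": (8.0, 5.0),
    "Driver C": (12.0, 6.0),
}

def simulate_finishing_order(stats, seed=None):
    """Draw one sample from each driver's performance distribution,
    then rank the samples: the lowest score finishes first."""
    rng = random.Random(seed)
    samples = {d: rng.gauss(mu, sigma) for d, (mu, sigma) in stats.items()}
    return sorted(samples, key=samples.get)

print(simulate_finishing_order(driver_stats, seed=42))
```

Running this many times and tallying where each driver lands would give the kind of finishing-position probabilities the model outputs.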
The model has a handful of issues (it doesn't look at practice times or qualifying, for one thing), and it hasn't been backtested or benchmarked. But as I noted above, I figure it's time to contribute my small part to the conversation and keep perfection a goal, not a threshold.
In the future, I'll break down my study of the model, and the tweaks that improve it, more thoroughly. For now, here's the output for this week's race in Kansas:
If these outputs make sense to you—fantastic! If not, all I ask is this: think about why the output is wrong. That process is the real value I’m striving to create. The model, in that sense, is merely a means, not an end.
The folks referenced above are:
writing at — "The First-Read: Early-Season Trends"
writing at — "Week 4 Thoughts and Notes"
writing at — "When Does Quarterback Performance Get Real?"

Thanks for reading!
They do. But the hyperbolic snippet is representative of the kind of statements that emerge from the NFL analytics community.