
Introduction

With Yards Gained, you can perform queries against nearly every regular-season NFL play since 2000. For example,

rushing yards

will compute the total number of rushing yards gained by all players since 2000. That's not very interesting, though. We can split that sum across a few different dimensions, for example by season:

rushing yards : season

or player:

rushing yards : player

or team:

rushing yards : team
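Conceptually, each of these queries behaves like a group-by followed by a sum: every play contributes its rushing yards to the group named after the colon. The Python sketch below illustrates only that idea. The play records, field names ("season", "team", "player", "rushing_yards") and the "split_sum" helper are hypothetical, the values are made up, and none of it is how Yards Gained is actually implemented.

```python
from collections import defaultdict

# Hypothetical play records with made-up values, for illustration only.
PLAYS = [
    {"season": 2011, "team": "Team A", "player": "Player 1", "rushing_yards": 5},
    {"season": 2011, "team": "Team A", "player": "Player 2", "rushing_yards": 3},
    {"season": 2012, "team": "Team A", "player": "Player 1", "rushing_yards": 7},
    {"season": 2012, "team": "Team B", "player": "Player 3", "rushing_yards": 4},
]

def split_sum(plays, stat, dimension):
    """Sum one statistic over all plays, split by one dimension (like "stat : dimension")."""
    totals = defaultdict(int)
    for play in plays:
        totals[play[dimension]] += play[stat]
    return dict(totals)

print(split_sum(PLAYS, "rushing_yards", "season"))  # {2011: 8, 2012: 11}
print(split_sum(PLAYS, "rushing_yards", "team"))    # {'Team A': 15, 'Team B': 4}
```

Without a split (the plain "rushing yards" query), the same records would simply be summed into one total.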

We can also look up multiple statistics at the same time:

rushing yards, passing yards : team

There are a couple of things worth noting here. First, we can sort the table by any of its columns by clicking anywhere in the table (click again to reverse the order). Second, each cell is colored according to the magnitude of its value relative to the other values of the same statistic.

Houston Texans fans might be upset with this table: the Texans didn't start playing until 2002, so comparing total rushing and passing yards isn't really fair. Let's consider yards per game instead:

rushing yards / games, passing yards / games : team

Here we took multiple statistics and combined them into a single statistic using arithmetic. Another example: the ratio of passing yards to rushing yards, by team.

passing yards / rushing yards : team

which shows which offenses have leaned most heavily on passing or on rushing since 2000. But thirteen seasons is a long time, so it might be more interesting to also split the data by season across columns:

passing yards / rushing yards : team / season
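The arithmetic in these queries appears to apply to each group's totals rather than to individual plays: passing yards and rushing yards are each summed for a team, and only then divided. Here is a minimal Python sketch of that reading. The play records, their fields ("team", "game_id", "rushing_yards", "passing_yards") and the helpers "totals_by" and "per_group" are hypothetical, the values are made up, and "games" is assumed to be the number of distinct games a group appears in.

```python
from collections import defaultdict

# Hypothetical play records (made-up values); "game_id" identifies each play's game.
PLAYS = [
    {"team": "Team A", "game_id": 1, "rushing_yards": 4, "passing_yards": 0},
    {"team": "Team A", "game_id": 1, "rushing_yards": 0, "passing_yards": 12},
    {"team": "Team A", "game_id": 2, "rushing_yards": 6, "passing_yards": 20},
    {"team": "Team B", "game_id": 3, "rushing_yards": 10, "passing_yards": 5},
]

def totals_by(plays, dimension):
    """Total each statistic per group and count each group's distinct games."""
    sums = defaultdict(lambda: defaultdict(int))
    games = defaultdict(set)
    for play in plays:
        group = play[dimension]
        for stat in ("rushing_yards", "passing_yards"):
            sums[group][stat] += play[stat]
        games[group].add(play["game_id"])
    for group, game_ids in games.items():
        sums[group]["games"] = len(game_ids)
    return sums

def per_group(plays, dimension, expression):
    """Evaluate an arithmetic expression on each group's totals, not on single plays."""
    return {group: expression(t) for group, t in totals_by(plays, dimension).items()}

# "rushing yards / games : team"
print(per_group(PLAYS, "team", lambda t: t["rushing_yards"] / t["games"]))
# "passing yards / rushing yards : team"
print(per_group(PLAYS, "team", lambda t: t["passing_yards"] / t["rushing_yards"]))
```

Dividing totals (a ratio of sums) is the natural reading of "yards per game"; averaging a per-play ratio would give a different and less meaningful number.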

The cell coloring calls out significant outliers, but we can also put all the values in a single sortable list. By using a ',' instead of a '/' to split by season, we can split the data across rows:

passing yards / rushing yards : team, season

which is now sorted by magnitude. Since 2000, Arizona has had the two most pass-heavy offenses, by yardage. But maybe the decision to run or pass is more interesting than the yards gained, so let's take a look at that instead.

passes / carries : team, season

This list looks a bit different. But maybe these are just teams that were often behind and forced to pass to try to catch up; let's check that.

passes / carries, wins : team, season

Sure enough, many of these teams finished below .500. We can try to reduce this bias by only considering plays where the teams are tied.

passes / carries while tied : team, season

By adding "while tied", we now only consider statistics recorded on plays where the offense and defense are tied. We can filter on a range of different properties, like team, opponent, home/away, down, distance, season, player, etc. Here we've limited it by score.
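A condition such as "while tied" can be thought of as a filter applied to plays before any statistics are totaled. The sketch below assumes hypothetical play records where "score_diff" is the offense's score minus the defense's score; the field names, values and helper names are made up for illustration.

```python
# Hypothetical play records (made-up values); score_diff = offense score - defense score.
PLAYS = [
    {"team": "Team A", "play_type": "pass", "score_diff": 0},
    {"team": "Team A", "play_type": "run", "score_diff": 0},
    {"team": "Team A", "play_type": "pass", "score_diff": -7},
    {"team": "Team A", "play_type": "pass", "score_diff": 0},
]

def tied(play):
    """Condition: the offense and defense have the same score."""
    return play["score_diff"] == 0

def passes_per_carry(plays, condition=lambda play: True):
    """Compute passes / carries using only the plays that satisfy the condition."""
    kept = [p for p in plays if condition(p)]
    passes = sum(1 for p in kept if p["play_type"] == "pass")
    carries = sum(1 for p in kept if p["play_type"] == "run")
    return passes / carries

print(passes_per_carry(PLAYS))        # 3.0 (all plays)
print(passes_per_carry(PLAYS, tied))  # 2.0 (only plays while tied)
```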

Looking at the tied-score data, the 2011 Packers really stand out. Is this just noise, though? If we also look up the total number of offensive plays (i.e., first down attempts), we can get a sense of how reliable these numbers are.

passes / carries, first down attempts while tied : team, season

As it turns out, the Packers had only 88 first down attempts while tied, the third fewest since 2000, so this number probably isn't very meaningful. (We might suspect that this is because they were so dominant that they were almost always leading, something we can look into shortly.) We can remove low-sample teams like them from the table by specifying a minimum number of plays, say 150:

passes / carries, first down attempts min 150 first down attempts while tied : team, season
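A minimum such as "min 150 first down attempts" acts on the finished rows rather than on individual plays: a row is shown only if its own count reaches the threshold. In this minimal sketch the rows and the "apply_minimum" helper are hypothetical, the values are made up, and the cutoff is assumed to be inclusive (a row with exactly 150 attempts is kept), which the text above doesn't specify.

```python
# Hypothetical rows of "passes / carries, first down attempts while tied : team, season"
# (made-up values).
ROWS = [
    {"team": "Team A", "season": 2011, "ratio": 1.9, "first_down_attempts": 88},
    {"team": "Team B", "season": 2011, "ratio": 1.2, "first_down_attempts": 240},
    {"team": "Team C", "season": 2011, "ratio": 1.0, "first_down_attempts": 150},
]

def apply_minimum(rows, stat, minimum):
    """Keep only rows whose value for `stat` is at least `minimum` (assumed inclusive)."""
    return [row for row in rows if row[stat] >= minimum]

print([row["team"] for row in apply_minimum(ROWS, "first_down_attempts", 150)])
# ['Team B', 'Team C']
```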

Another way we can try to reduce the noise is by including more plays than just those where the teams are tied. So let's take a look at plays where the teams are within one score, that is, tied or separated by at most 8 points:

passes / carries, first down attempts while tied or down 1-8 or up 1-8 : team, season
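Conditions joined with "or" keep a play if any one of them holds, so "tied or down 1-8 or up 1-8" covers every score differential from -8 to +8. In this sketch the helpers "down", "up" and "any_of" are hypothetical names, not part of Yards Gained, and "score_diff" is assumed to be the offense's score minus the defense's score.

```python
def tied(play):
    """Condition: the score is tied."""
    return play["score_diff"] == 0

def down(low, high):
    """Condition like "down 1-8": the offense trails by low..high points."""
    return lambda play: low <= -play["score_diff"] <= high

def up(low, high):
    """Condition like "up 1-8": the offense leads by low..high points."""
    return lambda play: low <= play["score_diff"] <= high

def any_of(*conditions):
    """Combine conditions with "or": a play passes if at least one condition holds."""
    return lambda play: any(cond(play) for cond in conditions)

within_one_score = any_of(tied, down(1, 8), up(1, 8))

print([d for d in range(-10, 11) if within_one_score({"score_diff": d})])
# [-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
```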

If we want to see these tendencies by quarter, we can split this data by quarter across columns, lowering the minimum to 50 first down attempts since each cell now covers only a single quarter:

passes / carries, first down attempts min 50 first down attempts while tied or down 1-8 or up 1-8 : team, season / quarter
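The difference between ',' and '/' is roughly the difference between a long table and a pivoted one: dimensions joined with ',' form each row's key, and dimensions after '/' form each column's key. Here is a minimal sketch of that idea; the play records are made up and the "split" helper is hypothetical.

```python
from collections import defaultdict

# Hypothetical play records (made-up values).
PLAYS = [
    {"team": "Team A", "season": 2011, "quarter": 1, "passing_yards": 10},
    {"team": "Team A", "season": 2011, "quarter": 2, "passing_yards": 5},
    {"team": "Team A", "season": 2012, "quarter": 1, "passing_yards": 7},
    {"team": "Team A", "season": 2012, "quarter": 1, "passing_yards": 3},
]

def split(plays, stat, row_dims, col_dims=()):
    """Sum `stat` into a table: row_dims play the role of ',' and col_dims of '/'."""
    table = defaultdict(lambda: defaultdict(int))
    for play in plays:
        row_key = tuple(play[d] for d in row_dims)
        col_key = tuple(play[d] for d in col_dims)
        table[row_key][col_key] += play[stat]
    return {row: dict(cols) for row, cols in table.items()}

# Like "... : team, season / quarter": one row per team-season, one column per quarter.
print(split(PLAYS, "passing_yards", ["team", "season"], ["quarter"]))
```

With no column dimensions, every row has a single unnamed column, which corresponds to the sortable single-list view.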

Another thing we can do is group the values of a split into buckets. Score is a good candidate: we might not care much whether a team is up or down by 4 or 5 points. We can do this by writing "score[N]", where N specifies the size of the buckets.
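The exact bucket boundaries Yards Gained uses aren't spelled out here. This sketch assumes, purely for illustration, that a tie is its own bucket and that margins are grouped as 1..N, N+1..2N, and so on, on each side; that matches the "up 1-8" style used earlier, but it is an assumption. The "score_bucket" helper and its labels are hypothetical.

```python
def score_bucket(score_diff, size):
    """Map a score differential to a bucket label, assuming ties get their own
    bucket and margins are grouped as 1..size, size+1..2*size, ... on each side."""
    if score_diff == 0:
        return "tied"
    margin = abs(score_diff)
    index = (margin - 1) // size  # 0 for 1..size, 1 for size+1..2*size, ...
    low, high = index * size + 1, (index + 1) * size
    side = "up" if score_diff > 0 else "down"
    return f"{side} {low}-{high}"

print(score_bucket(4, 8), "|", score_bucket(5, 8))  # up 1-8 | up 1-8
print(score_bucket(-9, 8))                          # down 9-16
```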

Earlier we saw that the 2011 Packers had very few offensive plays while tied. We can take a look at the distribution of every team's offensive plays in 2011 across scores using this:

first down attempts in 2011 : score[8] / team

or we can look at it by game:

games in 2011 : score[8] / team

In this table, each value is the number of games in which the team had at least one play while up or down by the given score. The Packers led in all 16 of their games and led by two scores in 14 of them; the Colts, on the other hand, trailed in every game.

"Games" (or "Wins"/"Losses"/"Ties") work a little differently than other statistics. In the query above, the values in each team's column add up to well over 16, yet we know that each team played only 16 games. This is because a team could be tied, leading by 3, and behind by 10 all in the same game; that game would count in each of those three rows. Each game is counted at most once in each cell, but a single game can appear in several cells across the table.

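The "at most once per cell" rule amounts to counting distinct game identifiers per cell. In this minimal sketch the plays (all from one hypothetical game), their fields and the "games_by_bucket" helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical plays from a single game; the score bucket changes during the game.
PLAYS = [
    {"game_id": 1, "score_bucket": "tied"},
    {"game_id": 1, "score_bucket": "up 1-8"},
    {"game_id": 1, "score_bucket": "up 1-8"},
    {"game_id": 1, "score_bucket": "down 9-16"},
]

def games_by_bucket(plays):
    """Count distinct games per bucket: a game counts once in every cell it reaches."""
    games = defaultdict(set)
    for play in plays:
        games[play["score_bucket"]].add(play["game_id"])
    return {bucket: len(ids) for bucket, ids in games.items()}

counts = games_by_bucket(PLAYS)
print(counts)                # {'tied': 1, 'up 1-8': 1, 'down 9-16': 1}
print(sum(counts.values()))  # 3, even though only one game was played
```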
One interesting expression with games is "wins / (wins + losses)", which can be thought of as an estimate of a team's probability of winning. We can then get these probability estimates in different situations, for example as a function of score and time remaining. Specifically, the following query splits score into 3-point ranges and time into 6-minute ranges, requiring at least 100 games in each score/time combination.

wins / (wins + losses) min 100 games : score[3] / minute[6]

Finally, notice that a number of cells have either no wins or no losses. By default, making an expression with statistics (like "wins / (wins + losses)") makes each statistic in the expression optional. We can make a specific statistic required by adding an exclamation point to the end of it, like "wins / (wins! + losses!)". If we use that instead, we can eliminate some of the uninteresting cells from the table.

wins / (wins! + losses!) min 100 games : score[3] / minute[6]
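One way to read the optional/required rule: an optional statistic missing from a cell counts as zero, while a missing required statistic removes the cell entirely. Under that reading, a cell with no losses would show a win probability of 1 and a cell with no wins would show 0. This sketch encodes that interpretation, which is our assumption rather than documented behavior, along with the "min 100 games" cutoff. The cell labels, tallies and the "win_probability" helper are made up for illustration.

```python
# Hypothetical per-cell tallies (made-up numbers); a statistic that never
# occurred in a cell is simply absent from that cell's dict.
CELLS = {
    ("up 1-3", "minutes 0-6"): {"wins": 150, "losses": 50, "games": 200},
    ("up 22-24", "minutes 0-6"): {"wins": 120, "games": 120},              # no losses
    ("down 1-3", "minutes 54-60"): {"wins": 40, "losses": 20, "games": 60},  # < 100 games
}

def win_probability(cells, required=(), min_games=100):
    """Evaluate wins / (wins + losses) per cell.

    Assumption: a missing optional statistic counts as 0, while a missing
    required statistic (written with "!" in a query) drops the cell.
    """
    result = {}
    for key, stats in cells.items():
        if stats.get("games", 0) < min_games:
            continue  # like "min 100 games"
        if any(name not in stats for name in required):
            continue  # a required statistic is missing from this cell
        wins, losses = stats.get("wins", 0), stats.get("losses", 0)
        if wins + losses == 0:
            continue  # nothing to divide by
        result[key] = wins / (wins + losses)
    return result

print(win_probability(CELLS))                                # optional: 0.75 and 1.0
print(win_probability(CELLS, required=("wins", "losses")))   # required: only 0.75
```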

Hopefully this has helped you understand how to use Yards Gained. The full documentation contains the details on all the statistics, conditions, or groupings you can use.

Still not sure why your query isn't working? Please feel free to tweet @YardsGained with any questions, feedback, or requests.