Sunday, January 21, 2018

Examination of Success Thresholds in College Football

Anyone familiarizing themselves with gridiron football analytics will quickly become acquainted with success rate. Success is widely defined as an offense gaining 40-50% of yards to go on 1st down, 60-70% on 2nd down, and 100% on 3rd and 4th downs; preventing gains of said percentages defines success for defenses. Counting all the successful plays in a drive, game, or season and dividing by the total number of plays for that period yields the success rate. Personally, I am more interested in whether an activity was productive, unproductive, or counterproductive, but I’ll save that for another post. Here, I examine how the thresholds for success may have been established and whether the current definitions hide any nuances.
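As a rough sketch of that arithmetic, here is how the calculation might look in Python. This is my own illustration rather than code from any of the sources mentioned; the function names and the tuple layout are hypothetical, and 40%/60%/100% is just one choice within the ranges quoted above.

```python
# Share of yards to go an offense must gain, by down (assumed values
# taken from the low end of the commonly quoted ranges).
THRESHOLDS = {1: 0.4, 2: 0.6, 3: 1.0, 4: 1.0}

def is_success(down, yards_to_go, yards_gained, thresholds=THRESHOLDS):
    """Return True if the play gained the required share of yards to go."""
    return yards_gained >= thresholds[down] * yards_to_go

def success_rate(plays, thresholds=THRESHOLDS):
    """plays: iterable of (down, yards_to_go, yards_gained) tuples."""
    plays = list(plays)
    if not plays:
        return 0.0
    wins = sum(is_success(d, ytg, yg, thresholds) for d, ytg, yg in plays)
    return wins / len(plays)

# Made-up four-play sequence: 1st & 10 (+5), 2nd & 5 (+2), 3rd & 3 (+5), 1st & 10 (+2)
drive = [(1, 10, 5), (2, 5, 2), (3, 3, 5), (1, 10, 2)]
print(success_rate(drive))  # 2 successes out of 4 plays -> 0.5
```

Swapping in the college thresholds only requires passing a different dictionary, e.g. `{1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}`.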

I, like others before me, suspect that success rate derives from traditional football notions of ‘staying ahead of the chains’ or ‘setting up for third and short’. Popularized by Football Outsiders, success as defined here, like many gridiron analytic concepts, can be traced at least to the mid-1980s, when it was outlined on p. 69 of The Hidden Game of Football. The authors used 40%, 60%, and 100% to benchmark ‘wins’ and ‘failures’, and also derived a qualitative measure of success that awards more credit (i.e., more than 1 point) for big plays and penalizes turnovers and lost yardage (i.e., negative points).


How do we determine what is a successful play? The question here is not how success is defined by X yards gained or lost on a given play, per se, but how X yards on a given play portends future success when aggregated over many, many plays in a similar context. Given this notion, let us define a successful play as one occurring on a drive that ends in either a TD or FG.


Let us define success another way, too: a first down occurring on or after a given play on a drive (or series, in this case, really). For example, take a 2nd and 8; if a first down is obtained on that play or on any subsequent play in the drive, that 2nd and 8 would be considered as having occurred on a successful drive (or series). Alternatively, imagine a 1st and 10 which is, say, the fifth play of a drive and occurs after the offense has already obtained at least one first down on the drive; if no first down is obtained on that 1st and 10 or on any play later in the drive, that 1st and 10 would be considered as having occurred on an unsuccessful drive (or series).
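To make the two definitions concrete, here is a minimal Python sketch of how each play in a single drive could be labeled. The function name and input layout are hypothetical; this is only meant to illustrate the logic, not to reproduce the code used for the analysis.

```python
def label_plays(drive_plays, drive_scored):
    """Label each play in one drive.

    drive_plays: list of bools in play order, True if that play earned a first down.
    drive_scored: True if the drive ended in a TD or FG.
    Returns a list of (on_scoring_drive, first_down_on_or_after) tuples.
    """
    labels = []
    first_down_ahead = False
    # Walk backwards so each play knows whether a first down is still to come.
    for gained_first_down in reversed(drive_plays):
        first_down_ahead = first_down_ahead or gained_first_down
        labels.append((drive_scored, first_down_ahead))
    labels.reverse()
    return labels

# Made-up drive: a first down on the third play, after which the drive stalls.
print(label_plays([False, False, True, False, False], drive_scored=False))
# [(False, True), (False, True), (False, True), (False, False), (False, False)]
```

The first three plays count as successful under the first-down definition, while the plays after the last first down do not, which matches the 1st and 10 example above.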


For data, I have all pass and rush plays from games played by Division 1 college football teams from 2005-13, a total of 995,895 plays. For each play, I included an indicator of whether the play occurred on a scoring drive and an indicator of whether there was another first down on that drive. As these are binary variables, indicating yes or no, logistic regression is suitable. As the predictor variable, we will use yards gained on a play divided by the yards to go on the play. This way we can say that gaining X% of the yards to go on a given down is the threshold of success. Since we’re using college data, we’ll take the thresholds used by Football Study Hall (50%, 70%, and 100% on 1st, 2nd, and 3rd and 4th downs, respectively) as the commonly applied benchmark.


Using logistic regression and ROC curves, we identify, for each down, the threshold on the proportion of yards gained that correctly predicts the most plays on successful drives while minimizing the number of plays on unsuccessful drives wrongly predicted as successful (in our data set). This becomes our threshold of success. Figure 1 shows the success thresholds from these analyses for scoring drives in purple, drives with another first down in green, and the commonly applied success thresholds in orange.
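For readers who want to see the mechanics, below is a minimal sketch of one way such a cutoff can be chosen. It rests on two assumptions of mine. First, the criterion is Youden's J (true positive rate minus false positive rate), which is a common way to balance the two goals but is not necessarily the exact rule used here. Second, it sweeps cutoffs directly on yards gained divided by yards to go; with a single predictor and a positive slope, the fitted logistic probability rises with the predictor, so this traces the same ROC curve as sweeping the fitted probabilities. The function name is hypothetical and the data are made up.

```python
def roc_threshold(ratios, outcomes):
    """Pick the cutoff on yards gained / yards to go that maximizes TPR - FPR.

    ratios: yards gained divided by yards to go for each play.
    outcomes: 1 if the play occurred on a successful drive, else 0.
    A play is predicted successful when its ratio >= cutoff.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need plays from both successful and unsuccessful drives")

    # Sort descending so that lowering the cutoff adds plays one group at a time.
    pairs = sorted(zip(ratios, outcomes), key=lambda p: -p[0])
    best_j, best_cutoff = -1.0, None
    tp = fp = 0
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # Take all plays tied at this ratio together.
        while i < len(pairs) and pairs[i][0] == cutoff:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        j = tp / positives - fp / negatives  # Youden's J at this cutoff
        if j > best_j:
            best_j, best_cutoff = j, cutoff
    return best_cutoff, best_j

# Made-up first-down plays: ratio gained, and whether the drive was successful.
ratios = [0.0, 0.1, 0.2, 0.4, 0.5, 0.5, 0.8, 1.2]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = roc_threshold(ratios, outcomes)
print(cutoff, j)
```

In this toy example the chosen cutoff is 0.4: every play on a successful drive is caught, and only one play on an unsuccessful drive is wrongly flagged.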

Figure 1. Thresholds for Success

That the thresholds for 3rd and 4th down are essentially identical for scoring and first downs is unsurprising because scoring requires gaining at least the yards to go. The disparity in thresholds on second downs is also intuitive. It suggests that gaining a greater portion of the yards to go on 2nd down portends a more successful drive. The lower threshold for scoring drives on 1st down is interesting, however. It may be that obtaining 40% of the yards to go on first down typically sets up a 2nd and 6, with the offense in neither a definitive rush nor a definitive pass situation. This, in turn, could conceivably lead to future success and explain the disparity with the commonly used threshold.


I was also curious how field position affects success. Let us focus only on first downs, for convenience. I computed whether each play was a success based on the threshold for scoring drives described above; we’ll call this the fixed threshold. A mixture model was used to divide the field into 8 segments. Several logistic regression models were blended to generate thresholds for each segment, which we’ll call blended thresholds (see note i). This is shown in Figure 2. Yard line 1-9 is closest to the defense’s end zone. The bottom row of panels shows plays classified by the fixed threshold, and the top row shows plays classified by the blended thresholds.
On the X-axis, Yes or No indicates whether a play actually occurred on a scoring drive. Green indicates a play was predicted to occur on a non-scoring drive and orange indicates a play was predicted to occur on a scoring drive. We can see that the fixed threshold emerged because it accurately predicts so many plays on unsuccessful drives in the opponent’s territory.

Figure 2. Comparing Fixed and Blended Success Thresholds by Field Position on First Down

In summary, this report showed that, at least in college football, success thresholds are relatively constant whether success is defined as a drive ending in a score or as a first down on or after a given play. Secondarily, the report provides evidence that statistically derived success thresholds vary by field position, at least on first down. Thus, future work should examine how adjusting thresholds by field position affects the valuation of player and team performance when using success rates.






i. To do this, I averaged the thresholds from three logistic regressions. For each field position segment, I obtained a threshold from each of three logistic regressions fit on the following subsets of the data: [a] all plays in the segment, [b] all plays in the segment plus all plays from field positions closer to the defense's end zone, and [c] all plays in the segment plus all plays from field positions farther from the defense's end zone.
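A minimal Python sketch of that averaging step follows. The names are hypothetical, the ordering of segments (closest to the defense's end zone first) is an assumption of the sketch, and `find_threshold` stands in for whatever routine turns a set of plays into a single threshold (in this post, a logistic regression plus ROC cutoff).

```python
from statistics import mean

def blended_thresholds(plays_by_segment, find_threshold):
    """Average three thresholds per field-position segment.

    plays_by_segment: list of play lists, ordered from the segment closest to
        the defense's end zone to the segment farthest from it (assumed order).
    find_threshold: function mapping a list of plays to a single threshold.
    """
    blended = []
    for k, segment in enumerate(plays_by_segment):
        closer = [p for seg in plays_by_segment[:k] for p in seg]
        farther = [p for seg in plays_by_segment[k + 1:] for p in seg]
        subsets = [
            segment,            # [a] this segment only
            segment + closer,   # [b] this segment plus all closer segments
            segment + farther,  # [c] this segment plus all farther segments
        ]
        blended.append(mean(find_threshold(s) for s in subsets))
    return blended

# Toy stand-in: each "play" is just a number and the "threshold" is the mean.
toy_segments = [[0.2, 0.4], [0.5], [0.6, 0.8]]
toy_result = blended_thresholds(toy_segments, mean)
print(toy_result)
```

Because subsets [b] and [c] borrow plays from neighboring segments, each segment's blended threshold is pulled toward its neighbors, which smooths the thresholds across the field.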