Showing posts with label field position. Show all posts

Sunday, January 21, 2018

Examination of Success Thresholds in College Football

Anyone familiarizing themselves with gridiron football analytics will quickly become acquainted with success rate. Success is widely defined as an offense gaining 40-50% of the yards to go on 1st down, 60-70% on 2nd down, and 100% on 3rd and 4th downs; preventing gains of those percentages defines success for defenses. Counting all the successful plays in a drive, game, or season and dividing by the total number of plays in that period yields the success rate. Personally, I am more interested in whether an activity was productive, unproductive, or counterproductive, but I’ll save that for another post. Here, I examine how the thresholds for success may have been established and whether there are nuances to the current definitions.
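As a minimal sketch of the bookkeeping described above, the following computes a success rate from a list of plays. The function names and the sample drive are made up for illustration, and the thresholds assume the lower bounds quoted above (40%, 60%, 100%).

```python
# Conventional success thresholds by down. Assumption: the lower bounds
# quoted above (40% / 60% / 100%); swap in other values as needed.
THRESHOLDS = {1: 0.40, 2: 0.60, 3: 1.00, 4: 1.00}


def is_success(down, yards_to_go, yards_gained, thresholds=THRESHOLDS):
    """True if the offense gained the required share of the yards to go."""
    return yards_gained >= thresholds[down] * yards_to_go


def success_rate(plays, thresholds=THRESHOLDS):
    """Successful plays divided by total plays.

    plays: list of (down, yards_to_go, yards_gained) tuples.
    """
    if not plays:
        raise ValueError("need at least one play")
    wins = sum(is_success(d, to_go, gain, thresholds) for d, to_go, gain in plays)
    return wins / len(plays)


# Hypothetical four-play drive, invented for illustration only.
drive = [(1, 10, 4), (2, 6, 3), (3, 3, 5), (1, 10, 2)]
print(success_rate(drive))
```

In this made-up drive, the 4-yard gain on 1st and 10 and the 5-yard gain on 3rd and 3 count as successes, so half of the plays are successful.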

I, like others before me, suspect that success rate is derived from traditional football notions of ‘staying ahead of the chains’ or ‘setting up for third and short’. Popularized by Football Outsiders, success by our definition, like many gridiron analytic concepts, can be traced at least to the mid-1980s, when it was outlined on p. 69 of The Hidden Game of Football. The authors used 40%, 60%, and 100% to benchmark ‘wins’ and ‘failures’, as well as a derived qualitative measure of success that awards more credit (i.e., >1 point) for big plays and penalizes turnovers and lost yardage (i.e., negative points).


How do we determine what is a successful play? The question is not how success is defined by X yards gained or lost on a given play, per se, but how X yards on a given play portends future success when aggregated over many, many plays in a similar context. Given this notion, let us define success as a play occurring on a drive that ends in a score, either a TD or FG.


Let us define success another way, too: a first down occurring on or after a given play on a drive (or series, in this case, really). For example, take a 2nd and 8; if a first down is obtained on that play or a subsequent play in the drive, that 2nd and 8 would be considered as having occurred on a successful drive (or series). Alternatively, imagine a 1st and 10 which is, say, the fifth play of a drive and occurs after at least one first down has been obtained on the drive; if no first down is obtained on that 1st and 10 or on any later play in the drive, that 1st and 10 would be considered as having occurred on an unsuccessful drive (or series).
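The two outcome definitions above can be sketched as a labeling pass over one drive. This is only an illustration: the field names (first_down, on_scoring_drive, first_down_on_or_after) and the sample drive are hypothetical, not the layout of my data.

```python
def label_plays(drive_plays, drive_scored):
    """Attach both outcome indicators to every play of one drive.

    drive_plays: list of dicts in play order, each with a boolean
        'first_down' (the play itself gained a first down).
    drive_scored: True if the drive ended in a TD or FG.
    """
    labeled = []
    first_down_ahead = False
    # Walk backwards so each play knows whether a first down
    # happened on that play or on any later play of the drive.
    for play in reversed(drive_plays):
        first_down_ahead = first_down_ahead or play["first_down"]
        labeled.append({
            **play,
            "on_scoring_drive": drive_scored,
            "first_down_on_or_after": first_down_ahead,
        })
    labeled.reverse()
    return labeled


# Hypothetical drive: a first down on the second play, then a punt.
drive = [
    {"down": 1, "to_go": 10, "first_down": False},
    {"down": 2, "to_go": 8, "first_down": True},
    {"down": 1, "to_go": 10, "first_down": False},
    {"down": 2, "to_go": 9, "first_down": False},
    {"down": 3, "to_go": 9, "first_down": False},
]
labeled = label_plays(drive, drive_scored=False)
flags = [p["first_down_on_or_after"] for p in labeled]
print(flags)
```

Here the opening 1st and 10 and the 2nd and 8 are flagged as occurring on a successful series, while the later 1st and 10 and the plays after it are not.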


For data, I have all pass and rush plays from games played by Division 1 college football teams from 2005-13: 995,895 plays. For each play, I included an indicator of whether the play occurred on a scoring drive and whether there was another first down on that drive. As these are binary variables, indicating yes or no, logistic regression is suitable. As the predictor variable we will use yards gained on a play divided by the yards to go on the play. This way we can say that gaining X% of the yards to go on a given down is the threshold of success. Since we’re using college data, the comparison thresholds are those utilized by Football Study Hall: 50%, 70%, and 100% on 1st, 2nd, and 3rd and 4th downs, respectively.


Using logistic regression and ROC curves, we identify, for each down, the threshold for the proportion of yards gained that correctly predicts the most plays on successful drives while minimizing the number of plays on unsuccessful drives wrongly predicted as successful (in our data set). This becomes our threshold of success. Figure 1 shows the success thresholds from these analyses for scoring drives in purple, drives with another first down in green, and the commonly applied success thresholds in orange.
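For readers who want to try the threshold search, here is a rough sketch. Two assumptions are worth flagging: the criterion for picking a point on the ROC curve is taken to be Youden's J (true positive rate minus false positive rate), which is one common choice and not necessarily the one behind Figure 1; and, because a single-predictor logistic model's predicted probability is monotone in the predictor, the sketch scans cutoffs on the yards ratio directly rather than fitting the regression. The data are invented.

```python
def roc_threshold(ratios, outcomes):
    """Cutoff on yards gained / yards to go that maximizes Youden's J.

    ratios: share of the yards to go gained on each play.
    outcomes: 1 if the play occurred on a successful drive, else 0.
    Returns (best_cutoff, best_j). Ties keep the lowest cutoff.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both successful and unsuccessful plays")
    best_cut, best_j = None, float("-inf")
    # Simple quadratic scan; fine for a sketch, slow for ~1M plays.
    for cut in sorted(set(ratios)):
        tp = sum(1 for r, y in zip(ratios, outcomes) if r >= cut and y == 1)
        fp = sum(1 for r, y in zip(ratios, outcomes) if r >= cut and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Invented first-down plays: ratio gained and whether the drive scored.
ratios = [0.0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 1.0]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
cut, j = roc_threshold(ratios, outcomes)
print(cut, j)
```

Running the search separately on the plays from each down, once per outcome definition, would give one threshold per down and definition, which is the shape of the comparison in Figure 1.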

Figure 1. Thresholds for Success

That the thresholds for 3rd and 4th down are essentially identical for scoring and first downs is unsurprising because scoring requires gaining at least the yards to go. The disparity in thresholds on second downs is also intuitive. It suggests that gaining a greater portion of the yards to go on 2nd down portends a more successful drive. The lower threshold for scoring drives on 1st down is interesting, however. It may be that obtaining 40% of the yards to go on first down typically sets up a 2nd and 6, with the offense in neither a definitive rush nor a definitive pass situation. This, in turn, could conceivably lead to future success and explain the disparity here compared to the commonly used threshold.


I was also curious how field position affects success. Let us focus only on first downs, for convenience. I computed whether each play was a success based on the threshold for scoring drives described above; we’ll call this the fixed threshold. A mixture model was used to divide the field into 8 segments. Several logistic regression models were blended to generate thresholds for each segment, which we’ll call blended thresholds.[i] This is shown in Figure 2. Yard line 1-9 is closest to the defense’s end zone. The bottom row of panels shows successful plays based on the fixed threshold and the top row shows those based on the blended threshold.
On the X-axis are Yes or No to indicate whether a play actually occurred on a scoring drive or not. Green indicates a play was predicted to occur on a non-scoring drive and orange indicates a play was predicted to occur on a scoring drive. We can see the fixed threshold emerged because it accurately predicts so many plays on unsuccessful drives in opponents’ territory.

Figure 2. Comparing Fixed and Blended Success Thresholds by Field Position on First Down

In summary, this report showed that, at least in college football, success thresholds are relatively constant whether success is defined as a drive ending in a score or as a first down on or after a given play. Secondarily, the report provides evidence that statistically derived success thresholds vary by field position, at least on first down. Thus, future work should examine how adjusting thresholds by field position affects the valuation of player and team performance when using success rates.






[i] To do this, I averaged the thresholds from three logistic regressions for each field position segment, each fit to a different subset of the data: [a] plays in that segment, [b] plays in that segment plus all plays from field positions closer to the defense's end zone, and [c] plays in that segment plus all plays from field positions farther from the defense's end zone.
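The averaging in this footnote can be sketched as follows. This is an approximation under stated assumptions: each of the three thresholds is found by scanning cutoffs that maximize true positive rate minus false positive rate (an assumed criterion standing in for the logistic regression and ROC step), segments are stored in a list ordered from closest to the defense's end zone to farthest, and the three toy segments replace the 8 segments used in the post.

```python
def best_cutoff(plays):
    """Cutoff on the yards ratio maximizing TPR - FPR (assumed criterion).

    plays: list of (ratio, on_scoring_drive) pairs, the flag being 0 or 1.
    """
    pos = sum(y for _, y in plays)
    neg = len(plays) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need plays from both scoring and non-scoring drives")
    best_cut, best_j = None, float("-inf")
    for cut in sorted({r for r, _ in plays}):
        tp = sum(1 for r, y in plays if r >= cut and y == 1)
        fp = sum(1 for r, y in plays if r >= cut and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def blended_threshold(plays_by_segment, segment):
    """Average the three subset thresholds [a], [b], [c] for one segment.

    plays_by_segment: list of play lists, index 0 being the segment
        closest to the defense's end zone.
    """
    own = plays_by_segment[segment]                                      # [a]
    closer = [p for seg in plays_by_segment[:segment + 1] for p in seg]  # [b]
    farther = [p for seg in plays_by_segment[segment:] for p in seg]     # [c]
    cuts = [best_cutoff(own), best_cutoff(closer), best_cutoff(farther)]
    return sum(cuts) / len(cuts)


# Three invented segments (the post used 8), for illustration only.
segments = [
    [(0.2, 0), (0.4, 1), (0.6, 1)],
    [(0.3, 0), (0.5, 1), (0.7, 1)],
    [(0.4, 0), (0.6, 1), (0.9, 1)],
]
blend = blended_threshold(segments, 1)
print(blend)
```

Because subsets [b] and [c] borrow plays from neighboring segments, the blended value for a segment is pulled toward the thresholds on either side of it, which smooths the change from one segment to the next.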

Sunday, July 17, 2016

Field Position Part II


In a previous post I discussed how INTs and INT return yardage influenced starting field position (SFP). I will extend that discussion to include each of the other events that directly result in SFP: turnover-fumble returns, kick and punt returns, and missed field goals by opponents. As an aspiring defensive back, I of course took great care discussing interceptions. I will devote little discussion here to fumble recoveries and missed field goals. I will harp on kick-off returns but refrain from discussing punt returns at any depth.

Let me first state that my play-by-play (PBP) data differs slightly from the official record. I excluded yardage gained on returns for TDs in the analysis because a TD precludes SFP. Excluded also was return yardage gained prior to a turnover-fumble.

Concerning INTs, I emphasized that ending opponents’ possessions is most salient and that INT return yardage is a somewhat superfluous stat. INT return yards may be useful to compare playmaking abilities between DBs, although statisticians, teams, and observers might be better served knowing the SFP that resulted from an interception. This notion is definitely applicable for fumble returns where, again, the ending of opponents’ possession is most salient.

Likewise, it is also relevant for rating punt returners. For instance, a player fair catching a punt at his own 9-yard line would be recorded as a fairly unremarkable zero yards (i.e., it is counted in his average PRY). However, the fair catch was probably initiated in the presence of proximal defenders who could have disrupted the impetus of the punted ball at say, the 2-yard line had the returner declined to fair catch. Thus, by fair catching—despite accruing zero yards—the returner in the example would improve his team’s SFP by 7 yards (of course, the defense downing the ball is hypothetical).

The foregoing notion of field position in lieu of yardage is applicable to kick returns as well. For example, let us review the 2014 NFL-leading kick returners by average yards per return. I have Bruce Ellington of the 49ers at 24 returns for 25.9 yards per return; he ranks about ninth in KR yards. However, Ellington gives his offensive teammates an average starting FP at the ~23-yard line, 18th on my list of qualifying players. It may be poor decision making on his part, poor block execution on the part of his teammates, or that he generally fields kickoffs from superior kickers, but we must acknowledge that Ellington’s average catch-spot (CS) on KRs was nearly 3 yards into the endzone, ranking third-deepest on my list of qualifying players.[1]

Although this post is about SFP, the above anecdotes underscore the entanglement of variables involved in appraising performances with yardage accrued. However, Ellington still gained those yards. If we are comparing players (or even coverage units), perhaps Ellington does rank ninth in KR yards. However, football is about team success and, on a given drive, a team is increasingly inclined to succeed the closer it begins to its opponent’s endzone. Then again, Ellington’s team did start 3 yards closer to the endzone than would result from him taking more touchbacks.

Moving on, for all teams in the 2014-15 NFL season, I obtained all non-TD turnover-fumble returns, interceptions, kick and punt returns, and field goals missed by opponents using the Pro-Football Reference PBP search tool. Opponents’ missed FGs include blocks but exclude blocks returned for TDs. For all plays except opponents’ missed field goals, I extracted [a] the spot of the INT, fumble recovery, or catch and [b] the spot at which the player was downed following the return. Computed with those values were [c] return yards, or 20 for a touchback, and [d] the SFP of the player’s offensive teammates. SFP was scaled such that teams’ own goal lines equaled zero and opponents’ goal lines equaled 100; greater values indicate better SFP.
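A small sketch of steps [c] and [d] follows. The function names and the sample returns are hypothetical. It assumes spots are already on the 0-to-100 scale (a catch 3 yards deep in the returner's own endzone is -3) and that a touchback starts the drive at the 20, as it did under 2014 rules.

```python
TOUCHBACK_SPOT = 20   # 2014 rules: the ball comes out to the 20 after a touchback
TOUCHBACK_YARDS = 20  # return yards credited for a touchback, per step [c]


def return_result(event_spot, downed_spot=None, touchback=False):
    """Return (return_yards, sfp) for one event.

    event_spot: spot of the INT, fumble recovery, or catch (0 = own goal line).
    downed_spot: spot where the returner was downed; unused for touchbacks.
    """
    if touchback:
        return TOUCHBACK_YARDS, TOUCHBACK_SPOT
    if downed_spot is None:
        raise ValueError("downed_spot is required unless touchback=True")
    return downed_spot - event_spot, downed_spot


def average_sfp(results):
    """Mean starting field position over a list of (return_yards, sfp) pairs."""
    return sum(sfp for _, sfp in results) / len(results)


# Invented kick returns: caught 3 deep and run out to the 23,
# a touchback, and a catch at the 2 returned to the 31.
results = [
    return_result(-3, 23),
    return_result(-8, touchback=True),
    return_result(2, 31),
]
print(results, average_sfp(results))
```

The first made-up return gains 26 yards yet starts the offense only 3 yards beyond where a touchback would have, which is the Ellington point in miniature.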



Table 1. Counts, Average SFP, and Average Return Yards for Events Resulting in SFP, NFL 2014-15
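Column groups, left to right: total event counts (KR, PR, FR, INT, oMFG); average starting field position (SFP overall, then by event: KR, PR, FR, INT, oMFG); average return yards by event (KR, PR, FR, INT). KR = kick return, PR = punt return, FR = fumble return, INT = interception, oMFG = opponent's missed field goal.
TEAM KR PR FR INT oMFG SFP KR PR FR INT oMFG KR PR FR INT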
KAN 68 76 5 5 5 29.3 25.4 28.7 22.4 44.0 23.6 25.4 8.6 0.0 15.2
CIN 78 74 5 19 5 30.3 24.7 30.3 27.4 50.8 26.0 24.9 8.4 0.0 9.2
NWE 68 64 7 16 5 30.6 22.7 32.0 36.9 51.8 29.6 22.1 7.5 0.4 11.7
DAL 75 66 12 16 2 28.9 21.1 26.9 35.5 43.3 21.0 22.3 7.3 2.6 9.3
TAM 81 63 11 11 7 26.6 20.7 25.3 30.1 48.8 32.6 21.6 7.2 1.3 6.9
IND 79 88 13 11 4 28.7 22.5 28.4 30.3 43.2 24.0 24.3 7.0 2.6 8.7
BAL 68 73 12 10 6 28.7 23.3 30.4 32.9 48.2 27.8 22.9 6.9 4.3 9.1
JAX 92 74 12 5 4 25.7 22.1 21.5 24.3 51.0 31.8 21.9 6.4 0.8 13.0
PHI 83 85 16 9 5 30.0 22.9 28.8 33.8 41.6 23.4 20.9 6.2 6.3 6.2
MIN 73 74 4 11 6 27.3 25.0 25.5 15.3 49.0 30.3 21.9 6.2 0.0 8.2
STL 79 74 11 10 1 28.0 22.1 28.8 32.4 42.1 35.0 22.9 5.9 3.3 10.8
BUF 71 86 8 18 7 30.2 21.3 27.0 25.5 60.4 26.9 20.6 5.9 4.4 19.2
OAK 94 81 4 9 5 24.2 19.9 24.7 26.0 61.9 24.2 21.4 5.7 5.3 8.4
CHI 97 49 8 13 6 25.9 21.1 27.0 15.9 43.6 23.8 20.5 5.6 2.4 10.9
SDG 81 66 8 6 2 26.4 21.3 26.5 24.1 31.7 33.5 21.1 5.5 0.0 12.0
SFO 70 74 5 21 0 27.8 22.6 28.3 16.2 48.0 - 22.9 5.5 0.0 18.8
ATL 89 55 7 15 4 26.5 22.4 25.7 32.3 40.5 27.5 22.5 5.4 5.6 6.9
MIA 82 57 10 11 6 31.1 24.2 25.5 21.3 50.6 23.5 23.9 5.3 0.0 17.1
DEN 75 84 5 16 5 28.9 22.6 28.2 26.0 54.4 25.2 21.4 5.3 0.4 10.8
TEN 89 72 6 11 5 25.8 23.2 24.8 38.8 52.7 31.0 22.5 5.2 7.2 10.8
ARI 76 77 5 15 4 26.9 19.6 24.8 20.8 51.3 32.5 20.2 5.1 1.8 10.3
NYJ 85 79 7 6 5 27.8 22.5 26.6 31.4 35.0 24.0 22.1 5.1 0.3 9.0
PIT 86 66 10 7 2 25.7 20.7 24.9 32.5 48.9 29.5 21.1 5.0 3.9 18.1
NYG 87 74 9 16 2 28.2 20.7 23.7 31.6 62.1 24.0 21.1 4.9 1.5 16.6
GNB 79 60 7 15 1 28.5 20.1 27.0 37.4 54.7 29.0 20.3 4.8 0.0 15.2
CAR 83 69 13 10 4 27.7 21.8 25.5 32.5 45.2 25.8 21.0 4.5 2.8 19.0
SEA 62 81 9 11 2 30.5 22.4 27.7 29.0 58.2 32.5 21.4 4.2 0.0 14.2
CLE 72 83 7 18 3 26.8 22.6 24.9 27.7 54.1 34.0 22.8 4.2 4.9 14.4
WAS 85 80 9 6 3 25.1 21.4 22.7 29.0 45.5 18.0 20.8 4.0 2.1 5.0
HOU 71 82 10 16 2 27.7 20.4 23.5 31.9 60.6 26.5 20.7 3.8 8.7 16.6
DET 70 81 7 18 4 29.9 21.1 29.4 32.9 59.2 27.3 21.8 3.8 1.8 18.7
NOR 86 62 6 12 0 25.5 22.2 22.0 18.5 42.9 - 22.3 3.0 0.0 12.5
League (same columns as the team rows)
AVG 79 73 8 12 4 27.9 22.0 26.5 29.1 50.5 27.2 22.0 5.6 2.3 12.3
SD 9 10 3 4 2 1.8 1.4 2.5 6.4 7.6 4.2 1.3 1.3 2.4 4.2

Table 1 contains 2014-15 event counts, NFL team average SFP and return yards for each event, and League averages thereof. KRY and PRY are computed with touchbacks equal to 20 yards and no return equal to zero yards. Neither New Orleans’ nor San Francisco’s opponents missed FGs, apparently. There is nothing particularly noteworthy in the table, otherwise.

I can also tell you several things. INTs have the largest impact on the next SFP when statistically controlling for the initial play spot, the spot at which an INT, fumble recovery, or kick/punt catch occurred, and the yardage gained on the return.[2] I can also tell you that for all NFL teams, the majority of SFP yardage is derived from either KR yards or PR yards. Table 2 provides some insight into why this is.



Table 2. Characteristics of NFL Teams Based on Source of Majority of SFP
Majority of Team SFP From
VARIABLE KR PR
Teams Count 11 21
avg SFP 27 28
avg SFP Unproductive Drives 24 24
avg KR-SFP 22 22
avg Unproductive Drive Yards 17 16
avg Punt Yards 45 45
Opp avg Punt Return Yards 9 9
avg Def. SFP After Unproductive Drive 24 23
Opp avg Unproductive Drive Yards 16 16
Opp avg Punt Yards 45 45
avg Punt Return Yards 5 6
% All Drives Turnovers 14% 11%
Opp % All Drives Turnovers 12% 12%
% All Drives End w/ Score 32% 35%
Opp % All Drives End w/ Score 39% 32%
win% 35% 58%
NOTE: Unproductive drives are defined as those that end without a score.
Scoring drives are those that ended in TDs or FGs.


In Table 2 we see that the two types of teams perform similarly in most situations. Notably, teams whose majority of SFP is derived from KRs commit TOs slightly more frequently. As an aside, this might suggest that, while essentially random, a modicum of TOs may be attributable to offensive ineptitude (albeit in a single-season sample). Those teams’ opponents also end drives by scoring considerably more frequently (23% more) than the opponents of teams whose majority of SFP is derived from PRs. The PR-teams score slightly more frequently.

Most striking in Table 2, though, is the disparity in win percentage. The KR-teams can be expected to win 5.6 games whereas PR-teams can be expected to win 9.3 games. Thus, I conclude that, despite the indelible impact of Devin Hester or the ’84 Seahawks’ 3-4 monster, SFP is ultimately largely the result of an ungenerous defense supplemented by relatively consistent and careful offensive play.
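The expected-win figures follow from multiplying each group's win percentage in Table 2 by the 16 games of a 2014 NFL regular season. A quick check, with a hypothetical function name:

```python
GAMES_PER_SEASON = 16  # length of the 2014 NFL regular season


def expected_wins(win_pct, games=GAMES_PER_SEASON):
    """Expected wins over a season for a given win percentage."""
    return win_pct * games


kr_wins = round(expected_wins(0.35), 1)  # teams with most SFP from KRs
pr_wins = round(expected_wins(0.58), 1)  # teams with most SFP from PRs
print(kr_wins, pr_wins)  # 5.6 9.3
```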

In summary, the impact of various events on starting field position was examined using data from the 2014-15 NFL season. Although INT yards are most impactful on SFP in isolation, when statistically controlling for event spot and return yardage, the majority of SFP is derived from either KR or PR yards. Likewise, winning teams garner most of their SFP from PR yards. I concluded that this effect is likely due to defensive stops and consistent, careful offensive play.



[1] Minimum 1 KR per game scheduled.
[2] To accomplish this, SFP was regressed onto play start spot, event spot, and yards gained. The residuals were saved. An ANOVA was performed with those residuals as the dependent variable and event type as the independent variable. A significant effect of event type was found, F(4, 5641) = 17.422, p < .001. Roughly, planned post hoc comparisons indicate the effect of event on SFP could be ranked as INT > FUM > PR > MFG > KR.

Saturday, January 16, 2016

Field Position Part 1: Interception Return Yards



In a previous post I mentioned my intent to explore the effects of defense and special teams on field position. This is part one of that investigation. In another post I offered a method of valuating the average yardage of an interception (for the intercepting team). At present, I will discuss a different method of calculating and valuating INT yards using play-by-play (PBP) data.

Yes, collegiate and professional statistical records include yardage gained on interception returns just as passing or rushing yards are included. However, we track passing and rushing yards as a measure of progress and productivity—a measure of player or team performance. We track interceptions because each signifies an exchange of possession. The interception itself is the measure of player or team capacity—not the yardage gained on returns. Interception return yards are unexpected gratuity.
Chris Harris picks off and returns a Kyle Orton pass.

For instance, consider two interceptions from 2014. A Tony Romo pass was intercepted by the NY Giants’ Prince Amukamara, and Buffalo Bill Kyle Orton’s pass was intercepted by Denver Bronco Chris Harris. Amukamara and Harris were each credited with 38 INT return yards. Amukamara’s came halfway through the second quarter of a tie game and Harris’ with 5 minutes remaining in the third quarter, his team leading 21-3. Both players’ offenses scored on their following drives. So what differentiates Amukamara’s INT return from that of Harris? Field position.

Amukamara intercepted Romo’s pass at the NYG 35-yard line and returned the ball to the Dallas 27. Harris intercepted Orton’s pass at the Bronco 2-yard line and returned it to the Bronco 40. Indeed, Harris’ INT may be more valuable because he ended the possession of an opponent in scoring position.[1] But, Amukamara’s 38 INT return yards put his offensive teammates in field goal position before they lined up.

An offense gaining no more than 2.3 yards on every play of every game would be disbanded. A DB intercepting one pass per game and being downed at the spot of each INT would receive a max contract and an eponymous island. No writers would denigrate him for failing to gain yardage after the INT. Any coach or fan would prefer an INT returned to the 50 over one returned to his own 3, of course; but surely every coach or fan would prefer an INT to the opponent having possession.

To me this means that, although there is value in knowing the yardage gained from the spot of an interception to the spot where the interceptor is downed, it is ultimately more meaningful to know the field position produced by that yardage. That is, both players in our example should be credited with 38 INT return yards, but the value in the game at the point each player was downed is more accurately described as 73 for Amukamara (the Dallas 27) and 40 for Harris (his own 40). This is particularly true if we desire statistics that reflect happenings on the field.
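The two interceptions can be put on a common scale with a few lines. This is just a worked example: the helper name is made up, and the scale runs from the intercepting team's own goal line (0) toward the opponent's (100).

```python
def to_scale(side, yard_line):
    """Map a yard line to the intercepting team's 0-100 scale.

    side: 'own' for the intercepting team's half, 'opp' for the opponent's.
    """
    if side == "own":
        return yard_line
    if side == "opp":
        return 100 - yard_line
    raise ValueError("side must be 'own' or 'opp'")


# Spots as described in the post.
interceptions = {
    "Amukamara": {"int_spot": to_scale("own", 35), "downed": to_scale("opp", 27)},
    "Harris": {"int_spot": to_scale("own", 2), "downed": to_scale("own", 40)},
}

summary = {
    name: (p["downed"] - p["int_spot"], p["downed"])
    for name, p in interceptions.items()
}
for name, (return_yards, field_position) in summary.items():
    print(name, return_yards, field_position)
```

Both returns come out to 38 yards, but the resulting field positions are 73 and 40.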

I ran a Pro-Football Reference Game Play search for all interceptions in the 2014 regular season, excluding pick-sixes and interceptions with lost fumbles on the return. Pick-sixes were excluded because a touchdown precludes an offensive drive and, thus, they are uninvolved in the tabulation of average starting field position. Four returns with fumbles lost by the interceptor and recovered by the intercepted team were excluded because possession was regained.

Extracted from that data were the (a) spot of the interception and (b) spot of being downed following the INT return. Computed with those values were the (c) interception yards from the spot of the INT to the spot of being downed, or 20 for touchbacks, and (d) starting field position for the interceptor’s offensive unit, measured from the team’s own goal line to (b), the spot of being downed. All spot-yardage values were scaled from 1 to 99 with 1 being the intercepting team’s own goal line and 99 being their opponents’ goal lines.

Table 1 contains various interception statistics for 2014 NFL teams, including League averages. The value we are interested in is the Mean INT FP column. Teams are ordered by average interception spot. Interestingly, I was forced to revisit my earlier comparison of the value of Amukamara’s and Harris’ interceptions. It appears that over the course of the 2014-15 season, the Giants’ defense endowed their offense with the greatest field position advantage following interceptions, but the average spot of their 16 interceptions was nearly midfield. Compare this to the average spot of the 16 Dallas Cowboys interceptions, their own 30, where opponents are within field goal range. Although it is intuitive, field position following an interception increased with interception spot (N = 393, r = .81, p < .001). Field position following an interception also increased with interception return yards (r = .41, p < .001).
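For anyone wishing to reproduce this kind of correlation, a plain Pearson r can be computed as below. Note the caveat: the sample here uses a handful of team-level means from Table 1 (Mean INT Spot and Mean INT FP), not the 393 individual interceptions, so it will not reproduce the play-level r values reported above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# (Mean INT Spot, Mean INT FP) for six teams taken from Table 1:
# SDG, CAR, DAL, DEN, NYG, OAK. Team-level means, for illustration only.
team_means = [
    (18.8, 30.8), (23.4, 42.4), (30.3, 39.6),
    (42.1, 52.9), (45.5, 62.1), (53.4, 61.9),
]
spots = [spot for spot, _ in team_means]
field_positions = [fp for _, fp in team_means]
r = pearson_r(spots, field_positions)
print(round(r, 2))
```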


Table 1. Interception Yards and Interception Field Position (Yards) Following Interceptions for 2014-15 NFL Teams
Columns: TEAM | Mean FP | Non-TD INTs | Mean INT Spot | Non-TD INT Yards | Mean Non-TD INT Yards | INT FP Yards | Mean INT FP
SDG 26.4 6 18.8 72 12.0 185 30.8
CAR 27.7 10 23.4 190 19.0 424 42.4
NYJ 27.8 6 24.7 54 9.0 202 33.7
KAN 29.3 5 27.0 76 15.2 211 42.2
CHI 25.9 13 30.1 142 10.9 533 41.0
PIT 25.7 7 30.1 127 18.1 338 48.3
DAL 28.9 16 30.3 149 9.3 634 39.6
NWE 25.5 12 30.9 140 11.7 511 42.6
STL 28.0 10 31.3 108 10.8 421 42.1
ATL 26.5 15 32.1 104 6.9 586 39.1
MIA 31.1 11 33.1 188 17.1 552 50.2
SEA 27.8 21 33.5 298 14.2 1001 47.7
IND 28.7 11 34.5 96 8.7 475 43.2
SFO 30.5 11 34.6 207 18.8 588 53.5
PHI 30.0 9 35.3 56 6.2 374 41.6
JAX 25.7 5 35.8 65 13.0 244 48.8
CLE 26.8 18 37.9 260 14.4 943 52.4
ARI 26.9 15 38.3 154 10.3 728 48.5
BAL 28.7 10 39.1 91 9.1 482 48.2
DET 29.9 18 39.3 336 18.7 1043 57.9
NOR 30.6 16 39.3 200 12.5 829 51.8
GNB 28.5 15 39.5 228 15.2 820 54.7
CIN 30.3 19 40.4 174 9.2 941 49.5
WAS 25.1 6 40.5 30 5.0 273 45.5
MIN 27.3 11 40.8 90 8.2 539 49.0
BUF 30.2 18 41.2 346 19.2 1088 60.4
TEN 25.8 11 41.7 119 10.8 578 52.5
TAM 26.6 11 41.9 76 6.9 537 48.8
DEN 28.9 16 42.1 173 10.8 846 52.9
HOU 27.7 16 44.0 265 16.6 969 60.6
NYG 28.2 16 45.5 266 16.6 994 62.1
OAK 24.2 9 53.4 76 8.4 557 61.9
LEAGUE 27.9 12.3 36.8 154.9 12.6 607.7 49.5
Note: FP = Field Position. Non-TD INT Yards = the yards produced on all team interceptions that do not result in TDs; this would be the value reported in League statistics minus yardage from pick-sixes.





[1] Pro-Football Reference’s Expected Points model tells us that Amukamara’s INT was worth -3.78 EPA and Harris’ -1.6 EPA, but Amukamara’s INT yielded a net EP of -3.56 and Harris’, -5.91. Amukamara’s INT is worth a greater EPA value, probably due to resultant field position, but Harris’ INT has a greater net EPA value because his opponent was near the endzone he was defending.