Thursday, November 28, 2019

Penalty Yards Awarded for Defensive Pass Interference in the NFL are Unjustified

I watch way too much college football, but I have a limited interest in the NFL. I will watch NFL games featuring Lamar Jackson, Deshaun Watson, the Jaguars with Fournette and Josh Allen, and the Chiefs’ offense, when those games are actually available in my viewing area. Blackouts are one of myriad reasons a person might develop a distaste for the NFL (e.g., stadium costs, handling of violence against women, ticket costs, nonguaranteed contracts, obstructing the publishing of concussion problems, treatment of retired players, and Roger Goodell). In full disclosure, I loathe how defensive pass interference (DPI) penalties are enforced in the NFL, the topic of this post. Regardless, I am specifically curious whether the yardage granted by an enforced DPI call in the NFL is statistically justifiable.

Longtime POTH readers know I am an aspiring defensive back. Hence, I will typically be biased against many DPI calls, but I recognize that DPI does genuinely occur. My concern about DPI enforcement in the NFL is that it is a spot foul. The offensive team is awarded a first down at the yard line where the DPI was committed. It thus assumes that the receiver would have caught the pass were it not for the DPI. A spot foul 10 or 15 yards down field seems reasonable to me, but 30 or 40 yards seems like far too much field position to simply gift the offense on what may or may not have been a catch if there were no PI. In other words, it is unfair to award the offense, say, 40 yards because of DPI given that one of several events could have led to an incompletion if there were not DPI. My thesis is that NFL DPI penalty yardage becomes increasingly unjustifiable as the spot of the foul gets farther from the line of scrimmage. But, I don’t know that and that’s why I’m exploring the matter. 

For reference, the NFL rulebook defines pass interference as “any act by a player more than one yard beyond the line of scrimmage that significantly hinders an eligible player’s opportunity to catch the ball.”

I needed play-by-play data. It had to include depth of target, the distance from the line of scrimmage to the yard line where the pass is caught or comes closest to the targeted receiver. I found nothing in the open-source arena but ArmchairAnalysis does provide a sample of their thorough NFL charting data, which I used. Specifically, it is a sample of about 4013 plays from two weeks in the 2019 NFL season. Of those, about 2430 are passing plays. I removed 82 throwaways, 157 sacks, and 11 spiked balls, because these events preclude a pass to a receiver.  This left 2180 passing plays eligible for analysis. 

First, I compared the completion % of the 2273 passing plays that were not sacks, which is 65%, to the NFL average % for all other weeks in 2019 (through week 12), which is 63.8%.  This way we can assess if this sample of passes is somehow dissimilar from passes in the remainder of the season. It was not, χ² = 1.49, p = 0.22, 95% CI [0.63, 0.67]. 
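As a rough check on the comparison above, here is a minimal Python sketch of a one-sample proportion test and a normal-approximation 95% interval. The exact completion count is not given in the post, so it is assumed here from the rounded 65% figure. The interval should line up with the one reported, but the χ² will land near, not exactly on, 1.49.

```python
import math

def one_sample_proportion(successes, n, p0):
    """Pearson chi-square (1 df, no continuity correction) of an observed
    proportion against p0, plus a normal-approximation 95% interval."""
    p_hat = successes / n
    expected = n * p0
    chi2 = (successes - expected) ** 2 / (n * p0 * (1 - p0))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return chi2, p_value, (p_hat - half_width, p_hat + half_width)

# ASSUMPTION: completions inferred from the rounded 65% of 2273 passes
n_passes = 2273
completions = round(0.65 * n_passes)
chi2, p_value, ci = one_sample_proportion(completions, n_passes, 0.638)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```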


Figure 1. Top panel shows distribution of completions (magenta) and incompletions (brown) by depth of target. Bottom panel shows completion % by depth of target (dotted black) and estimated probability of completion by depth of target (green).


Figure 1 shows the raw completion percentage by depth of target (dotted black) and the estimated probability of a completion when accounting for random variance due to defense and targeted receiver (green).¹ The farther a targeted receiver is from the line of scrimmage, the less likely the pass is to be completed. This provides some support for my thesis that awarding a first down at the spot of the DPI is increasingly unjustifiable as the depth of target increases, simply because passes targeted farther down the field are less likely to be completed.

Figure 2. Expected yards per pass attempt by depth of target (dotted black) and the expected yards when controlling for random variance due to defense and offense units (orange).


Another way to frame the issue is in terms of yards per attempt (Y/A). That is, how many receiving yards are expected on a pass to a given depth of target. Y/A is a widely used measure of passing efficiency. Figure 2 shows the Y/A by depth of target (dotted black) and the expected Y/A when accounting for random variance due to defense and offense (orange).² This provides additional support for my thesis that awarding a spot foul for DPI is increasingly unwarranted when the DPI is farther from the line of scrimmage. For example, a target of 32 yards down field is expected to gain only 15.9 yards. This might seem odd because 15.9 is less than 32, but Y/A accounts for the probability of the pass being completed. Again, passes targeted farther down field are less likely to be caught, and thus a spot foul is less justifiable.
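To make the arithmetic behind that example concrete, here is a small Python sketch. It leans on a simplifying assumption that is not part of the analysis above: that a completed pass gains exactly the depth of target, with no yards after the catch. Under that assumption, the implied completion probability is just expected yards divided by depth.

```python
def implied_completion_probability(expected_yards, depth_of_target):
    """ASSUMPTION: a completion gains exactly depth_of_target yards,
    so expected yards = P(completion) * depth_of_target."""
    return expected_yards / depth_of_target

def spot_foul_surplus(expected_yards, depth_of_target):
    """Yards a spot foul awards beyond the expected gain of the pass."""
    return depth_of_target - expected_yards

depth = 32        # yards downfield, from the example above
expected = 15.9   # expected yards per attempt at that depth, from the example above

print(round(implied_completion_probability(expected, depth), 2))  # 0.5
print(round(spot_foul_surplus(expected, depth), 1))               # 16.1
```

In other words, under this simplification the 32-yard spot foul hands the offense roughly twice what the throw was expected to produce.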

Summarily, this exploratory post yields evidence suggesting that the penalty yards awarded for DPI in the NFL are unwarranted. Although some random variance was accounted for in the models, the major shortcoming is that other factors that affect completion percentage and receiving yards were not accounted for in the analysis. This includes factors such as QB pressure, pass coverage, field position, score differential, and others. Nevertheless, the results demonstrate that, by being a spot foul, the penalty yards awarded to the offense following a DPI (with an uncaught pass) in the NFL are incommensurate with the yardage that would be expected given the depth of target. We here at POTH have no delusions that the enforcement of penalty yardage for DPI will be subject to change. Likewise, we are not anarchists; we respect the game and know that parameters are needed to standardize competition. However, we do feel it necessary to present evidence that directly contradicts any notion of rules designed to ensure a fair game that is decided on the field, by the players.









¹ Computed using a GLMM specifying a binomial distribution. Depth of target is a fixed effect, with defense and targeted receiver as random effects. QB and offense were considered as random effects but were essentially null and excluded from the model. The model explained about 12% of the variance in completion %. Depth of target was significant, reducing the log odds of completion by 0.059 for each yard from the line of scrimmage.
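For readers less used to log odds, the coefficient in note 1 can be converted to an odds ratio. The sketch below is plain Python and uses only the -0.059 slope reported above. The model intercept is not reported, so no absolute probabilities are computed.

```python
import math

LOG_ODDS_PER_YARD = -0.059  # change in log odds of completion per yard of depth

def odds_ratio(yards):
    """Multiplicative change in the odds of completion over `yards` of extra depth."""
    return math.exp(LOG_ODDS_PER_YARD * yards)

print(round(odds_ratio(1), 3))   # 0.943
print(round(odds_ratio(10), 3))  # 0.554
```

So each additional yard of depth trims the odds of a completion by about 6%, and ten additional yards cut the odds by roughly 45%.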

² Computed using an LMM. Simple linear regression had R² = 0.108 and a smooth regression line had R² = 0.11, so I used a linear model for simplicity. Depth of target is a fixed effect, with defense and offense as random effects, each with a random intercept for depth of target. The model explained about 18.4% of the variance in Y/A. Depth of target was significant, increasing the Y/A by about 1.03 yards for every three yards of depth of target.


NFL Stats provided by ArmchairAnalysis.com

Sunday, September 8, 2019

How do NFL Kickers Age?

I was watching games on Saturday while chatting with another college football diehard. We were both enamored by the ongoing failure (relatively speaking) that is field goal kicking at perennial powerhouse Alabama. Juxtaposed against their otherwise prolific success, conjecture proceeded about the underlying cause(s) of ‘Bama’s FG kicking woes over the years. 

FG kicking troubles pervade the college game, frustrating fans, and our conjecturous chat led me to wonder if and how NFL kicking is better than college kicking. This led me to wonder if kickers just get better (or more consistent and reliable) as they get older. We did some Google searching but couldn’t find any NFL kicker aging curves for accuracy. So, we made our own.
Figure 1. Histogram of career lengths for NFL kickers 1960-2018


First, we obtained a bunch of NFL kicker data from the wonderful resource known as PFR. This includes season-by-season data for 369 NFL kickers from 1960 through present. These kickers made 33558 of 45777 FGs (73.3%) and 54658 of 56369 (97%) extra points. Based on the distribution of career lengths shown in Figure 1, we were concerned that the large number of kickers with 3 or fewer NFL seasons would skew the analysis. Our concerns were reinforced when we looked at Figure 2.
Figure 2. Mean Field Goal % by length of career in seasons for NFL Kickers 1960-2018


Kickers with 3 or fewer NFL seasons have notably lower career FG% than kickers with lengthier careers. This in itself is not surprising, but it would confound the interpretation of the data. The lower accuracy of kickers with 3 or fewer seasons might lead to exaggerated year-to-year increases in accuracy in the early stage of the kicker career. To better convey this, displayed in Figure 3 is the average FG% in each season of the careers of kickers with ≤3 NFL seasons and >3 seasons. 
Figure 3. Mean FG% in each season for NFL kickers 1960-2018 with career lengths of <4 or >3 seasons


Figure 3 also suggests that FG% increases linearly as kickers age; as if kickers just keep getting more accurate. However, recall the smaller quantities of kickers with lengthier careers seen in Figure 1. The continued increases in accuracy by kickers with lengthier careers may be obscuring the declining accuracy in later seasons of kickers with shorter careers. This is exactly what is shown in Figure 4.
Figure 4. Thick lines are LOESS curves of the average FG% in each season of careers of NFL kickers 1960-2018 with various career lengths; fainter, thinner lines are raw mean FG% in each season


There is a group of ‘super agers’, kickers with careers longer than 16 years, whose annual FG% seems to level out and remain constant around their 12th season—which is about when kickers with careers of 11-16 seasons begin to experience slight declines in accuracy. Likewise, kickers with 11-16 seasons appear to peak around their 7th season—which is about when kickers with careers of 4-10 seasons start to decline.

Let us look at the data another way. Figure 5 contains average FG% through the course of the career normalized such that 0.50 (on the X-axis) represents a season halfway through the course of the kicker career. Figure 5 shows that, aside from kickers with ≤3 NFL seasons, NFL kickers start to experience a downward trend in accuracy about 75% of the way through their career. 
Figure 5. Career length is normalized such that 0.00 = rookie season, 1.00 = final season, and 0.50 = halfway through career. Thick lines are LOESS curves of the average FG% in each season of careers of NFL kickers 1960-2018 with various career lengths; fainter, thinner lines are raw mean FG% in each season
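The normalization used for Figure 5 can be sketched in a few lines of Python. The exact formula is not stated in the post, so the linear mapping below is an assumption that simply matches the caption (rookie season = 0.00, final season = 1.00). The function name is hypothetical.

```python
def career_fraction(season_number, career_length):
    """Map a 1-based season number onto [0, 1]:
    0.0 = rookie season, 1.0 = final season."""
    if career_length < 2:
        return 0.0  # ASSUMPTION: a one-season career is treated as all rookie season
    return (season_number - 1) / (career_length - 1)

# Hypothetical 9-season career
fractions = [career_fraction(s, 9) for s in range(1, 10)]
print(fractions[0], fractions[4], fractions[8])  # 0.0 0.5 1.0
```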


Summarily, the (slightly manipulated) raw data indicate that NFL kickers experience declines in accuracy late in their career (Figure 5). However, using the percent of the way through the career (as in Figure 5) does not conduce toward a prospective aging curve for NFL kickers. That is, a predictive model could not know beforehand how long a kicker’s career will be. In other words, future analyses will need to model an NFL kicker aging curve based on seasons in the League (or perhaps age). Future analyses should also account for era (FG kicking has improved dramatically over the years) and FG accuracy by distance. PAT% might also be informative (more so since 2015). Likewise, some measure of consistency (e.g., coefficient of variation) may provide an alternative to accuracy as a measure of kicker performance.
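As a quick illustration of the consistency idea, the coefficient of variation is the standard deviation divided by the mean. The Python sketch below uses made-up season-by-season FG% values for two hypothetical kickers with the same average accuracy. None of the numbers come from the PFR data.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)

# HYPOTHETICAL season-by-season FG% for two kickers with the same mean (81%)
steady = [80.0, 82.0, 81.0, 79.0, 83.0]
streaky = [70.0, 92.0, 75.0, 90.0, 78.0]

print(coefficient_of_variation(steady) < coefficient_of_variation(streaky))  # True
```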

Monday, August 26, 2019

Punt Returner Personalities: The Enterprising Risk-Taker, the Dependable Risk-Averter, and the Consummate-Moderate

Let us examine how the frequency with which punt returners produce negative yardage can be viewed as a sort of personality trait. Moreover, we’ll examine how such a trait can provide insight into on-field performance. The data set (initially) includes 19,363 punt returns from 2002-18 NFL seasons, both regular season and playoffs. We’ll examine the career data of punt returners who spent at least one season as the primary returner for a team. Without snap count data for the whole data set, I defined primary returner as anyone who returned the most punts (plus fair catches) for a team in at least one season; within-season ties for a team were permitted (i.e., could have more than one primary returner from one team in a season). That totaled 227 punt returners, with a range of 8 to 331 career punt returns (only, not fair catches). I then excluded returners with 30 or fewer career returns to have a decent sample size of returns for each returner. This leaves 170 primary returners who, together, returned 16,234 punts and have a median of 79 career returns (25th percentile = 47; 75th = 119). 

The first step was classifying returners based on tendency for negative yards. I started with the proportion of career returns for negative yards based on the findings of the previous post. My criterion for negative return yardage is ≤ -2, excluding returns with muffed catches. I set the threshold at -2 because I felt that returns of -1 yard could occur inadvertently, whereas ≤-2 yards are more likely the result of volitionally moving into the negative. Then I binned returners into three groups using cutoffs at the 33rd and 66th percentiles, or 2.1% and 4.43% of career returns being negative, respectively. I thought this segmentation would provide three groups of risk-preference: risk-averse, moderate, and risk-takers.
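The classification rule described above can be sketched in Python as follows. The thresholds come from the text; everything else is an assumption made for illustration: how ties at a cutoff are handled, the use of all career returns as the denominator, and the function names, which are hypothetical.

```python
NEGATIVE_THRESHOLD = -2   # returns of -2 yards or worse count as negative
LOW_CUTOFF = 0.021        # 33rd percentile of career negative-return share
HIGH_CUTOFF = 0.0443      # 66th percentile of career negative-return share

def negative_share(returns):
    """`returns` is a list of (yards, muffed) tuples for one returner's career.
    Muffed catches are excluded from the negative count."""
    negatives = sum(1 for yards, muffed in returns
                    if yards <= NEGATIVE_THRESHOLD and not muffed)
    return negatives / len(returns)

def risk_group(share):
    # ASSUMPTION: a share exactly at a cutoff falls into the lower group
    if share <= LOW_CUTOFF:
        return "risk-averse"
    if share <= HIGH_CUTOFF:
        return "moderate"
    return "risk-taker"

# HYPOTHETICAL career: 50 returns, 3 non-muffed returns of -2 or worse, 1 muff at -3
career = [(8, False)] * 46 + [(-2, False), (-4, False), (-3, False), (-3, True)]
share = negative_share(career)
print(round(share, 3), risk_group(share))  # 0.06 risk-taker
```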

To examine how this conceptualization of risk-preference might relate to on-field performance, I computed the following variables for each returner.

  • % of career returns >6 yards 
  • % of career returns with a TD
  • % of career returns + fair catches that were fair catches
  • % of career returns where the returner muffed the catch
  • % of career returns where the returner fumbled the ball
  • % of career returns where there was an illegal blocking penalty called against a member of the return team


Figure 1. Median career punt returns by risk-preference group

Figure 1 shows that moderate returners have the highest median number of career returns, followed by risk-takers and then risk-averters. One possible explanation is that guys with fewer opportunities to return punts may be more averse to risk, perhaps in hopes of securing roster spots. There is some potential evidence for this assertion in the data. Returners who were ever a primary returner were less likely to call for a fair catch (32.7%; 7900 of 24129 returns and fair catches) than those who were never a primary (35.3%; 1710 of 4844), χ² = 11.9, p < 0.001. That is, guys who are less experienced returning punts may be more cautious.
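The fair catch comparison above is a 2x2 test, and it can be reproduced from the counts in the text with a few lines of Python. Whether a continuity correction was applied is not stated; the sketch below assumes none, which gives a statistic that rounds to the reported 11.9.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]],
    with the p-value for 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: ever a primary returner vs. never; columns: fair catches vs. returns
primary_fc, primary_total = 7900, 24129
other_fc, other_total = 1710, 4844
chi2, p = chi_square_2x2(primary_fc, primary_total - primary_fc,
                         other_fc, other_total - other_fc)
print(round(chi2, 1), p < 0.001)  # 11.9 True
```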


Figure 2. For visualization, I split returners into groups above and at or below the median of 53.5% of career returns being >6 yards (2 groups). I split returners into groups above or at and below the median of 1.18% of career returns with a TD (2 groups).

Figure 2 indicates how likely a returner in each risk-preference group is to return for more yards than would be expected by chance alone and return for a TD. Indeed, compared to the risk-averse, moderates (p = 0.02) and risk-takers (p < 0.001) returned a higher proportion of their career punt returns for TDs. Likewise, compared to risk-takers, the risk-averse (p < 0.001) and moderates (p = 0.04) returned a higher proportion of their career returns for >6 yards. If we exclude negative returns and returns for TDs and look at the % of returns >6 yards, the difference between risk-averters (57.9%) and risk-takers (54.7%) is significant (p = 0.01); but moderates (54.8%) are no different than risk takers (p = 0.43).

Figure 3 shows probabilities and standard errors of other variables by risk-preference. Moderates (p < 0.001) and the risk-averse (p < 0.001) had a higher proportion of fair catches than the risk-takers. This suggests that risk-takers were less likely to call for a fair catch, but this is largely my own conjecture as we cannot account for whether returners had more punts out of bounds, downed, declared dead, or touchbacks. Also, we cannot account for how often returners returned a punt when they should have called for a fair catch. 


Figure 3. Proportion of career punt returns that were fair catches, muffed, fumbled, or had a holding-type penalty, by group.

Compared to the risk-takers, the risk-averse had significantly fewer returns with penalties (p = 0.005), and there was a similar trend for the moderates (p = 0.12). This finding is potentially due to some quality of risk-takers because the results are essentially unchanged if we control for number of career returns, career average return yards, and career touchdown return %. Likewise, using all of the data, penalties are called less often on negative returns (9.9%; 71 of 720) than positive returns (12.3%; 2292 of 18643), χ² = 3.83, p = 0.05 (penalties enforced and declined are included).

There were no significant group differences in the proportion of fumbles (ps > 0.24) and muffs (ps > 0.32). If we control for the number of career returns, average yards, and TD%, the risk-averse tend to have fewer fumbles than the moderates (p = 0.13) but otherwise, the proportions of fumbles and muffs are unchanged. 

There is a shortcoming of my thesis to consider. I am assuming that returners who are more often tackled for a loss of ≤-2 yards (i.e., negative returns) on returns are also more likely to run into the negative area overall. Based on the available data we cannot determine if this is the case. It may be that the risk-averse and moderates run into the negative just as often, but the risk-takers just are more likely to be tackled after running into negative return yardage space. A caveat to this is that risk-taking returners tended to be less likely to call for fair catches. However, only if we have data indicating that the risk-takers are more likely to forgo fair catches when the coverage unit is closing in on them can it be demonstrated that they are more likely to take risks.

Importantly, these findings show that there appears to be a balance to productive punt returning: risk-takers may produce more TDs, but they also produce return yardage less consistently, whereas risk-averters may produce return yardage more dependably but also produce fewer TDs. Ultimately, punt returners who take risks in moderation are probably the most productive in that they consistently produce decent return yardage while still producing TDs at a relatively high rate.


Methods 
We used generalized linear models (GLMs), specifying Poisson distributions, to compare on-field outcomes between the risk-preference groups. There were six GLMs. The dependent variable was the quantity of career returns with a given outcome, for each returner. The independent variable was risk-preference. The DV was offset by the total career punt returns (or punt returns + fair catches for the model of fair catches); this yields a proportional value. The variables are described below.

  • The proportion of career returns >6 yards. I used >6 yards because 7 is the median of 90% of the punt returns in the data (range of -1 to 32) and it is a decent guess at the return yards we would expect to occur randomly. Then I split returners into groups above and at or below the median of 53.5% of career returns being >6 yards (2 groups). In other words, returners with a lower proportion of returns >6 yards are more often returning punts below what we would expect based on chance alone.
  • The proportion of career returns with a TD. I split returners into groups above or at and below the median of 1.18% of career returns with a TD (2 groups). My thought was that risk takers should return TDs at a comparable rate as the other groups, despite having more negative returns.
  • The proportion of career fair catches, which is the number of fair catches divided by the sum of fair catches and returns. Ideally, the number of fair catches would be divided by the number of punts on which the returner was on the field to return the punt. Nevertheless, the thought here is, risk-takers should be less likely to call for a fair catch overall. 
  • The proportion of career punt returns where the returner muffed the catch. I included this as a measure of conscientiousness. That is, can the returner do the most critical and fundamental part of successful punt returning: catch the ball?
  • The proportion of career punt returns where the returner fumbled the ball on the return. I included this as another measure of conscientiousness, perhaps, although fumbles tend to be random events. 
  • The proportion of career punt returns where there was a block in the back or illegal block called against a member of the return team. 
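
The offset idea in these Methods can be illustrated without a modeling library. For a Poisson model with a log(exposure) offset and a single categorical predictor, the fitted rate for each group reduces to total events divided by total exposure, and group comparisons become rate ratios. The Python sketch below uses made-up counts and a hypothetical function name. It is not the author's data or code, and it does not produce the standard errors or p-values a full GLM would.

```python
from collections import defaultdict

def group_rates(rows):
    """rows: (group, events, exposure) per returner.
    Returns the Poisson maximum-likelihood rate per group, i.e. what a
    Poisson GLM with offset log(exposure) and only a group factor would fit."""
    events = defaultdict(int)
    exposure = defaultdict(int)
    for group, n_events, n_exposure in rows:
        events[group] += n_events
        exposure[group] += n_exposure
    return {g: events[g] / exposure[g] for g in events}

# HYPOTHETICAL returners: (risk group, career TD returns, career returns)
rows = [
    ("risk-averse", 0, 60), ("risk-averse", 1, 140),
    ("risk-taker", 2, 80), ("risk-taker", 2, 120),
]
rates = group_rates(rows)
rate_ratio = rates["risk-taker"] / rates["risk-averse"]
print(rates, rate_ratio)
```

The offset is what lets returners with very different career lengths be compared on the same footing: a returner with 140 returns contributes more exposure than one with 60, rather than each counting equally.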




Saturday, August 24, 2019

Returning Punts and Losing Field Position

Fans of collegiate and professional football teams have seen it. The opposing team punts. Arrival of the coverage unit is imminent as your return man situates to catch the ball. He shows nary a handwave, telling everyone there will be no fair catch on this punt. No, yours is an enterprising returner. Upon catching the punt, he will begin to explore the prospect of negative return yardage whilst attempting to evade the coverage unit. Perhaps, he will pick up some punctual blocks from his teammates or move quickly enough to elude would-be tacklers before reaching open grass and improving field position for your offense. Sometimes this risk produces minimal gains and on other occasions, the returns are huge. Yet, to the displeasure of fans and the hypertension of coaches, sometimes many yards are lost, and offenses start drives closer to their own endzone. 

There are other ways that field position is lost. I’m less interested in these, but we can examine them too. Punt returners can muff the catch or fumble the ball during the return. Although neither muffs nor fumbles guarantee lost field position, both create a risk for lost field position. Moreover, both risk turnovers, let alone the detriment to field position. Then there are penalties, specifically the holding, block in the back, and clipping varieties, which can negate returns and start the offense closer to its own endzone.

Who is to blame for lost return yardage? The ability of the return team to pressure the punter and the extent to which the coverage unit protects the punter. The skill of the punter to both focus and execute as well as the distance (and hangtime) of the punt matter, too. It is the punt returner who chooses to run toward his own endzone. It’s also on him if he muffs the catch, and he needs to protect the pigskin to prevent fumbles. Penalties just suck, I'm sorry. Nonetheless, regardless of how it occurs, lost field position is created by an interaction between individual players and their emergent units. One simple way we can look at who is responsible for lost field position on punt returns is intraclass correlations (ICCs; though my methods differ).

Our data are (primarily) 19,363 punts that were returned during 2002-18 NFL seasons (regular and some playoffs; holding-type penalties included). We include in the model return teams and coverage units both by season and across seasons to account for seasonal personnel changes and season-to-season consistency, respectively. Season itself was included to account for League-wide fluctuations in gameplay. In each model, we shall also account for the line of scrimmage and the punt yards. 

Table 1. ICCs of Team Units for ways Field Position is Lost on Punt Returns in NFL, 2002-18

                                    On Punt Returns                            All Punts
Unit                                Negative Yards  Muffs  Fumbles  Penalties  Penalties
Returner                            0.059           0.042  0.028    0          NA
Return Team by Season               0.001           0.014  0        0          0
Punt Team by Season                 0.021           0.025  0        0.001      0.004
Punter                              0.012           0.008  0        0          0
Return Team in all Seasons          0               0.001  0.006    0.001      0.001
Punt Team in all Seasons            0               0.005  0.008    0          0
Season                              0.005           0.019  0        0.006      0.007
Unit R² (sum of ICCs)               0.098           0.114  0.042    0.009      0.012
Line of Scrimmage & Punt Yards R²   0.043           0.092  0.004    0.019      0.073
Total R²                            0.141           0.206  0.046    0.028      0.085

Across all seasons, 721 non-muffed punts were returned for negative yardage, or 3.72% of returns, with an average of -3.53 yards (SD = 2.25). Table 1 contains ICCs for each unit. The ICC values mean that 5.94% of negative yardage is due to some qualities of punt returners, 2.11% is due to some qualities of the punting teams, and 1.23% is due to the punter. The ICCs can also be summed to obtain an approximate R². The effect of punting team is not statistically significant (p = 0.32) but the effect of punter tends to be (p = 0.07), and the effect of returner is (p < 0.001) (compared to models with each excluded). Together, the remaining factors account for 0.54%. That only 9.82% of the responsibility for negative returns is meaningfully explained speaks to the stochastic nature of punt returns and special teams in general.
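
The summing of ICCs described above is simple enough to check by hand, but here it is as a short Python sketch using the rounded Negative Yards column of Table 1. The variable names are only for illustration.

```python
# ICCs from the "Negative Yards" column of Table 1 (rounded to three decimals)
negative_yard_iccs = {
    "Returner": 0.059,
    "Return Team by Season": 0.001,
    "Punt Team by Season": 0.021,
    "Punter": 0.012,
    "Return Team in all Seasons": 0.0,
    "Punt Team in all Seasons": 0.0,
    "Season": 0.005,
}
fixed_effects_r2 = 0.043  # line of scrimmage and punt yards

unit_r2 = sum(negative_yard_iccs.values())
total_r2 = unit_r2 + fixed_effects_r2
print(round(unit_r2, 3), round(total_r2, 3))  # 0.098 0.141
```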

Unsurprisingly, returners bear the most responsibility for muffs. However, the punting team and the return team appear to contribute to this meaningfully as well. Returners appear to be mostly responsible for fumbles. Penalties appear to be mostly random based on the ICCs all being < 1%.

Summarily, the present report showed that punt returners carry the most responsibility for negative return yardage, but qualities of the punting team and punter are likely involved. Conceivably then, some punt returners should be more likely than others to have returns for negative yardage. In other words, a subset of returners may attempt to evade tacklers despite the risk of compromising field position for their offensive units. How such a tendency relates to punt return outcomes (e.g., yards gained or touchdowns) is a matter for future study. 



Methods
For analysis we’ll use generalized linear mixed models and specify binomial distribution. Essentially, we are estimating the likelihood that there is a return of negative yards, a muff, a fumble, or a penalty on a given punt and how much of that can be attributed to returners, punter, return teams, coverage units, and the season. Return teams and coverage units were examined by season and overall to account for seasonal personnel changes and season-to-season consistency, respectively. Season was included to account for League-wide trends in gameplay. We also include the punt spot and punt yards. For each GLMM, we'll use the icc() function of sjstats package in R to compute ICCs.
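
For a logistic mixed model, one common convention is to compute each ICC on the latent scale, treating the residual variance as π²/3 (the variance of the standard logistic distribution). The Python sketch below shows that calculation with made-up variance components. Whether the version of icc() used here follows exactly this convention is assumed rather than confirmed, and the function and key names are hypothetical.

```python
import math

# Variance of the standard logistic distribution, used as the residual variance
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3

def latent_scale_iccs(random_effect_variances):
    """Share of latent-scale variance attributed to each random effect."""
    total = sum(random_effect_variances.values()) + LOGISTIC_RESIDUAL_VARIANCE
    return {name: var / total for name, var in random_effect_variances.items()}

# HYPOTHETICAL variance components (not taken from the fitted models)
variances = {"returner": 0.22, "punt_team_by_season": 0.08, "punter": 0.05}
iccs = latent_scale_iccs(variances)
for name, icc in iccs.items():
    print(f"{name}: {icc:.3f}")
```

Because every ICC shares the same denominator, the values are directly comparable across units and can be added together.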

Bulleted below are definitions for each of the ways field position is lost by punt returners and units. 

  • I define negative returns as returns of ≤ -2 yards on an attempted return without a muff. Muffing should be considered separately from a decision to run into negative yardage. I set the threshold at -2 because I felt that returns of -1 yard could occur inadvertently, whereas ≤-2 yards are more likely the result of volitionally moving into the negative.
  • Muffs occur when the returner botches the catch. Muffs do not necessarily result in lost yardage, but they risk lost yardage and turnovers.
  • Fumbles occur when the returner loses possession of the ball during the return. Same caveats as muffs.
  • Penalties are holding, block in the back, and clipping penalties committed by the return team. We’ll look at penalties with and without considering returners, that is, on punt returns only (i.e., 19363 punts) and then on all punts (i.e., 41912 punts). This is because the play-by-play data only tell me when a returner was on the field for punt returns and fair catches and so we exclude returner from the model with all punts. 




Saturday, August 17, 2019

How Meteorological Conditions Affect Punting and Punt Outcomes

How does weather affect punting and punt outcomes? We know from prior studies that decreasing temperature is associated with reduced accuracy for field goals from the 25-yard line and farther. Likewise, longer field goals tend to be more accurate in the high altitude of Denver. Regarding punts, there is evidence suggesting wind reduces punt yards.

In short, we’re using 37,253 or so NFL punts from 2002-16. A weather data set culled from NFL Savant covers only 28,000 or so of those punts, through 2013, or about 75% of the data set. 
Figure 1. Average Punt Yards by Altitude

We can first see in Figure 1 that altitude has limited effects on punt yards (PY) with the exception of the highest altitudes. The second highest altitude group includes Atlanta and Arizona, which average nearly 1 yard more on punts (p = 0.001; Atlanta is a dome) and Denver averages nearly 3 yards more per punt (p < 0.001). This is consistent with findings on field goals.

I used a generalized additive regression with smoothing splines to examine weather effects on punting. The punt spot (PS), wind (in MPH), temperature (Fahrenheit), and precipitation (%, 0-1) as well as all interactions between the meteorological variables were all fit with splines. I included the categorical variable for altitude instead of a smooth line for altitude because Denver distorts the altitude spline. I suppose I could have transformed the variable, but the laziness vice is king for the day. I also included a variable indicating if the punt was in a dome or open stadium. 

Figure 2. Modeling punt yards as a function of temperature, precipitation, and wind

As shown in Figure 2, weather appears to influence punt distance. Lower temperatures result in shorter punts. Wind appears to be most influential when precipitation is greatest. Maximum precipitation appears to reduce punts by about 3 yards, on average, compared to no precipitation. The influence of temperature is diminished when wind and precipitation increase. That punt distances are reduced in increasingly inclement meteorological conditions is consistent with the existing literature on field goals and punts in the NFL. The effect of the Denver altitude is consistent in this model, but the effect of Atlanta and Arizona is diminished likely because the model accounts for dome conditions. The upper rightmost panel is weird, though, perhaps because having only a few cases with higher wind speed influences this finding?

Figure 3. Punt return yards by altitude

There appear to be negligible effects of altitude on average punt return (PR) yards on punts that were actually returned (R² < 0.001, that is R-squared not p!); see Figure 3. Not shown is an ecologically meaningless but statistically significant effect of temperature increasing PR yards on returned punts by about 0.13 yards for every 30° increase in temperature. Ah!, the frivolity that emerges from large data sets.

Figure 4. Punt outcomes by altitude. dd = defense downed/declared dead. fc = fair catch. oob = out of bounds. pr = punt return. tb = touchback.

It appears that there are more touchbacks in Denver, χ² = 92.15, df = 28, p < 0.001. Not much else to say here.

Figure 5. Secondary punt events by altitude. blk = blocked/tipped punt. fum = fumble. muff = returner muffed catch. pen = holding, blocking in back, or clipping penalty on return team. td = touchdown.

There appear to be more penalties in Atlanta and Arizona, but I am unsure why this is. Arizona had six seasons with 5 or fewer wins from 2002-13. ATL had three such seasons. All-around poor team play could have manifested as more block-in-the-back-type penalties on punt returns.


Figure 6. Punt outcomes as a function of temperature.

I used binary logistic regressions to assess the probability of several punt outcomes associated with several meteorological variables. The meaningful differences (to me) for outcomes due to temperature are between 25° and 75°. Specifically, there is a 5% greater probability of punts being declared dead or downed by the defense (DD) as it gets colder and 5% greater probability of punts being returned when it is warmer. 


Figure 7. Punt outcomes as a function of wind

For wind, I’m looking at the probability difference between no wind and 20mph. The probabilities of fair catches (FCs) decrease and DDs increase as it gets windier. This suggests to me that returners are less likely to even attempt to field the punt when it’s windier. OOBs also increase when it is windier. 

Figure 8. Punt outcomes as a function of precipitation

For precipitation, I’m looking at the change from none to maximum: there is a 5% lower probability of a FC, a 5% greater probability of TBs, and a 5% greater probability of DD when it’s wetter. Together, these amount to there being fewer punt returns in wetter weather.

In short, the probabilities shown in Figures 6-8 demonstrate to me that punt returners are less inclined to even attempt catching a punt in colder and wetter conditions, and rightfully so. I’m unwilling, however, to conclude exactly the same for windier weather because [a] there are interactions between the meteorological variables not accounted for in these analyses; [b] the analysis accounts for the direction of neither the wind nor the punt; and [c] steady winds and, more so, powerful wind gusts could dramatically alter the trajectory of a punt and leave a return man far out of position. However, as shown above, windier, colder, and wetter conditions reduce punt distance, meaning that the coverage unit is approaching the returner much more quickly.

Then I identified 7, 6, and 9 classes, respectively, for temperature, wind, and precipitation using an expectation-maximization procedure. I used these classes to examine the probabilities between meteorological variables and several secondary events: blocks, muffed catches, fumbles, penalties, turnovers, and TDs. There was no difference in the distribution of PR TDs, fumbles, or turnovers between the classes of any meteorological variable (not shown).
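To give a feel for how an expectation-maximization (EM) procedure carves a continuous variable into classes, here is a toy sketch that fits a two-class Gaussian mixture to a handful of temperatures. Everything here is an assumption for illustration: the post does not say which model the classes came from, the analysis above used 7, 6, and 9 classes rather than two, and the temperatures are invented.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_two_class_mixture(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture by EM; return (means, sds, weights)."""
    means = [min(values), max(values)]               # crude starting guesses
    spread = (max(values) - min(values)) / 4 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: how responsible is each class for each value?
        resp = []
        for x in values:
            likes = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate each class from its responsibility-weighted values
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = sum(rk)
            means[k] = sum(r * x for r, x in zip(rk, values)) / nk
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(rk, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)       # guard against a collapsing class
            weights[k] = nk / len(values)
    return means, sds, weights

temps = [28, 31, 33, 35, 30, 72, 75, 78, 70, 74]     # hypothetical game temperatures
means, sds, weights = fit_two_class_mixture(temps)
print([round(m, 1) for m in means], [round(w, 2) for w in weights])
```

Once each game is assigned to its most likely class, the classes can be cross-tabulated against events such as muffs or blocks.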


Figure 9. Muffs as a function of temperature and wind

For muffs, see Figure 9. It appears there is no difference in the distribution across precipitation (χ² = 12.1, p = 0.15; not shown) but the distribution does differ across wind (χ² = 15.9, p = 0.007) and temperature (χ² = 30.92, p < 0.001). Specifically, muffs increased in windier and colder conditions.


Figure 10. Blocked/tipped punts as a function of precipitation and wind

Shown in Figure 10 are blocked punts, which I’m wary of even broaching since it is such a rare event. There is no difference for temperature but there is a difference in the distribution across wind and precipitation. Blocks appear to be slightly less random when it is windier and wetter, but this could be due to adverse conditions affecting punt trajectory or increased pressure due to the expectations that punting is complicated by such weather conditions. However, we must be mindful that there are fewer samples at the meteorological extremes and the results very well could be spurious.


Figure 11. Block in the back, holding, or clipping penalties on the return team as a function of temperature

Distributions of block in the back, holding, or clipping penalties on the return team are no different for wind and precipitation. However, the distributions do differ across temperature such that penalties become more likely in warmer temperatures (χ² = 23.4, p < 0.001). Penalties likely increase as temperature increases not because of some pressure exerted by warmer conditions per se but, rather, because punt returns are more likely as temperature increases. The odds of a penalty occurring on a punt that is returned are 4.6 times greater than on a punt with no return (z = 26.7, p < 0.001), whereas the odds of a penalty increase by about 0.004 for each 1° increase in temperature (z = 3.03, p = 0.002), or by about 0.12 for an increase of 30°.
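The arithmetic behind the 30° figure can be checked quickly. This sketch assumes that "odds increase by about 0.004" means an odds ratio of roughly 1.004 per degree, which is an interpretation of the text rather than a reported coefficient.

```python
# Assumption: "about 0.004 per degree" is read as an odds ratio of about 1.004.
ODDS_RATIO_PER_DEGREE = 1.004
DEGREES = 30
ODDS_RATIO_RETURNED = 4.6    # returned vs. not returned, from the text

# Odds ratios multiply across units on the logistic scale
compounded_ratio = ODDS_RATIO_PER_DEGREE ** DEGREES
# The simple shortcut: 0.004 x 30
linear_shortcut = 1 + (ODDS_RATIO_PER_DEGREE - 1) * DEGREES

print(f"Compounded over 30 degrees: {compounded_ratio:.3f}")
print(f"Linear shortcut:            {linear_shortcut:.3f}")
print(f"Returned-punt odds ratio is {ODDS_RATIO_RETURNED / compounded_ratio:.1f}x larger")
```

Either way, a 30° swing moves the odds by roughly 12 to 13 percent, which is tiny next to the 4.6-fold difference between returned and unreturned punts.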

Summarily, very high altitudes increase punt yards. Colder, wetter, and windier weather reduces punt yards. There is a negligible influence of meteorological variables on punt return yards of returned punts. Punt returners, I presume, are less likely to attempt to catch a punt during inclement weather. Fumbles, turnovers, and TDs appear to be stochastic and independent of the influence of meteorological conditions. Muffs, however, do appear to increase when it is colder and windier but not in greater precipitation. It seems blocked punts are slightly less random as precipitation and wind increase, but these are the rarest of rare events. Penalties are slightly more likely to occur as temperature increases, but this is likely due to there being more punt returns in warmer weather. So, that covers meteorology and punting with a healthy dose of chart gluttony.

Saturday, April 6, 2019

Summarizing factors that influence scoring in NBA Slam Dunk Contests


In late May 2017, a colleague and I finished writing a manuscript about a study of factors influencing scores in NBA Slam Dunk Contests (SDC). We submitted the manuscript for peer review, which means that several sports analytics experts read our manuscript, pointed out its weaknesses, and provided insightful critiques. Peer review enabled us to produce a better manuscript. A revised version of the manuscript was recently accepted for publication in the Journal of Sports Analytics. I will summarize the findings of the study in this post.

Let us first review what the SDC is for unfamiliar readers. I’m assuming you know what a slam dunk is. Well, the SDC is a competition of who can do the ‘best’ dunks. The definition of ‘best’ is subject to interpretation. It may be more apt to say that the SDC is a competition of whose dunks get the highest scores. Scores are awarded by a panel of 5 judges who almost exclusively give scores on a scale of 0 to 10. All the judges’ scores are added together to get a total score, usually 0 to 50, with higher scores meaning it is a ‘better’ dunk. One contestant will do a dunk and then that dunk is scored. Then the next contestant dunks and it is scored, and so on. The highest scoring contestants in a round move on to the next round. The highest scoring contestant in the final round is declared the winner.

We focused our study on three broad factors that should influence SDC scoring:
  • Dunk elements are things like where the contestant jumped from or what they did with the ball and their body while in the air.
  • Contest formatting and rules. These are things like the order that contestants dunk in, replacing missed dunks, how many rounds there are, and experience of the judging panel.
  • Superlatives. Superlative factors include how popular a contestant is, having ‘home field advantage’, how unique a dunk is, and contestant height. Another is what contestants do to create excitement, usually before dunks. For example, Blake Griffin staging a singing choir before his dunk (in 2011) or JaVale McGee having a second basket set up (also in 2011).
Unlike other judged competitions such as gymnastics, there is neither an ‘official’ list of the different dunk elements that can be done nor gradings of difficulty for those elements. Granted, there are common names for dunks like a free throw line dunk or a tomahawk. More complex dunks might have names like 360° windmill or 180° double-pump reverse. However, these names cannot easily be analyzed. This posed a problem.

So, we segmented the dunk into its elements. We split the dunk elements into [a] things that can only be done while possessing (controlling) the ball and [b] things that can be done with or without the ball. We called these primaries and modifiers, respectively. Primaries require possessing the ball and include well-known maneuvers like windmills, double-pumps, between-the-legs, and others. Two primaries can’t be done at the same time, but one can be done after another while still airborne. A neat loophole is that you could do two primaries at the same time if you’re using two balls. Modifier elements that can be done with or without the ball include things like spinning in the air, jumping from the free throw line, catching a pass, jumping over something, covering the eyes, and more. You can do multiple modifiers at the same time. Like, you can cover your eyes while doing a 360° or you can catch a pass while jumping over something. You can also do (multiple) modifiers while doing a primary, like spinning 360° and covering your eyes while doing a windmill.

Another way to think of the elements is to imagine the most basic dunk. You jump straight up and dunk the ball. Nothing else. This system of elements would call that a dunk with no primary and no modifiers. If, instead, you jumped straight up and did a 360°, the system would call that a dunk with no primary, with a 360° modifier.
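One way to picture the primary/modifier system is as a small data structure. The sketch below is only an illustration: the class name, field names, and element labels are made up here and are not the coding scheme from the manuscript.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dunk:
    # Primaries need the ball and happen one after another, so order matters.
    primaries: tuple = ()
    # Modifiers can be stacked at the same time, so a set is enough.
    modifiers: frozenset = frozenset()

    def describe(self) -> str:
        primary_text = " then ".join(self.primaries) if self.primaries else "no primary"
        modifier_text = ", ".join(sorted(self.modifiers)) if self.modifiers else "no modifiers"
        return f"{primary_text}; {modifier_text}"

basic = Dunk()                                             # jump straight up and dunk
spin_only = Dunk(modifiers=frozenset({"360"}))             # a 360 with no primary
combo = Dunk(primaries=("windmill",),
             modifiers=frozenset({"360", "eyes covered"}))

for dunk in (basic, spin_only, combo):
    print(dunk.describe())
```

Coding a dunk this way sidesteps the naming problem: a "360° windmill" is just a windmill primary plus a 360 modifier.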

With that said, we reviewed all the dunks in NBA SDCs from 1984-2016. Separately, we determined what elements each dunk contained. Twice. Our determinations were acceptably consistent. When they were inconsistent, we reviewed those dunks and came to an agreement on what elements were done. Of 682 dunks, 215 had no sourceable footage, no scores, or were missed dunks that judges scored. These couldn’t be analyzed. That left 467 that could be analyzed.
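The post does not say how consistency between the two sets of determinations was measured. Cohen's kappa is one common choice for this kind of check, so here is a minimal sketch of it with invented labels for six dunks; treat both the statistic and the data as assumptions.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two label lists, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical primary-element codes from two coders for the same six dunks
coder_1 = ["windmill", "windmill", "basic", "double-pump", "basic", "windmill"]
coder_2 = ["windmill", "windmill", "basic", "double-pump", "windmill", "windmill"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; the dunks where coders disagree are the ones to re-watch and settle by discussion.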


Figure 1a. 
Figure 1b.

First, we did some fancy math (i.e., logistic regression) to determine which elements are the hardest to do. (Actually, we used all 682 dunks for this part.) We used the likelihood that there would be an execution error when performing each element. Execution errors are things like misses, botched attempts, and replacement dunks. As can be seen in Figure 1, the dunk elements we would expect to be more difficult were also more likely to have errors (1a) and be classified as harder (1b).

Figure 2.

Using some even fancier math (i.e., nonparametric regression) we examined how dunk elements factored into scores. Figure 2 shows how dunk elements affect scores. The primaries are shown on the vertical axis and the modifiers are shown on the horizontal axis, at the bottom. Next to each primary is a number. This is the (expected) average score for that primary when there are no modifiers. On the line next to each primary, each bar up or down represents how much the average score is changed by a modifier. Bars above the line mean the score goes up and bars below the line mean the score goes down. A white line in a bar means +/- 2 points. We can see that the Basic primary, which means there was no primary, will get a low score when there are no modifiers. However, we then see that doing modifiers will increase the score for a Basic dunk with no primary. This makes sense. No primary and no modifier is the most inanimate dunk. Likewise, more difficult primaries like the between-the-legs are less affected by modifiers. This shows that both primaries and modifiers factor into SDC scores.

The dunk elements explain about 44% of why different scores are awarded for different dunks in NBA SDCs. But what about the other 56% of why dunks are scored the way they are? To figure that out, we again used some fancy math (i.e., linear mixed-modeling) to look at how contest formatting and superlatives affected scores. (Rather than use the actual dunk scores, though, we used the parts of scores that were not explained by dunk elements.)
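The "parts of scores that were not explained" are what statisticians call residuals. A minimal sketch of the idea follows, using invented scores and invented element-based predictions; it does not reproduce the 44% figure.

```python
def r_squared(actual, predicted):
    """Share of the variation in actual scores accounted for by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    ss_resid = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_resid / ss_total

# Hypothetical numbers, NOT from the study
scores = [50, 44, 39, 47, 42, 48]         # judges' total scores
element_fit = [47, 45, 41, 46, 43, 46]    # what dunk elements alone would predict

# Residuals: the part of each score the elements did not explain.
# These, not the raw scores, become the outcome for the second model.
residuals = [y - p for y, p in zip(scores, element_fit)]

print("R-squared from elements:", round(r_squared(scores, element_fit), 3))
print("Residuals for stage two:", residuals)
```

A positive residual means the dunk scored higher than its elements alone would suggest, which is where factors like round, order, or popularity could come in.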


Figure 3.
In Figure 3 we can see how contest formatting and superlative factors would affect an average dunk with the score of 45. That dunk in the initial round would be a 45, but in the middle and final rounds, it is expected to be closer to a 47. (We later show that this is due to lower scores for dunks by contestants eliminated in the initial round, so the 45 is really only for weaker dunkers.) Likewise, 3 botched attempts or 3 replacements are expected to lower the score of the dunk by about 1 point. Histrionics are things contestants do to create excitement, usually before the dunk. Although many uses of histrionics do not affect the execution of a dunk, histrionics increase scores by nearly 3 points! While we can’t tease apart if the most popular contestants are also the most athletic, popular guys like MJ and Vince are expected to get about a 2-point increase in the score, so the 45 becomes 47. If a 6’6” tall contestant and 5’7” contestant do the same dunk, the shorter contestant would be expected to get the 45 and the taller guy would get a 44. Figure 4 shows how dunking later in the order in the initial round greatly increases scores over dunking first or second in the order.
Figure 4.
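To make the adjustments above concrete, here is a small sketch that applies them to the 45-point baseline dunk. The point values are the approximate ones quoted in the text; treating them as simply additive and capping the total at 50 are assumptions made for illustration, since the mixed model itself may not behave that neatly.

```python
# Approximate effects quoted in the text; additivity is an assumption.
ADJUSTMENTS = {
    "later_round": 2.0,                # 45 becomes about 47 in middle/final rounds
    "three_botched_or_replaced": -1.0,
    "histrionics": 3.0,                # "nearly 3 points"
    "popular_contestant": 2.0,
    "tall_contestant": -1.0,           # 6'6" gets 44 where 5'7" gets 45
}

def expected_score(base, factors, cap=50.0):
    """Add the listed adjustments to a base score, capped at the usual maximum."""
    total = base + sum(ADJUSTMENTS[name] for name in factors)
    return min(total, cap)

print(expected_score(45, ["later_round"]))
print(expected_score(45, ["histrionics", "popular_contestant"]))
print(expected_score(45, ["tall_contestant", "three_botched_or_replaced"]))
```

Under these assumptions, the same dunk could land anywhere from 43 to 50 depending on who does it and how it is staged.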
Overall, the contest factors, superlatives, and dunk elements together explain about 72% of why different scores are awarded in NBA SDCs. This is a pretty good amount to explain considering that we are not saving lives or anything.

A main finding of our study was that scores go up when there is excitement surrounding a dunk. Popularity and ‘home court advantage’ yield higher scores. Scores are slightly reduced the more judges have judged and the more times a dunk has been done, which is perhaps a case of ‘show me something new’. Scores also go down when there are botched attempts or replacements; this might be a spoiler effect. Singing choirs and bouncing cheerleaders boost scores. We speculated that this was due to excitation transfer. For example, imagine you rated some potential dates on how sexy they are. Excitation transfer happens when you rate them as sexier after you get off a crazy roller coaster ride than after you walk down a small hill. The roller coaster ride gets you excited, and your excitement is transferred to how you rate your potential dates. Judges get excited by choirs and cheerleaders and their excitement is transferred to scores.

But what does this study mean for dunkers? First—well, try not to go first in the initial round! You should try to do creditable dunks you can make on your first try. So, practice, practice, practice a set of respectable dunks so you can make them. You want these dunks to be effortless for you, like singing Twinkle, Twinkle Little Star. Anyone could sing it at a different tempo or in a loud bus station and never miss a word. You want to be able to do your go-to dunks on wood or blacktop, indoors or out, on a slightly higher rim, or in a crowded half-court space.

Likewise, the scores for the most common primary elements—windmill/cradle, double-pump, and between-the-legs—differ only by 1 or 2 points, between 44 and 47. So, although you can probably do that between-the-legs (see Aaron Gordon in 2017), doing a windmill or double-pump may get you a high enough score to progress to the next round or win. Doing things like catching a pass, spinning in the air, jumping over things, jumping from farther away, covering your eyes, and so on, will likely make up the 1- or 2-point difference between, say, a windmill and a between-the-legs. Lastly, get the crowd involved before you dunk. Do something to create some excitement. Dance. Show us how short you are compared to the height of the rim. Bring that kid with his dad in the third row out to throw you a pass.

Summarily, this is for the dunkers. The NBA SDC competitors of course, but more so for those elite athletes who excel at and train to dunk. They aren't in the NBA. Me? I’m 5’11” and I’m getting old. I can’t dunk anymore. My coauthor, who is the same height but several years younger, contends he can still dunk. Okay, maybe I could still dunk on a warm day when I’m well-rested and there are curvy Honduran women courtside wearing sundresses pretending not to watch. Even when I could dunk with ease though, I could never do what the dunkers do. But dunking saved my life. So, I've got to help move the sport forward in the ways that I can. One of my life goals is to be able to (almost) dunk at 40. I’ve got a few years before I get there. To 40, that is. A more important goal to me is seeing a Slam Dunk Contest in the Olympics. For the world to appreciate it. Unfortunately, that cannot happen with how winners are currently decided in dunk contests. In closing, this research was the first step of a much larger effort to legitimize dunk contests as a competition.