Saturday, December 3, 2016

Analysis of Collegiate Football Rivalries at the Aggregate Level

Rivalry games are an occasion for hate for college football fans and their silent, meticulous superstitions. Many hold a degree granted by the name on the front of the jersey, conferred upon them after four or five years of personal growth and diligent pupilage. It was during these years that many fans learned to hate whereas others, who may never have enrolled, will, throughout the life course, continually refine a hate acquired in early childhood. Not all rivalries are vicarious campaigns of hatred staged in the last week before post-season play. Not all rivalries are staged annually. Sometimes, members of the opposing student body are abducted and ransomed; occasionally, a festivity such as homecoming is born; some existed prior to statehood; yet others are simply old.

Perhaps meta-rivalries emerge when considering the most preeminent rivalries. Indeed, a Google search will reveal countless lists. A recent study concluded that the Auburn-Alabama football rivalry is the greatest in sports, based on a thorough analysis of troves of fanbases’ Twitter activities. The conclusion seems reasonable and surely other empirical analyses exist, but I am too lazy to perform a more thorough lit review.

Let us consider the characteristics of a rivalry based on how the matchups have played out because fans are wonderfully delusional. Data were extracted from the CFB SR Rivalry Finder and include only rivalries in which ≥50 matchups were available with both parties classified as major college teams. Included in the analyses were 116 rivalries from 1891 through 2015, a total of 9499 games.

The characteristics below are essential in my estimation. I intend to revisit this subject to analyze rivalries on a game-by-game basis but for this analysis, we will examine rivalries using summary statistics of the entire history of each rivalry.
  • Frequency: because absence makes the heart grow fonder
    • How often rivals play, we'll call this regularity, as measured by dividing the age of the rivalry (in years) by the quantity of matchups
      • Lesser values indicate greater frequency
    • The length of the longest span (of years) between matchups
      • Lesser values indicate greater frequency
  • Competitiveness: because there are no moral victories
    • How often a winning/losing streak was ended divided by the quantity of matchups, we'll call this streaks ending
      • Higher values indicate higher competitiveness
      • Team A winning one year and team B winning the next would produce two ended-streaks
    • The length of the longest streak
      • Smaller values indicate higher competitiveness
    • How close is the series overall, we'll refer to this as closeness and it is computed by (win% team A to-date)^2 * (win% team B to-date)^2 * 16;
      • Value will be ≤1 and ≥0.
      • Greater values indicate higher competitiveness
      • if each team won 50% of all games (i.e., no ties), this value will equal 1
      • Arithmetically, this value will be inflated for rivalries with fewer or no ties because imagine if the Big-10 declared a tie at the end of regulation in the OSU-UM game last week
  • Margin of Victory: because, for the fanatics, there is always a thirst for additional goblets of enemy blood but for the enthusiasts and executives there is a preference for games decided in the fourth quarter. Indeed, there is likely collinearity between margin of victory and competitiveness but margin of victory provides additional nuance. For instance, the BYU-Utah St. rivalry is 47-35-3 but 54.1% of the matchups have been blowouts, or 54.1% of matchups have been boring for unemotional audiences. Closeness ranks BYU-USU higher than the Colorado-Utah rivalry, which is 29-28-3 although only 28.3% of its games have been blowouts.
    • How often matchups are close-scoring, as measured by the quantity of matchups in which the margin of victory is ≤ 7 points divided by total quantity of matchups
      • Greater value indicates higher competitiveness
    • How often matchups are blowouts, as measured by the quantity of matchups in which the margin of victory is ≥ 17 points divided by total quantity of matchups
      • Lesser values indicate higher competitiveness
    • How often matchups end in ties, as measured by the quantity of ties divided by the quantity of matchups
      • Greater values indicate more tie-games
      • Ties suggest a competitive game but have not occurred since 1995, thereby penalizing or rewarding rivalries with more games prior to 1996. Inherently, rivalries with more ties will be rewarded by several (likely collinear) variables described above. Moreover, winning is the only thing. Thusly, ties will be penalized slightly.
  • Home invasion: because magic moments are made winning in front of the fans but few spoils of battle are more savory than the arrangement of the muscles on the faces of enemies defeated in their own territory
    • How often matchups are won by the road team, as measured by the quantity of matchups in which the road team was victorious divided by the quantity of matchups not played at a neutral site
      • Greater values indicate more road wins
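As a quick check on two of the definitions above, here is a minimal sketch of closeness and regularity. The function names are mine, and I am assuming that win% is wins divided by all matchups (ties included) and that a rivalry's age is the latest year minus the first year; both assumptions reproduce the values in the summary table at the close of the post.

```python
def closeness(wins_a, wins_b, ties):
    """(win% A)^2 * (win% B)^2 * 16, with win% taken over all matchups."""
    games = wins_a + wins_b + ties
    pct_a = wins_a / games
    pct_b = wins_b / games
    return (pct_a ** 2) * (pct_b ** 2) * 16


def regularity(first_year, latest_year, matchups):
    """Age of the rivalry in years divided by the quantity of matchups."""
    return (latest_year - first_year) / matchups


# Florida State-Miami (FL): 29-29-0, 58 matchups, 1955-2015
fsu_miami_closeness = closeness(29, 29, 0)          # an even series with no ties gives 1.0
fsu_miami_regularity = regularity(1955, 2015, 58)   # about 1.034

# Auburn-Georgia: 50-55-6; the ties pull closeness below 1
auburn_georgia_closeness = closeness(50, 55, 6)     # about 0.797
```

Because ties count in the denominator, a series with many ties cannot reach 1 even when the wins are split evenly, which is the inflation for tie-free rivalries noted above.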
I debated including age (in years) and the quantity of matchups but decided against it. The inclusion criterion of ≥50 matchups as major college teams excluded rivalries such as Kent St.-Akron and Harvard-Yale, and including those variables would have unfairly penalized included rivalries such as Cinci-Miami(OH), who had matched up 120 times as of 2015 but had only 54 of those matchups included in the analysis.

Once the foregoing values were obtained for each rivalry, the values of each variable were standardized using the entire sample of rivalries (mean=0, SD=1). A number of standardized scores were inverted to ensure those scores entered a purely additive equation positively; for instance, greater regularity (i.e., a lower ratio of age to matchups) would yield a positive standardized score instead of the negative score yielded directly from the raw values. The percentage of tie games was inverted but the inversion was weighted so as to reduce the direct penalty to ties (i.e., standard score multiplied by -0.5). Finally, all standard scores were summed, yielding a rivalry rating with a mean of 0 across the sample.
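The standardize, invert, and sum procedure can be sketched as follows. This is an illustration and not the code used for the post: the signs in WEIGHTS are my reading of the bullet list above (only the -0.5 for ties is stated outright), the use of the sample SD is an assumption, and the three rivalries are standardized against each other here rather than against the full sample of 116, so the resulting numbers will not match the published ratings.

```python
from statistics import mean, stdev

# Sign applied to each standardized variable (assumed from the bullet list;
# only the -0.5 weight on ties is stated explicitly in the text).
WEIGHTS = {
    "regularity": -1.0, "max_hiatus": -1.0,
    "closeness": 1.0, "max_streak": -1.0, "streaks_end": 1.0,
    "close_games": 1.0, "blowouts": -1.0, "ties": -0.5,
    "road_wins": 1.0,
}


def zscores(values):
    """Standardize a list of raw values to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def rivalry_ratings(rows):
    """Sum the signed standardized scores of every variable for each rivalry."""
    names = list(rows)
    totals = dict.fromkeys(names, 0.0)
    for variable, weight in WEIGHTS.items():
        for name, z in zip(names, zscores([rows[n][variable] for n in names])):
            totals[name] += weight * z
    return totals


# Raw values from the summary table; ties are computed as ties / matchups.
sample = {
    "Florida State-Miami (FL)": dict(
        regularity=1.034, max_hiatus=3, closeness=1.000, max_streak=7,
        streaks_end=0.397, close_games=0.448, blowouts=0.328,
        ties=0 / 58, road_wins=0.586),
    "Michigan-Ohio State": dict(
        regularity=1.121, max_hiatus=14, closeness=0.848, max_streak=7,
        streaks_end=0.525, close_games=0.384, blowouts=0.273,
        ties=4 / 99, road_wins=0.485),
    "Navy-Notre Dame": dict(
        regularity=0.989, max_hiatus=1, closeness=0.212, max_streak=43,
        streaks_end=0.202, close_games=0.281, blowouts=0.562,
        ties=1 / 89, road_wins=0.404),
}
ratings = rivalry_ratings(sample)
```

Since every standardized variable has a mean of 0, the summed ratings also average 0, whatever weights are chosen.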


WVU-Penn St. came out as the lowest rated rivalry and, thankfully, it has been on hiatus since 1992. Notre Dame-Navy was rated only slightly less uninteresting. Despite an historical dearth of competitiveness in both, Auburn-Miss. St. and LSU-Miss. State are the most average rivalries. Florida St.-Miami was the highest rated rivalry. The rivalry was 29-29-0 until earlier in the 2016 season when FSU took control. Having played 58 games in 60 seasons, there have been no ties, no neutral sites, and 44.8% of matchups were decided by ≤7 points. Although 32.8% of games were decided by ≥17 points, visiting teams won 58.6% of matchups. The rivalry approached or surpassed 1 SD above average in all variables except for ending streaks, in which it was average; the rivalry has been streaky. Summarily, at the close of the post there is a massive table with the summary statistics (raw values) for the 116 rivalries included in the analysis. It is not sortable but it is searchable; indeed, we here at POTH bring our readership the foremost developments in information technology.

We should also consider how this rivalry rating is associated with the raw values used in its computation and with some others. Doing so provides some insight into the rivalry ratings. This can be seen in the table below. I also included the proportions of games within each rivalry whose total scores were ≤25th or ≥75th percentiles of total game scores for the 5 years before and after a game. All p-values < .05.


Variable r
History
    Matches 0.319
    Age -0.103
Frequency
    Longest Hiatus -0.378
    Age / Matchups -0.478
Competitiveness
    Closeness 0.634
    Longest Streak -0.662
    Streaks End % 0.706
Margin of Victory
    Margin of Victory ≤ 7pts 0.719
    Margin of Victory ≥ 17pts -0.678
    Ties 0.181
Home Invasion
    Visiting Team Wins 0.369
Inferred Style of Play
    Total Pts ≤ 25th%ile 0.244
    Total Pts ≥ 75th%ile -0.178
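For readers who want to reproduce an r like those above, here is a minimal sketch of the Pearson correlation between the rating and one raw variable. The function is a plain textbook implementation, not the code used for the post, and the five rivalries are a hand-picked subset of the summary table, so the result will differ from the full-sample values reported here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# (rating, share of games decided by <= 7 points) for five rivalries:
# FSU-Miami, Michigan-Ohio State, Miss. State-Auburn, Alabama-Auburn, WVU-Penn State
subset = [(8.977, 0.448), (5.716, 0.384), (0.009, 0.420),
          (-1.480, 0.392), (-13.283, 0.236)]
r_subset = pearson_r([rating for rating, _ in subset],
                     [close for _, close in subset])
```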

I made no a priori suggestions about the valence each variable should contribute statistically to the rivalry rating. Other than the adjustment to ties (and while excluding age and matchup quantity), the rating was computed by a simple summation of standardized scores. So, in the future, when I elect to revisit this topic and perform a game-level analysis, I have some prior notions on how the model might be constructed and tested for validity.






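Column groups: History (Matchups, First-Latest, W-L-T, Neutral Site%); Frequency (Max Hiatus, Regularity); Competitiveness (Closeness, Max Streak, Streaks End%); Margin of Victory (≤ 7 %, ≥ 17 %); Home Invasion (Road Team Win%)
Rank Rivalry Matchups First-Latest W-L-T Neutral Site% Max Hiatus Regularity Closeness Max Streak Streaks End% ≤ 7 % ≥ 17 % Road Team Win% Rating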
1 Florida State-Miami (FL) 58 1955-2015 29-29-0 0.000 3 1.034 1.000 7 0.397 0.448 0.328 0.586 8.977
2 Auburn-Georgia 111 1902-2015 50-55-6 0.351 3 1.018 0.797 9 0.495 0.432 0.297 0.583 7.85
3 Texas-Oklahoma 100 1903-2015 53-42-5 0.950 6 1.120 0.793 8 0.440 0.46 0.32 0.6 7.015
4 Oregon State-Oregon 99 1916-2015 43-50-6 0.061 3 1.000 0.770 8 0.465 0.414 0.293 0.559 6.751
5 Florida-Miami (FL) 55 1938-2013 26-29-0 0.109 14 1.364 0.994 7 0.400 0.418 0.255 0.531 6.428
6 Cincinnati-Miami (OH) 54 1962-2015 26-27-1 0.019 1 0.981 0.927 11 0.444 0.444 0.333 0.453 6.243
7 Clemson-South Carolina 107 1909-2015 63-40-4 0.000 1 0.991 0.775 7 0.486 0.374 0.393 0.561 5.902
8 Michigan-Ohio State 99 1904-2015 47-48-4 0.000 14 1.121 0.848 7 0.525 0.384 0.273 0.485 5.716
9 UCLA-Southern California 85 1929-2015 30-48-7 0.000 6 1.012 0.636 8 0.494 0.459 0.329 0.541 5.425
10 Oregon-Washington 96 1916-2015 40-52-4 0.000 3 1.031 0.815 12 0.438 0.469 0.323 0.479 5.4
11 Kansas-Missouri 110 1901-2011 48-54-8 0.145 2 1.000 0.734 5 0.573 0.427 0.345 0.404 5.325
12 Washington-Washington State 96 1917-2015 62-30-4 0.000 3 1.021 0.652 8 0.510 0.406 0.333 0.49 5.236
13 Southern California-Stanford 91 1922-2015 57-31-3 0.011 4 1.022 0.728 12 0.352 0.396 0.308 0.622 5.22
14 Baylor-Texas Christian 103 1903-2015 46-52-5 0.097 11 1.087 0.813 8 0.524 0.398 0.369 0.538 5.201
15 Texas A&M-Texas Tech 69 1932-2011 36-32-1 0.188 10 1.145 0.937 6 0.449 0.449 0.377 0.446 5.114
16 Indiana-Purdue 112 1899-2015 40-66-6 0.018 3 1.036 0.709 10 0.420 0.464 0.348 0.536 5.078
17 Toledo-Bowling Green State 54 1962-2015 27-26-1 0.000 1 0.981 0.927 7 0.463 0.352 0.315 0.389 4.981
18 Kentucky-Vanderbilt 82 1916-2015 42-36-4 0.000 9 1.207 0.809 7 0.463 0.439 0.293 0.463 4.909
19 Georgia-Georgia Tech 105 1902-2015 61-39-5 0.000 9 1.076 0.745 8 0.381 0.438 0.295 0.514 4.491
20 North Carolina-North Carolina State 95 1902-2015 57-34-4 0.021 15 1.189 0.738 9 0.411 0.453 0.305 0.548 4.409
21 Duke-North Carolina 95 1922-2015 33-58-4 0.000 1 0.979 0.720 13 0.379 0.453 0.347 0.526 4.403
22 Illinois-Northwestern 109 1892-2015 55-49-5 0.009 7 1.128 0.823 7 0.486 0.413 0.413 0.509 4.358
23 Louisiana State-Mississippi 99 1902-2015 57-38-4 0.010 5 1.141 0.781 8 0.485 0.424 0.364 0.459 4.347
24 California-Stanford 95 1918-2015 39-50-6 0.000 4 1.021 0.747 7 0.453 0.421 0.379 0.505 4.344
25 Iowa State-Kansas 94 1916-2015 39-49-6 0.000 3 1.053 0.748 7 0.521 0.351 0.34 0.479 4.326
26 Iowa State-Missouri 103 1908-2011 33-61-9 0.000 2 1.000 0.576 10 0.466 0.398 0.282 0.524 4.203
27 Army-Navy 112 1891-2015 49-56-7 0.804 7 1.107 0.766 14 0.491 0.438 0.277 0.455 4.169
28 Ohio-Miami (OH) 54 1962-2015 24-29-1 0.000 1 0.981 0.912 6 0.389 0.389 0.407 0.444 4.119
29 Arkansas-Mississippi 62 1908-2015 33-28-1 0.242 11 1.726 0.924 6 0.500 0.435 0.323 0.426 3.833
30 Georgia-Florida 94 1904-2015 49-43-2 0.872 11 1.181 0.910 7 0.415 0.394 0.404 0.5 3.496
31 Illinois-Purdue 90 1892-2015 43-41-6 0.000 12 1.367 0.758 6 0.511 0.4 0.389 0.544 3.395
32 Florida-Florida State 60 1958-2015 34-24-2 0.033 1 0.950 0.822 9 0.383 0.367 0.35 0.448 3.379
33 Wisconsin-Minnesota 123 1892-2015 58-57-8 0.000 2 1.000 0.764 12 0.455 0.431 0.358 0.439 3.278
34 New Mexico State-Texas-El Paso 76 1935-2015 26-49-1 0.000 4 1.053 0.778 8 0.382 0.461 0.355 0.368 3.198
35 Rice-Southern Methodist 90 1916-2012 41-48-1 0.000 5 1.067 0.944 10 0.422 0.333 0.344 0.411 3.069
36 Memphis-Southern Mississippi 50 1963-2012 17-32-1 0.000 2 0.980 0.758 7 0.380 0.42 0.38 0.4 3.03
37 Iowa-Wisconsin 88 1906-2015 42-44-2 0.000 7 1.239 0.911 10 0.443 0.352 0.352 0.443 2.787
38 Alabama-Louisiana State 78 1902-2015 49-24-5 0.013 14 1.449 0.598 11 0.474 0.397 0.346 0.623 2.565
39 Mississippi-Mississippi State 108 1902-2015 61-42-5 0.324 4 1.046 0.772 12 0.472 0.361 0.444 0.507 2.282
40 Notre Dame-Southern California 87 1926-2015 46-36-5 0.000 4 1.023 0.766 11 0.414 0.402 0.368 0.437 2.133
41 Arizona-Arizona State 79 1931-2015 39-39-1 0.000 4 1.063 0.950 11 0.392 0.316 0.405 0.456 2.131
42 Syracuse-West Virginia 60 1945-2012 33-27-0 0.033 9 1.117 0.980 8 0.433 0.317 0.4 0.397 2.088
43 Baylor-Texas A&M 102 1903-2011 29-64-9 0.010 5 1.059 0.509 13 0.441 0.451 0.333 0.495 2.023
44 Virginia Tech-Virginia 92 1902-2015 55-32-5 0.250 18 1.228 0.692 12 0.478 0.435 0.359 0.493 1.865
45 Auburn-Florida 84 1904-2011 44-38-2 0.024 10 1.274 0.898 7 0.488 0.393 0.357 0.305 1.747
46 Texas A&M-Texas 109 1903-2011 36-69-4 0.046 4 0.991 0.699 10 0.486 0.349 0.376 0.385 1.71
47 Arkansas-Texas A&M 71 1910-2015 40-28-3 0.099 18 1.479 0.790 9 0.479 0.423 0.366 0.469 1.642
48 Notre Dame-Michigan State 63 1918-2013 34-28-1 0.000 30 1.508 0.921 8 0.429 0.413 0.349 0.492 1.567
49 Southern Methodist-Texas Christian 93 1916-2015 38-48-7 0.011 4 1.065 0.712 15 0.376 0.376 0.344 0.533 1.555
50 Colorado State-Wyoming 101 1905-2015 52-45-4 0.000 5 1.089 0.842 10 0.426 0.297 0.416 0.495 1.505
51 New Mexico-Utah 52 1939-2010 17-33-2 0.019 15 1.365 0.689 5 0.462 0.442 0.404 0.412 1.158
52 North Carolina-Virginia 110 1902-2015 61-45-4 0.018 3 1.027 0.823 9 0.382 0.391 0.436 0.398 1.097
53 Auburn-Tennessee 51 1929-2013 27-21-3 0.039 17 1.647 0.760 6 0.471 0.471 0.333 0.388 0.978
54 Air Force-Colorado State 54 1957-2015 32-21-1 0.000 3 1.074 0.850 7 0.407 0.296 0.5 0.463 0.759
55 Maryland-Virginia 78 1919-2013 44-32-2 0.026 12 1.205 0.857 16 0.385 0.333 0.346 0.487 0.668
56 Arkansas-Texas 75 1906-2014 22-53-0 0.040 9 1.440 0.687 12 0.480 0.36 0.453 0.486 0.588
57 North Carolina-South Carolina 55 1903-2015 32-19-4 0.018 16 2.036 0.646 5 0.491 0.418 0.291 0.481 0.525
58 Alabama-Tennessee 97 1903-2015 53-37-7 0.000 14 1.155 0.695 11 0.381 0.402 0.371 0.505 0.487
59 Kansas-Kansas State 104 1912-2015 57-42-5 0.000 1 0.990 0.784 11 0.394 0.337 0.481 0.471 0.411
60 Southern California-California 96 1922-2015 67-25-4 0.000 2 0.969 0.529 13 0.333 0.323 0.375 0.531 0.264
61 Michigan-Michigan State 90 1918-2015 53-33-4 0.000 7 1.078 0.746 10 0.389 0.289 0.356 0.444 0.158
62 Mississippi State-Auburn 88 1910-2015 27-58-3 0.000 8 1.193 0.654 11 0.307 0.42 0.364 0.443 0.009
63 Mississippi State-Louisiana State 106 1902-2015 34-69-3 0.000 4 1.066 0.698 14 0.358 0.377 0.396 0.425 -0.027
64 California-UCLA 86 1933-2015 32-53-1 0.000 1 0.953 0.841 18 0.395 0.302 0.43 0.419 -0.109
65 Oklahoma-Nebraska 86 1912-2010 45-38-3 0.035 7 1.140 0.855 16 0.360 0.372 0.419 0.446 -0.267
66 Pittsburgh-West Virginia 93 1909-2011 55-35-3 0.000 4 1.097 0.793 15 0.419 0.323 0.462 0.462 -0.29
67 Iowa-Minnesota 108 1901-2015 45-61-2 0.000 4 1.056 0.886 11 0.389 0.315 0.472 0.38 -0.635
68 Brigham Young-Utah 90 1922-2015 30-56-4 0.011 4 1.033 0.688 9 0.322 0.4 0.467 0.427 -0.641
69 Arkansas-Louisiana State 60 1906-2015 21-37-2 0.433 26 1.817 0.745 7 0.450 0.417 0.317 0.412 -0.732
70 Baylor-Texas 104 1903-2015 25-74-5 0.010 4 1.077 0.468 16 0.462 0.356 0.413 0.437 -1.125
71 Baylor-Texas Tech 72 1932-2015 34-37-1 0.111 8 1.153 0.942 15 0.347 0.347 0.444 0.391 -1.128
72 Arizona-New Mexico 50 1931-2015 31-17-2 0.020 10 1.680 0.711 10 0.340 0.34 0.34 0.49 -1.267
73 North Carolina-Wake Forest 101 1908-2015 67-32-2 0.050 4 1.059 0.707 16 0.366 0.356 0.455 0.417 -1.343
74 Brigham Young-Utah State 85 1922-2015 47-35-3 0.035 4 1.094 0.829 10 0.388 0.306 0.541 0.451 -1.346
75 Penn State-Pittsburgh 89 1905-2000 43-42-4 0.000 16 1.067 0.832 14 0.416 0.348 0.438 0.416 -1.364
76 Alabama-Auburn 74 1902-2015 43-30-1 0.676 41 1.527 0.888 9 0.446 0.392 0.365 0.417 -1.48
77 Wisconsin-Ohio State 80 1913-2014 17-58-5 0.013 8 1.263 0.380 21 0.413 0.413 0.325 0.481 -1.611
78 Missouri-Oklahoma 93 1912-2011 22-66-5 0.022 3 1.065 0.451 14 0.430 0.301 0.409 0.462 -1.689
79 Pittsburgh-Syracuse 71 1916-2015 37-31-3 0.028 25 1.394 0.828 11 0.352 0.394 0.38 0.435 -1.865
80 New Mexico-New Mexico State 82 1931-2015 59-21-2 0.000 4 1.024 0.543 18 0.305 0.39 0.427 0.451 -1.908
81 Notre Dame-Purdue 83 1899-2014 56-25-2 0.024 13 1.386 0.661 11 0.434 0.301 0.458 0.469 -2.07
82 Georgia-South Carolina 63 1903-2015 44-17-2 0.000 17 1.778 0.568 10 0.397 0.397 0.365 0.46 -2.12
83 Oklahoma State-Tulsa 58 1917-2011 34-22-2 0.000 16 1.621 0.791 5 0.466 0.293 0.328 0.293 -2.223
84 Colorado-Utah 60 1905-2015 29-28-3 0.000 49 1.833 0.814 9 0.383 0.467 0.283 0.467 -2.285
85 Utah-Utah State 98 1912-2015 68-26-4 0.000 3 1.051 0.542 12 0.388 0.296 0.439 0.418 -2.303
86 Penn State-Syracuse 71 1922-2013 43-23-5 0.042 18 1.282 0.616 16 0.352 0.408 0.366 0.471 -2.415
87 Texas-Texas Tech 64 1934-2015 48-16-0 0.000 8 1.266 0.563 8 0.375 0.281 0.453 0.406 -2.666
88 Virginia Tech-West Virginia 50 1915-2005 21-28-1 0.060 35 1.800 0.885 7 0.440 0.3 0.32 0.404 -2.683
89 Kent State-Bowling Green State 53 1962-2015 9-44-0 0.019 2 1.000 0.318 14 0.264 0.264 0.396 0.5 -2.939
90 Maryland-West Virginia 52 1919-2015 22-28-2 0.038 24 1.846 0.830 7 0.404 0.327 0.462 0.5 -3.1
91 Tennessee-Vanderbilt 105 1902-2015 75-26-4 0.000 3 1.076 0.501 22 0.286 0.371 0.429 0.486 -3.257
92 Mississippi-Vanderbilt 86 1902-2015 49-35-2 0.105 9 1.314 0.860 15 0.314 0.302 0.419 0.39 -3.26
93 Missouri-Nebraska 95 1901-2010 33-59-3 0.011 9 1.147 0.745 24 0.316 0.347 0.389 0.426 -3.304
94 Tennessee-Kentucky 99 1915-2015 73-18-8 0.000 3 1.010 0.288 26 0.333 0.394 0.374 0.515 -3.564
95 Mississippi-Tulane 62 1902-2012 39-23-0 0.000 10 1.774 0.871 11 0.290 0.29 0.419 0.403 -4.01
96 Air Force-Army 50 1959-2015 35-14-1 0.040 4 1.120 0.615 8 0.480 0.2 0.54 0.333 -4.216
97 Georgia Tech-Clemson 79 1902-2015 50-27-2 0.013 10 1.430 0.749 15 0.380 0.38 0.418 0.256 -4.426
98 Texas A&M-Texas Christian 89 1903-2001 54-28-7 0.056 6 1.101 0.583 24 0.315 0.326 0.427 0.5 -4.536
99 Michigan-Minnesota 102 1892-2015 74-25-3 0.000 9 1.206 0.506 16 0.294 0.314 0.48 0.49 -4.59
100 Texas Christian-Texas 82 1904-2015 23-58-1 0.000 12 1.354 0.630 24 0.378 0.268 0.476 0.512 -4.85
101 Iowa-Iowa State 60 1899-2015 41-19-0 0.000 43 1.933 0.749 15 0.367 0.367 0.367 0.483 -5.059
102 Utah State-Wyoming 65 1912-2015 36-25-4 0.000 23 1.585 0.726 10 0.415 0.277 0.431 0.415 -5.217
103 Clemson-Georgia 59 1902-2014 16-39-4 0.119 10 1.898 0.514 10 0.441 0.356 0.407 0.346 -5.426
104 Oklahoma State-Oklahoma 102 1914-2015 18-77-7 0.069 1 0.990 0.284 19 0.343 0.284 0.529 0.537 -5.429
105 Indiana-Michigan State 61 1927-2015 13-46-2 0.000 12 1.443 0.413 8 0.393 0.262 0.541 0.459 -5.815
106 Louisiana State-Tulane 90 1904-2009 63-20-7 0.000 7 1.167 0.387 18 0.378 0.244 0.478 0.489 -6.276
107 Nebraska-Minnesota 56 1900-2015 23-31-2 0.000 21 2.054 0.827 16 0.321 0.357 0.464 0.446 -6.563
108 Georgia-Vanderbilt 72 1903-2015 53-17-2 0.000 20 1.556 0.483 11 0.333 0.264 0.486 0.486 -6.598
109 Alabama-Mississippi State 96 1902-2015 77-16-3 0.000 5 1.177 0.286 22 0.229 0.333 0.427 0.438 -6.986
110 Wisconsin-Michigan 64 1892-2010 13-50-1 0.031 16 1.844 0.403 14 0.313 0.281 0.422 0.468 -7.21
111 Nebraska-Colorado 68 1902-2010 48-18-2 0.000 41 1.588 0.559 18 0.368 0.25 0.412 0.471 -8.6
112 Kentucky-Florida 66 1917-2015 16-50-0 0.000 11 1.485 0.540 30 0.258 0.288 0.5 0.409 -9.818
113 Georgia Tech-Tulane 50 1916-2015 37-13-0 0.000 32 1.980 0.592 14 0.240 0.26 0.44 0.4 -10.637
114 Washington State-Idaho 63 1917-2013 53-8-2 0.063 24 1.524 0.183 21 0.190 0.254 0.444 0.508 -11.521
115 Navy-Notre Dame 89 1927-2015 12-76-1 0.360 1 0.989 0.212 43 0.202 0.281 0.562 0.404 -13.236
116 West Virginia-Penn State 55 1909-1992 9-44-2 0.018 14 1.509 0.274 25 0.273 0.236 0.545 0.389 -13.283


Saturday, October 22, 2016

Hand Size, Arm Length, and Drop Rate among NFL Receivers

Although it is not evident here, I have been exploring the influence of physical attributes on in-game performance in gridiron football. Physical attributes include anthropometrics such as height and weight as well as assessments of athleticism such as the 40-yard dash and standing vertical jump. In this post I focus on hand size and drop rates in NFL receivers. By receivers I refer broadly to any eligible pass catcher; specifically, WRs, TEs, and RBs. 

At least one other author has examined the influence of hand size on WR performance. Joe Redemann at numberFire explored relationships between hand size, several standard statistics, and “a player’s contribution to his team”, NEP, across multiple seasons. A mild relationship between hand size and catch rate (receptions / targets) was discernible (r = .17). Ultimately, Redemann demonstrated that the performance of elite WRs was relatively unrelated to hand size whereas a stronger (but still mild) relationship was evident between the performance of ordinary WRs and hand size.


We should expect then to find that larger hands predict fewer drops. We should also expect the reduction in drops offered by larger hands to be minimal. We shall extend the analysis by including TEs and RBs in addition to WRs.


NFL Combine data were procured from NFL Savant and passes dropped were extracted from Sporting Charts. Drop data were gathered for WRs, TEs, and RBs from the 2009-15 seasons. Combining Combine and drop data yielded 604 player-seasons from 238 players without checking for spelling inconsistencies between the two data sets. As regular readers know, I am lazy, so I pressed onward with this sample. Player height and arm length were included because of my intuition that doing so would better isolate the effect, if any, of hand size by isolating the effect of atypically large hands given medium or small stature. 


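Table 1. Targets, Drops, and Anthropometrics for NFL WRs, TEs, and RBs, 2009-15
Position | Seasons (total) | Seasons ≥ Median Targets (total) | Players (total) | Median Hand Size | Median Arm Length | Median Height | Median Drop % | Median Targets
RB | 123 | 65 | 63 | 9.25 | 31 | 71 | 0.051 | 28
TE | 103 | 52 | 42 | 9.875 | 33.125 | 77 | 0.046 | 35
WR | 378 | 194 | 133 | 9.25 | 32.1 | 73 | 0.043 | 67
TOTAL | 604 | 311 | 238 | - | - | - | - | -
MEDIAN | - | - | - | 9.375 | 32 | 73 | 0.045 | 53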

Table 1 displays basic information about the players within our sample. Medially, TEs presented with the largest hands and, for the included seasons, WRs dropped fewer passes while being targeted more frequently. Thus, position should be accounted for in the analysis. Also, we will need to account for season to season variability within players. 


I constructed several models. For each model, the count of drops for each player-season was the dependent variable (actually, the rate of drops, or, drops / targets). Hand size, arm length, height, and position were predictor, independent, or explanatory variables—whichever jargon you prefer. Players were included as a random effect to account for each player’s variability in drops between seasons. Random effect also means that the models are constructed accounting for the differences between players that we are not measuring in this analysis; for instance, a factor as relevant but complex as coverages faced or as esoteric as blood type. 


Three models were constructed. The details are excluded here; there is a table at the bottom of the post with the pertinent model data.

  • The first model included all 604 player seasons. In short, increased hand size predicted fewer drops; being a WR decreased drops.
  • The second model included 311 player seasons from 127 players who accumulated ≥ median targets at their positions. Again, larger hands and being a WR predicted fewer drops. Interestingly, greater arm length predicted more drops.
  • In the third model, I included all 604 player seasons while accounting for whether ≥ median targets were accumulated in a given player-season (via dummy coding). Again, being a WR and having larger hands predicted fewer drops.
Figure 1. Expected Drops by hand size for NFL receivers.
For convenience, the p-values for hand size in models 1, 2, and 3 were .07, .05, and .10, respectively. So, the statistical significance of hand size in explaining drop rates hovers at or slightly above the conventional cutoff of .05. But how does hand size actually impact drops? Probably minimally, as can be seen in Figure 1. With arm length and height held constant, for WRs, 3 inches of greater hand size equates to about 1 fewer drop per season; for RBs and TEs, 3 inches of greater hand size equates to about half a drop fewer per season.

Why though would greater arm length predict more dropped passes in the group of receivers targeted at or above the median? It might be a spurious finding, but, perhaps players with longer arms drop errant passes at the fringes of their catch radii, passes beyond the range of players with shorter arms. 


Alternatively, RBs are known to exhibit the highest drop rate. Likewise, consider that RBs are often catching passes coming out of the backfield where more defenders may be lurking; perhaps RBs are targeted with a greater share of less accurate passes because they are the check-down or hot receiver targeted when the QB is hurried; or perhaps an RB releasing from his blocking assignment into the flats, knowing a pass is coming, has to reorient his body to catch. The process of reorienting with multiple proximal defenders might be more difficult for larger RBs with longer arms. Unsurprisingly then, within each position, only for RBs is the raw correlation between drop rate and arm length appreciable, although this may be the result of small sample sizes:

  • WR: r = .014, p = .805, n = 315
  • TE: r = .133, p = .311, n = 60
  • RB: r = .369, p = .004, n = 58
Summarily, the effects of hand size and subsequently arm length on drop rates in NFL WRs, TEs, and RBs were examined in this analysis. Although the p-values for hand size hovered near the conventional cutoff for significance, the ecological impact of hand size is irresolute because, with all else held constant, 1 inch of hand size was worth about 1/3 of a drop per season for WRs. Additionally, if you are managing NFL offensive personnel and you rely on RBs in your passing game, the present results suggest it may be advisable to cultivate a smaller RB with shorter arms to specialize in pass catching.



Technical supplement. For GLMMs 1-3
Variable Fixed-Effects
Estimate SE z p
Model 1: AIC = 2216.4 Player Var & SD = .039, 0.197
Intercept -2.940 0.085 -34.400 0.000
Hand -0.065 0.036 -1.790 0.073
Height -0.002 0.056 -0.030 0.978
Arm 0.067 0.051 1.330 0.185
TE -0.113 0.142 -0.800 0.425
WR -0.172 0.093 -1.840 0.066
Model 2: AIC = 1298 Player VAR & SD = .037, 0.193
Intercept -2.974 0.101 -29.350 0.000
Hand -0.0840 0.0422 -1.990 0.047
Height 0.112 0.064 1.760 0.079
Arm 0.028 0.067 0.420 0.674
TE -0.239 0.166 -1.440 0.149
WR -0.199 0.111 -1.790 0.074
Model 3: AIC = 2207.2 Player Var & SD = .039, 0.197
Intercept -2.786 0.094 -29.570 0.000
Hand -0.058 0.035 -1.670 0.095
Height 0.070 0.049 1.430 0.153
Arm 0.003 0.054 0.050 0.960
≥ Median Targets -0.199 0.059 -3.410 0.001
TE -0.131 0.137 -0.960 0.340
WR -0.189 0.091 -2.090 0.036
Note: R v3.3.1 console used for analysis. The glmer function within the lme4 package was employed in the GLMM. A Poisson distribution was specified; a negative binomial distribution yielded essentially identical results but the models did not converge. Data and R code available upon request, of course.
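For readers unfamiliar with Poisson models, the estimates above are on the log scale, so exponentiating one gives a multiplicative effect on the expected drop rate. The sketch below converts the Model 1 estimates under assumptions I am stating loudly: that targets entered the model as the exposure (so the intercept is a log drop rate), that RB is the reference position, and that the intercept describes an RB at the reference value of every predictor. The supplement does not state how the predictors were scaled, so "one unit" of hand size here is not necessarily one inch.

```python
from math import exp


def rate_ratio(estimate, units=1.0):
    """Multiplicative change in the expected rate for a change of `units`
    in a predictor of a log-link (Poisson) model."""
    return exp(estimate * units)


# Model 1 fixed-effect estimates from the supplement
INTERCEPT = -2.940
HAND = -0.065
WR = -0.172

# Assumed interpretation: drops per target for the reference (RB) group
baseline_rate = exp(INTERCEPT)    # roughly 0.053

# One additional unit of hand size (scaling of the unit is not given)
hand_ratio = rate_ratio(HAND)     # roughly 0.94, i.e. about 6% fewer drops

# Being a WR rather than an RB
wr_ratio = rate_ratio(WR)         # roughly 0.84
```

Under that reading, the baseline of roughly 0.053 drops per target sits close to the RB median drop rate of 0.051 in Table 1.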






Sunday, July 17, 2016

Field Position Part II


In a previous post I discussed how INTs and INT return yardage influenced starting field position (SFP). I will extend that discussion to include each of the other events that directly result in SFP: turnover-fumble returns, kick and punt returns, and missed field goals by opponents. As an aspiring defensive back, I of course took great care discussing interceptions. I will devote little discussion here to fumble recoveries and missed field goals. I will harp on kick-off returns but refrain from discussing punt returns at any depth.

Let me first state that my play-by-play (PBP) data differ slightly from the official record. I excluded yardage gained on returns for TDs in the analysis because a TD precludes SFP. Excluded also was return yardage gained prior to a turnover-fumble.

Concerning INTs, I emphasized that ending opponents’ possessions is most salient and that INT return yardage is a somewhat superfluous stat. INT return yards may be useful to compare playmaking abilities between DBs, although statisticians, teams, and observers might be better served knowing the SFP that resulted from an interception. This notion is definitely applicable for fumble returns where, again, the ending of opponents’ possession is most salient.

Likewise, it is also relevant for rating punt returners. For instance, a player fair catching a punt at his own 9-yard line would be recorded as a fairly unremarkable zero yards (i.e., it is counted in his average PRY). However, the fair catch was probably initiated in the presence of proximal defenders who could have disrupted the impetus of the punted ball at say, the 2-yard line had the returner declined to fair catch. Thus, by fair catching—despite accruing zero yards—the returner in the example would improve his team’s SFP by 7 yards (of course, the defense downing the ball is hypothetical).

The foregoing notion of field position in lieu of yardage is applicable to kick returns as well. For example, let us review the 2014 NFL-leading kick returners by average yards per return. I have Bruce Ellington of the 49ers at 24 returns for 25.9 yards per return; he ranks about ninth in KR yards. However, Ellington gives his offensive teammates an average starting FP at the ~23-yard line, 18th on my list of qualifying players. It may be poor decision making on his behalf, poor block execution on behalf of his teammates, or that he generally fields kickoffs from superior kickers, but we must acknowledge that Ellington’s average catch-spot (CS) on KRs was nearly 3 yards into the endzone, ranking third-deepest on my list of qualifying players.1

Although this post is about SFP, the above anecdotes underscore the entanglement of variables involved in appraising performances with yardage accrued. However, Ellington still gained those yards. If we are comparing players (or even coverage units), perhaps Ellington does rank ninth in KR yards. However, football is about team success and, on a given drive, a team is increasingly inclined to succeed the closer it begins to its opponent’s endzone. Conversely, Ellington’s team did start 3 yards closer to the endzone than would have resulted from him taking more touchbacks.

Moving on, for all teams in the 2014-15 NFL season, I obtained all non-TD turnover-fumble returns, interceptions, kick and punt returns, and field goals missed by opponents using the Pro-Football Reference PBP search tool. Opponents’ missed FGs include blocks but exclude blocks returned for TDs. For all plays except opponents’ missed field goals, I extracted [a] the spot of the INT, fumble recovery, or catch and [b] the spot at which the player was downed following the return. Computed with those values were [c] return yards or 20 for a touchback and [d] the SFP of the player’s offensive teammates. SFP was scaled such that teams’ own goal lines equaled zero and opponents’ goal lines equaled 100; greater values indicate better SFP.
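Steps [a] through [d] can be sketched as below. The function name and signature are mine, not part of the original workflow, and I am assuming spots are expressed on the 0-to-100 scale from the returning team's perspective, so a kick caught in the team's own endzone has a negative catch spot.

```python
def starting_field_position(catch_spot, downed_spot, touchback=False):
    """Return (return_yards, sfp) for one event.

    catch_spot  -- [a] spot of the INT, fumble recovery, or catch
    downed_spot -- [b] spot at which the returner was downed
    Both are on a scale where the returning team's own goal line is 0 and
    the opponent's goal line is 100.
    """
    if touchback:
        # Following the convention in the text: a touchback counts as 20
        # return yards and the offense starts at its own 20.
        return 20, 20
    return downed_spot - catch_spot, downed_spot


# A punt fair caught at the returner's own 9: zero return yards, SFP of 9
fair_catch = starting_field_position(9, 9)

# A kickoff fielded 3 yards deep in the endzone and returned to the 23
deep_return = starting_field_position(-3, 23)

touchback = starting_field_position(-5, -5, touchback=True)
```

The second call mirrors the Ellington anecdote: a return of about 26 yards from 3 yards deep still leaves the offense at roughly its own 23, only 3 yards better than a touchback.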



Table 1. Counts, Average SFP, and Average Return Yards for Events Resulting in SFP, NFL 2014-15
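Column groups: total event counts (n: KR PR FR INT oMFG); average starting field position overall (SFP) and by event (SFP: KR PR FR INT oMFG); average return yards by event (RY: KR PR FR INT)
TEAM n_KR n_PR n_FR n_INT n_oMFG SFP SFP_KR SFP_PR SFP_FR SFP_INT SFP_oMFG RY_KR RY_PR RY_FR RY_INT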
KAN 68 76 5 5 5 29.3 25.4 28.7 22.4 44.0 23.6 25.4 8.6 0.0 15.2
CIN 78 74 5 19 5 30.3 24.7 30.3 27.4 50.8 26.0 24.9 8.4 0.0 9.2
NWE 68 64 7 16 5 30.6 22.7 32.0 36.9 51.8 29.6 22.1 7.5 0.4 11.7
DAL 75 66 12 16 2 28.9 21.1 26.9 35.5 43.3 21.0 22.3 7.3 2.6 9.3
TAM 81 63 11 11 7 26.6 20.7 25.3 30.1 48.8 32.6 21.6 7.2 1.3 6.9
IND 79 88 13 11 4 28.7 22.5 28.4 30.3 43.2 24.0 24.3 7.0 2.6 8.7
BAL 68 73 12 10 6 28.7 23.3 30.4 32.9 48.2 27.8 22.9 6.9 4.3 9.1
JAX 92 74 12 5 4 25.7 22.1 21.5 24.3 51.0 31.8 21.9 6.4 0.8 13.0
PHI 83 85 16 9 5 30.0 22.9 28.8 33.8 41.6 23.4 20.9 6.2 6.3 6.2
MIN 73 74 4 11 6 27.3 25.0 25.5 15.3 49.0 30.3 21.9 6.2 0.0 8.2
STL 79 74 11 10 1 28.0 22.1 28.8 32.4 42.1 35.0 22.9 5.9 3.3 10.8
BUF 71 86 8 18 7 30.2 21.3 27.0 25.5 60.4 26.9 20.6 5.9 4.4 19.2
OAK 94 81 4 9 5 24.2 19.9 24.7 26.0 61.9 24.2 21.4 5.7 5.3 8.4
CHI 97 49 8 13 6 25.9 21.1 27.0 15.9 43.6 23.8 20.5 5.6 2.4 10.9
SDG 81 66 8 6 2 26.4 21.3 26.5 24.1 31.7 33.5 21.1 5.5 0.0 12.0
SFO 70 74 5 21 0 27.8 22.6 28.3 16.2 48.0 - 22.9 5.5 0.0 18.8
ATL 89 55 7 15 4 26.5 22.4 25.7 32.3 40.5 27.5 22.5 5.4 5.6 6.9
MIA 82 57 10 11 6 31.1 24.2 25.5 21.3 50.6 23.5 23.9 5.3 0.0 17.1
DEN 75 84 5 16 5 28.9 22.6 28.2 26.0 54.4 25.2 21.4 5.3 0.4 10.8
TEN 89 72 6 11 5 25.8 23.2 24.8 38.8 52.7 31.0 22.5 5.2 7.2 10.8
ARI 76 77 5 15 4 26.9 19.6 24.8 20.8 51.3 32.5 20.2 5.1 1.8 10.3
NYJ 85 79 7 6 5 27.8 22.5 26.6 31.4 35.0 24.0 22.1 5.1 0.3 9.0
PIT 86 66 10 7 2 25.7 20.7 24.9 32.5 48.9 29.5 21.1 5.0 3.9 18.1
NYG 87 74 9 16 2 28.2 20.7 23.7 31.6 62.1 24.0 21.1 4.9 1.5 16.6
GNB 79 60 7 15 1 28.5 20.1 27.0 37.4 54.7 29.0 20.3 4.8 0.0 15.2
CAR 83 69 13 10 4 27.7 21.8 25.5 32.5 45.2 25.8 21.0 4.5 2.8 19.0
SEA 62 81 9 11 2 30.5 22.4 27.7 29.0 58.2 32.5 21.4 4.2 0.0 14.2
CLE 72 83 7 18 3 26.8 22.6 24.9 27.7 54.1 34.0 22.8 4.2 4.9 14.4
WAS 85 80 9 6 3 25.1 21.4 22.7 29.0 45.5 18.0 20.8 4.0 2.1 5.0
HOU 71 82 10 16 2 27.7 20.4 23.5 31.9 60.6 26.5 20.7 3.8 8.7 16.6
DET 70 81 7 18 4 29.9 21.1 29.4 32.9 59.2 27.3 21.8 3.8 1.8 18.7
NOR 86 62 6 12 0 25.5 22.2 22.0 18.5 42.9 - 22.3 3.0 0.0 12.5
LEAGUE KR PR FR INT oMFG SFP KR PR FR INT oMFG KR PR FR INT
AVG 79 73 8 12 4 27.9 22.0 26.5 29.1 50.5 27.2 22.0 5.6 2.3 12.3
SD 9 10 3 4 2 1.8 1.4 2.5 6.4 7.6 4.2 1.3 1.3 2.4 4.2

Table 1 contains, for the 2014-15 season, each NFL team’s event counts, average SFP, and average return yards for each event, along with League averages and standard deviations thereof. Kick-return yards (KRY) and punt-return yards (PRY) are computed with touchbacks equal to 20 yards and no return equal to zero yards. Neither New Orleans’ nor San Francisco’s opponents missed FGs, apparently. Otherwise, there is nothing particularly noteworthy in the table.

I can also tell you several things. INTs have the largest impact on the subsequent SFP when statistically controlling for the initial play spot, the spot at which the INT, fumble recovery, or kick/punt catch occurred, and the yardage gained on the return.2 I can also tell you that, for every NFL team, the majority of SFP yardage is derived from either KR yards or PR yards. Table 2 provides some insight into why this is.



Table 2. Characteristics of NFL Teams Based on Majority Source of SFP
Majority of Team SFP From
VARIABLE KR PR
Teams Count 11 21
avg SFP 27 28
avg SFP Unproductive Drives 24 24
avg KR-SFP 22 22
avg Unproductive Drive Yards 17 16
avg Punt Yards 45 45
Opp avg Punt Return Yards 9 9
avg Def. SFP After Unproductive Drive 24 23
Opp avg Unproductive Drive Yards 16 16
Opp avg Punt Yards 45 45
avg Punt Return Yards 5 6
% All Drives Turnovers 14% 11%
Opp % All Drives Turnovers 12% 12%
% All Drives End w/ Score 32% 35%
Opp % All Drives End w/ Score 39% 32%
win% 35% 58%
NOTE: Unproductive drives are defined as those that end without a score.
Scoring drives are those that ended in TDs or FGs.


In Table 2 we see that the two types of teams perform similarly in most situations. Notably, teams whose majority of SFP is derived from KRs commit TOs slightly more frequently (14% of drives versus 11%). As an aside, this might suggest that, while essentially random, a modicum of TOs may be attributable to offensive ineptitude (albeit in a single-season sample). Those teams’ opponents also end drives by scoring considerably more frequently (23% more) than the opponents of teams whose majority of SFP is derived from PRs. The PR-teams themselves score slightly more frequently (35% of drives versus 32%).

Most striking in Table 2, though, is the disparity in win percentage. Over a 16-game season, the KR-teams can be expected to win 5.6 games whereas the PR-teams can be expected to win 9.3 games. Thus, I conclude that, despite the indelible impact of Devin Hester or the ’84 Seahawks’ 3-4 monster, ultimately, SFP is largely the result of an ungenerous defense supplemented by relatively consistent and careful offensive play.
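The expected-win figures are just the Table 2 win percentages multiplied by the length of the regular season. A quick sketch of that arithmetic, assuming the 16-game schedule in use in 2014 and using the rounded percentages from the table (the function name is mine):

```python
GAMES_PER_SEASON = 16  # NFL regular-season length in 2014 (assumed here)


def expected_wins(win_pct, games=GAMES_PER_SEASON):
    """Expected wins over a season, rounded to one decimal place."""
    return round(win_pct * games, 1)


kr_team_wins = expected_wins(0.35)  # 5.6
pr_team_wins = expected_wins(0.58)  # 9.3
```

The gap of nearly four wins comes from percentages that Table 2 reports only to the nearest whole point, so the decimals should be read loosely.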

In summary, the impact of various events on starting field position was examined using data from the 2014-15 NFL season. Although INTs are the most impactful event on SFP when statistically controlling for event spot and return yardage, the majority of SFP is derived from either KR or PR yards. Likewise, winning teams garner most of their SFP from PR yards. I concluded that this effect is likely due to defensive stops and consistent, careful offensive play.



1 Minimum 1 KR per game scheduled.
2 To accomplish this, SFP was regressed onto play start spot, event spot, and yards gained, and the residuals were saved. An ANOVA was then performed with those residuals as the dependent variable and event type as the independent variable. A significant effect of event type was found, F(4, 5641) = 17.422, p < .001. Roughly, planned post hoc comparisons indicate the effect of event on SFP could be ranked as INT > FUM > PR > MFG > KR.
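For readers who want to see the shape of the procedure in footnote 2, here is a rough sketch in plain Python: ordinary least squares to get the residuals, then a one-way ANOVA F statistic on those residuals grouped by event type. It is not the software or code used for the analysis, the nine observations are invented toy numbers (not the 5,646 plays behind the reported F), and the post hoc comparisons are left out.

```python
from collections import defaultdict


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_residuals(predictors, y):
    """Residuals from regressing y on the predictors plus an intercept."""
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return [yi - sum(b * xi for b, xi in zip(beta, x)) for x, yi in zip(X, y)]


def one_way_f(values, labels):
    """One-way ANOVA: return (F, df_between, df_within)."""
    groups = defaultdict(list)
    for v, g in zip(values, labels):
        groups[g].append(v)
    n = len(values)
    grand_mean = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2
                     for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    df_between = len(groups) - 1
    df_within = n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Toy data, invented for illustration: (play start spot, event spot, yards gained).
predictors = [(20, 45, 5), (35, 60, 12), (50, 30, 0),
              (28, 72, 8), (65, 25, 20), (40, 50, 3),
              (15, 38, 15), (55, 66, 7), (30, 20, 10)]
sfp = [52, 70, 33, 78, 47, 55, 51, 74, 31]
event = ["INT", "INT", "INT", "PR", "PR", "PR", "KR", "KR", "KR"]

residuals = ols_residuals(predictors, sfp)
f_stat, df_between, df_within = one_way_f(residuals, event)
```

With five event types and 5,646 plays the same degrees-of-freedom arithmetic gives the (4, 5641) reported in the footnote; the toy example, with three event types and nine plays, gives (2, 6).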