Bill Connelly invented all of this; check out his college football analytics blog, Football Study Hall.
By this time next week we will all have made our final farewells to summer by watching our beloved Crimson Tide take the field to start the 2015 season. Labor Day barbecues will be underway, celebrating the end of the long, dry spell between the NBA Finals and college football via consumption of cooked meats and adult libations. More important than both of these events, Charting the Tide will return here at Roll ‘Bama Roll starting next Monday. Since CTT began last season as a bye week fill-in for Processing the Numbers, and thus never got a primer or proper introduction, I thought I might address that oversight today. Also, one of my offseason projects was to rethink/expand the analysis I do with the charting and play-by-play data, and I’ll be highlighting what those changes are today as well.
1 | Not a fan of baseball. Unsurprisingly, huge fan of sabermetrics!
2 | Assuming you partake of both, that is.
Sounds great. Why should I care?
Outstanding question! In order to answer this, let’s consider the content of your typical college football box score. You get play counts, total yardage gained, maybe yards per attempt (depending on the service), all broken out by passing, rushing, and receiving. Similar statistics are provided for the special teams as appropriate, and the quarterbacks usually get completion percentage and some sort of passer rating. Of course you get points scored, turnovers, penalties, time of possession, first downs, maybe third/fourth down conversion rates as well. This is all good, useful stuff, and you can tell a lot about what happened in that game based on this information.
3 | For example, why did Ohio State beat ‘Bama last year? The biggest reason is they were 10-18 on third down versus 2-13 for the Tide.
Now, let me ask you this: how much of that passing yardage was through the air, versus after the catch? How many of those plays gained enough yardage for the team to "stay on schedule"? Did the offensive production come from "grinding it out", or mostly through big plays? Did either team attack certain parts of the field, or do most of their damage out of certain alignments or personnel groupings? Were they more successful running up the middle or around the end? These are all important, enlightening questions, and ones you can’t really answer with your typical box score. So, how do you do it?
Enter game charting, which coaches and staff members have been doing for a long, long time, and which Bill C. started for college football prior to the 2013 season. In 2015, I will rewatch every play of every Tide game, win or lose, and note the following information that isn’t captured in your typical box score or play-by-play record:
4 | Barring unexpected travel, which occasionally happens at my day job.
- The hash the ball was placed on at the snap.
- The pre-snap alignment and personnel grouping.
- The number of rushers/blockers, direction, and yards after the catch for every pass play.
- The reason for all incompletions, interceptions, and sacks.
- The type of run and run direction on every rush, as well as yards after contact.
- Any other notable occurrence, such as missed/great blocks, great jukes or broken tackles, and credits for tackles for loss or pass breakups.
Once all of that is recorded (a process that usually takes upwards of three hours), statistical analysis allows us to unlock the answers to all the questions I posed above, plus many more. You can expect to see efficiency (success rate) and explosiveness (yards per play or IsoPPP, which I’ll get to in a minute) measures split for a wide variety of situations, as well as frequency numbers to highlight preferences and tendencies. Last season’s Charting the Tide featured the following sections:
- The Map of Quarterbacking Excellence, a matrix showing passing performance to all parts of the field
- Passing splits, by down
- Formation (shotgun, pistol, etc.) splits
- Rushing splits, by down and direction
- Overall performance, split by down and distance
- Personnel splits, by number of backs and number of receivers
- Targeting data and catch rates, for the receivers
- Disruptive plays (sacks, forced fumbles, etc.), for the defense
That’s a lot of good stuff, and the response was way, way more positive than I had anticipated. However, I think it can be better, and I’ve spent a good chunk of the offseason figuring out how. Lots of new data is coming, so let’s get to it.
Better Personnel Splits
Last season, I provided separate charts that showed performance when 0-3 backs were on the field and when 0-5 wide receivers were on the field. However, a single back set, for example, can take many different forms: you’ve got four players to distribute as pass catchers or additional blockers. You could have two wide receivers and two tight ends, or four wide receivers and no tight ends, or anything in between. Shoot, you could have four tight ends if you wanted; maybe Arkansas will do that this season. At any rate, those are all very different situations, and ones I didn’t properly account for previously. So for this year, I’m moving to the numerical personnel groupings referenced at all levels of football: two-digit codes giving the number of backs (first digit) and the number of tight ends (second digit) in the formation. The number of wide receivers can be determined by subtracting the sum of these two digits from 5, which is the maximum number of non-quarterback skill players on any play in football. As an example, these were the five most common groupings for the Tide offense last season:
The first is a single back set with three wide receivers, the second is a single back set with four wide receivers, and so on. Nothing too crazy, and better than what I was doing before.
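For anyone who wants to decode these groupings without doing the subtraction in their head, here is a minimal Python sketch of the rule described above. The function name `decode_personnel` and the dictionary keys are my own placeholders, not part of any charting standard.

```python
def decode_personnel(code: str) -> dict:
    """Split a two-digit personnel code into backs, tight ends, and wide receivers.

    Assumes the convention described in the text: first digit = backs,
    second digit = tight ends, and five non-quarterback skill players in total.
    """
    if len(code) != 2 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a two-digit code, got {code!r}")
    backs, tight_ends = int(code[0]), int(code[1])
    receivers = 5 - (backs + tight_ends)
    if receivers < 0:
        raise ValueError(f"{code!r} implies more than five skill players")
    return {"backs": backs, "tight_ends": tight_ends, "receivers": receivers}


# A single back set with three wide receivers works out to "11";
# a single back set with four wide receivers works out to "10".
print(decode_personnel("11"))
print(decode_personnel("10"))
```

Keeping the code as a string rather than an integer avoids losing a leading zero on empty-backfield groupings such as "01".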
Last season I was (occasionally) using yards per play as an explosiveness measure, and there’s nothing wrong with that at all. Yards per play is the single most useful traditional metric for evaluating play-by-play performance in football — if you’re averaging two yards a play, for instance, it doesn’t matter what kind of offense you’re running, that’s terrible. If you’re averaging 15 yards a play, whatever you’re doing is working, really, really well.
But what if you’re averaging five yards a play? You could get there with six plays of five yards each, or two 15 yarders and four no-gainers, and you’d have no idea which one it was. Success rate begins to tell this story, but pulling in IsoPPP writes the ending. As we discussed earlier this week, IsoPPP is built on Net Equivalent Points (NEP), a system that assigns a point value to each yard line on a football field; it is the average NEP gained per play, counting successful plays only. This isolates explosiveness better than PPP (calculated over all plays, not just successful ones), so that’s what I’m going to use this season. In addition, I intend to have an explosiveness measure available for each split, instead of only in the Map of Quarterbacking Excellence.
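To make the five-yards-a-play example concrete, here is a rough Python sketch. The yardage lists come straight from the example above; the point values and success flags are made-up placeholders, since the actual NEP table isn't reproduced here, so treat the IsoPPP numbers as illustrative only.

```python
from typing import List, Tuple

Play = Tuple[float, bool]  # (points gained, was the play successful?)


def yards_per_play(yards: List[int]) -> float:
    """Plain average of yards gained over all plays."""
    return sum(yards) / len(yards)


def ppp(plays: List[Play]) -> float:
    """Points per play, averaged over ALL plays."""
    return sum(points for points, _ in plays) / len(plays)


def iso_ppp(plays: List[Play]) -> float:
    """Points per play, averaged over successful plays only."""
    successes = [points for points, ok in plays if ok]
    return sum(successes) / len(successes) if successes else 0.0


# Two very different drives with the same yards per play.
steady = [5] * 6
boom_or_bust = [15, 15, 0, 0, 0, 0]
print(yards_per_play(steady), yards_per_play(boom_or_bust))

# Hypothetical point values: 0.4 per five-yard gain, 1.5 per 15-yard gain.
steady_plays = [(0.4, True)] * 6
boom_plays = [(1.5, True)] * 2 + [(0.0, False)] * 4
print(round(iso_ppp(steady_plays), 2), round(iso_ppp(boom_plays), 2))
```

Yards per play can't separate the two drives, while the successful-plays-only average does, which is the whole point of isolating explosiveness from efficiency.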
It’s difficult to provide equivalencies to yards per play, as IsoPPP includes adjustments for the area of the field in which the yardage was gained. Just as an example though, here’s some passing data from the Tide’s quarterbacks in non-garbage time last season:
| Pass Type | Com./Att. | Comp. % | S. Rate | Yds./Att. | IsoPPP |
|---|---|---|---|---|---|
| Left, 0-5 Yds. | 30/45 | 66.7% | 40% | 4.9 | 0.8 |
| Middle, 0-5 Yds. | 4/7 | 57.1% | 42.9% | 7.4 | 1.6 |
| Right, 0-5 Yds. | 46/61 | 75.4% | 57.4% | 5.8 | 0.9 |
Note these are all throws within 5 yards of the line of scrimmage. By completion percentage and success rate, throws to the right were comfortably ahead of the other two directions; note that throws to the left were completed more often than throws up the middle. You get into explosiveness though, and the middle throws were the best bet, averaging 2.5 yards more per attempt than throws to the left and about a yard and a half better than throws to the right. However, IsoPPP underlines just how much better those throws were, as the throws to the left and right were virtually indistinguishable in terms of explosiveness, but throws up the middle were essentially twice as explosive. We’ll get a better sense of how to interpret the IsoPPP numbers moving forward, but I think you can see how useful this will be to us.
5 | The sample size is small, of course. It’s an example!
Advanced Running Back and Offensive Line Metrics
Last season, I only provided success rate as a measure of performance for rushing. That’s a fine metric of course, but it only tells you so much, and it doesn’t begin to separate offensive line performance from running back performance. Fortunately, there are ways to do that, and I’ll be discussing them this season.
We all know that the success of every run, regardless of length, is due in part to the offensive line; yes, even the 80+ yard ones. The question is how much credit do you give the line on, say, a five yard run? A ten yard run? What about runs that lose yardage? Football Outsiders evaluated years of rushing data and determined that runs fall into four silos: runs that lose yardage, runs that gain 0-4 yards, runs that gain 5-10 yards (so-called "Second Level Yards"), and runs in excess of 10 yards ("Open Field Yards"). Depending on the length of the run, the credit for those yards is split differently between the line and the running back. I should note here that this same effort by FO found no statistically significant difference between runs straight up the middle and runs over the guards, which is why only five run directions (left and right end, left and right tackle, and middle) are tracked in game charting for the NFL and college.
6 | Well, unless you’re named Barry Sanders.
Back in 2011 Bill refined this idea a bit for college, and introduced the concepts of Line Yards, Highlight Opportunities, and Highlight Yards. For positive gains, the sum of Line Yards and Highlight Yards equals the rushing yardage achieved on the play. For runs that lose yardage, the line is charged with 120% of the yardage lost, with no penalty to the running back, just like what FO does for the NFL. For yards 0-4 of a run, the line receives 100% of the credit.
Once a run extends to five or more yards from the line of scrimmage though, it’s considered a Highlight Opportunity, and the running back begins to receive credit for the yardage. For yards 5-10 of the run, the line and back split the yardage equally, and all yardage accrued over 10 yards goes to the running back. Here’s a chart that shows some examples of how this breaks out:
|Run Length||Line Yards||Highlight Yards|
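The split described above can be sketched in a few lines of Python. This is my reading of the rules rather than Bill's exact implementation: I'm assuming the first four yards go entirely to the line, the fifth through tenth yards are split 50/50, and everything past the tenth yard goes to the back. The function name `split_run` is a placeholder.

```python
from typing import Tuple


def split_run(yards: int) -> Tuple[float, float]:
    """Return (line_yards, highlight_yards) for a single rush.

    Assumed interpretation of the rules in the text:
      - losses: the line is charged 120% of the yardage lost, the back nothing
      - yards 1-4: 100% to the line
      - yards 5-10: split equally between line and back
      - yards 11+: 100% to the back
    """
    if yards < 0:
        return 1.2 * yards, 0.0
    full_credit = min(yards, 4)              # yards the line gets outright
    shared = max(0, min(yards, 10) - 4)      # yards in the 5-10 band
    line = full_credit + 0.5 * shared
    highlight = yards - line                 # the back gets the remainder
    return float(line), float(highlight)


for run in (-3, 3, 7, 15):
    line, highlight = split_run(run)
    print(f"{run:>3} yd run: line {line:.1f}, highlight {highlight:.1f}")
```

Computing the back's share as the remainder guarantees that Line Yards plus Highlight Yards equals the run length on every positive gain, matching the property stated above.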
As an example, the Tide as a whole accrued 1941 rushing yards last season in non-garbage time, not including any yardage attributed to a quarterback. The line accrued 1212 Line Yards, whereas the ballcarriers accrued 708 Highlight Yards, a total of 1920 yards. That indicates the line was penalized about 21 yards beyond the actual yardage lost on negative runs; sure enough, 101 yards were lost in these situations, the extra 20% charged to the line on those comes to about 20 yards, and with a rounding error here or there you can see how all of that lines up. I should note that while it’s simple to apply this analysis to run directions, it’s very difficult to assign line yards to individual blockers due to the complexity of blocking assignments.
7 | Most of those runs were scrambles off of busted/well-defended pass plays; omitting these is standard practice for this statistic.
Of course, you can certainly apply this on an individual basis to ballcarriers, since they run behind the same line in the same offense:
|Runner||Att.||Hlt. Opps.||Opp. Rate||Hlt. Yds.||Hlt./Opp.||RBR|
By taking the number of Highlight Opportunities over the number of carries that could become Highlight Opportunities, we get Opportunity Rate, which is a success rate based purely on run length instead of down and distance. By taking the total Highlight Yards over Highlight Opportunities, we get an explosiveness measure. Multiply the two together, and we get a handy "overall quality" metric I’m calling Running Back Rating or RBR.
8 | i.e., carries from the opponent five yard line or farther; this distinction is why Opportunity Rate isn’t just Highlight Opportunities / Carries.
9 | If you guess where I got that acronym from, you could receive YOUR CHOICE of a cookie or a gold stick-on star!
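Here is a small Python sketch of how those three numbers relate. The class and field names are placeholders of mine, the sample back is entirely hypothetical, and I'm assuming Opportunity Rate enters the product as a fraction rather than a percentage.

```python
from dataclasses import dataclass


@dataclass
class RusherLine:
    eligible_carries: int    # carries that could become Highlight Opportunities
    highlight_opps: int      # carries that reached five or more yards
    highlight_yards: float   # yards credited to the back on those carries

    @property
    def opportunity_rate(self) -> float:
        """Highlight Opportunities over eligible carries (as a fraction)."""
        if self.eligible_carries == 0:
            return 0.0
        return self.highlight_opps / self.eligible_carries

    @property
    def highlight_per_opp(self) -> float:
        """Highlight Yards per Highlight Opportunity."""
        if self.highlight_opps == 0:
            return 0.0
        return self.highlight_yards / self.highlight_opps

    @property
    def rbr(self) -> float:
        """Opportunity Rate times Highlight Yards per Opportunity."""
        return self.opportunity_rate * self.highlight_per_opp


# Hypothetical back, not a real stat line.
back = RusherLine(eligible_carries=100, highlight_opps=40, highlight_yards=200.0)
print(back.opportunity_rate, back.highlight_per_opp, back.rbr)
```

One thing the sketch makes obvious: with the rate expressed as a fraction, the product reduces algebraically to Highlight Yards per eligible carry, which is a handy way to sanity check a stat line.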
The chart above shows the top five backs by non-garbage carries last season, with associated stats. Drake, Fowler, and Jones are not really worth discussing here given how little work they received, but the differences between Yeldon and Henry are clear. Yeldon was a marginally more efficient back by Opportunity Rate, but Henry was more of the big play back, and thus his RBR was a bit higher. Fun stuff!
Down and Distance, not just Down or Distance
Last season, I provided separate performance charts split out by down and distance. This allowed me to speak to performance on second down, or performance in long yardage situations. It did not, however, allow me to easily speak to performance on second and long, or third and short, etc. This year I’ll be presenting this information in a matrix format similar to the Map of Quarterbacking Excellence, which I think will be more illuminating than what I was doing before.
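As a rough illustration of what the matrix involves, here is a Python sketch that groups plays by down and a distance bucket and computes a success rate for each cell. The bucket thresholds (short = 1-3, medium = 4-6, long = 7+) and the sample plays are assumptions for demonstration, not the cutoffs or data the column will use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def distance_bucket(to_go: int) -> str:
    """Hypothetical distance buckets; the real cutoffs may differ."""
    if to_go <= 3:
        return "short"
    if to_go <= 6:
        return "medium"
    return "long"


def success_matrix(
    plays: Iterable[Tuple[int, int, bool]]
) -> Dict[Tuple[int, str], float]:
    """Map (down, bucket) to success rate.

    Each play is a (down, yards_to_go, successful) tuple.
    """
    tally = defaultdict(lambda: [0, 0])  # (down, bucket) -> [successes, plays]
    for down, to_go, ok in plays:
        cell = tally[(down, distance_bucket(to_go))]
        cell[0] += int(ok)
        cell[1] += 1
    return {key: wins / total for key, (wins, total) in tally.items()}


# Made-up plays for illustration.
sample = [(1, 10, False), (2, 8, True), (2, 9, False), (3, 2, True), (3, 1, True)]
for (down, bucket), rate in sorted(success_matrix(sample).items()):
    print(f"down {down}, {bucket}: {rate:.0%}")
```

Keying the tally on the (down, bucket) pair is what lets you ask about second and long or third and short directly, instead of reading two separate charts side by side.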
I’ve always provided completion percentages, but rarely have we discussed the reasons for those incompletions. This information is charted, so I will be including a table showing the breakdown every week.
There will be a few other touches here and there as well, but those are the highlights. I’m hoping that, despite the additional statistics, some of the streamlined presentation will cut down on the word counts a bit, as it’s a little much to ask you to read 4000 words every week on game charting. We’ll see how that goes though.