One of the metrics discussed in the Processing the Numbers primer was the F/+ combined ratings, the official college football rankings for Football Outsiders. In that article I noted it was improper to directly compare F/+ ratings from season to season, highlighting the increasing size of the division as a reason. There are actually three reasons why you shouldn't do this:
- Starting in 2011 (and back-compiled for 2007 and onward), a component for special teams was added to the F/+ formula. This could not be done for 2005 and 2006 due (presumably) to a lack of data, so F/+ ratings for those years are quite a bit lower than in other years.
- In 2005, there were 119 schools in Division I-A/FBS/Actual Football. Now, there are 125. Those six schools and when they joined: Georgia State ('13), Massachusetts ('12), South Alabama ('12), Texas State ('12), UTSA ('12), and Western Kentucky ('08).
- Finally, the "average" team is not going to be of identical quality each and every year, even before accounting for the increasing size of the division. This is particularly noticeable (in my opinion, anyway) in 2007, which is an aberrant year that shows up often in this article.
Those six new schools are not the absolute rock-bottom of the division (here's to you, New Mexico State!), but they tend toward the lower end of the bottom half of teams. That drags down the "average" team, which inflates the F/+ ratings of the best teams accordingly.
This also came up during Bill Connelly's 2014 Alabama preview, in which he lists the top 20 teams by F/+ rating during the F/+ era. You'll note there's nobody from 2006 and 2007 on the list, and that the best teams from 2005 are ranked relatively low, which makes sense given the issues noted above. I've long wondered if there was an appropriate way to compare these numbers across seasons, and I think I've settled on a reasonable way to go about it: calculating standard scores via standard deviation.
What follows is a quick stats lesson - feel free to skip it and head straight to the results.
As a reminder, standard deviation is a measure of spread from the average of a set of numbers. The smaller the standard deviation, the more tightly clustered that set of numbers is around its average value. The larger the standard deviation, the more spread out the data is from the average.
The standard deviation is frequently used to gauge the rarity of a particular result with respect to its population - if you say a result is "three standard deviations above the mean," then for the normal distribution you're talking about roughly the 99.87th percentile, which is, you know, uncommon. Keep this in mind later!
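That percentile figure is easy to check with nothing but the standard normal CDF, which the Python standard library can express via the error function (the function name here is mine, just for illustration):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# A result three standard deviations above the mean sits at roughly
# the 99.87th percentile:
print(f"{normal_cdf(3.0) * 100:.2f}")
```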
The normal distribution, as its name implies, is fairly common and shows up in many different areas of science. If you aren't familiar with the term, you've probably heard of the bell curve we were all threatened with for grading in middle school and beyond - same thing.
Just to tie all of this together, below is a chart of 4 normal distributions, all of mean 0, with standard deviations varying from 1 (what's called the standard normal distribution) down to 0.25. As you can see, the roughly bell-shaped curves tighten up around the centerline as the standard deviation decreases:
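If you want to reproduce something like that chart yourself, the densities are simple to compute; this sketch (using the same four standard deviations as the chart) just prints the peak height of each curve, which shows the tightening numerically:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    return exp(-((x - mean) ** 2) / (2.0 * sd ** 2)) / (sd * sqrt(2.0 * pi))

# Peak height at the mean for each curve: the smaller the standard
# deviation, the taller and tighter the bell.
for sd in (1.0, 0.75, 0.5, 0.25):
    print(f"sd={sd}: peak={normal_pdf(0.0, sd=sd):.3f}")
```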
In order to apply this thinking to F/+, we must first establish that assuming a normal distribution for the F/+ ratings is appropriate. I ran normal probability plots on each year as a test for normality, with the resulting charts below. A perfectly normal distribution will fit a line exactly - as you can see, we've got a pretty linear relationship each year, so the assumption of normality holds. The one kinda dicey year is 2007 - which just so happens to be one of the crazier years for college football in recent history, one that produced, among other oddities, a two-loss national champion:
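For anyone who wants to run the same kind of check without plotting software, here's a rough stdlib-only sketch (the ratings below are made up for illustration, not real F/+ data): the correlation between the sorted data and the theoretical normal quantiles plays the role of the probability plot's straight-line fit, with values near 1.0 suggesting the data is plausibly normal.

```python
from statistics import NormalDist, mean

def normality_r(data):
    """Correlation between sorted data and theoretical normal quantiles -
    a DIY normal probability plot, minus the plot. Near 1.0 = plausibly normal."""
    n = len(data)
    xs = sorted(data)
    # Theoretical quantiles at the plotting positions (i + 0.5) / n
    qs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mq = mean(xs), mean(qs)
    num = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    den = (sum((x - mx) ** 2 for x in xs) * sum((q - mq) ** 2 for q in qs)) ** 0.5
    return num / den

# Hypothetical F/+-style ratings centered near zero:
ratings = [-30, -22, -15, -9, -4, 0, 3, 8, 14, 21, 28]
print(round(normality_r(ratings), 3))
```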
With this in hand, I computed the standard deviation for each season's final F/+ ratings and used it to convert each F/+ rating into a standard score. Strictly speaking, a standard score is the rating minus the mean, divided by the standard deviation; since the mean of each season's distribution is either 0 or very close to it, I skipped the subtraction and just divided by the standard deviation. Finally, to provide a presentation similar to the original F/+ ratings, I found the percentile for each standard score and subtracted 50%. This scales above-average teams from 0 to 50% and below-average ones from 0 to -50%. I'm calling this the Adjusted F/+ Rating.
This seems to work for now - the best team in the database, 2011 Alabama, sits at 49.87%. You'd be looking at a raw F/+ rating of about 75% before you ran into a team sniffing an adjusted F/+ rating of 50%. The highest F/+ rating I've seen in-season is in the low 60s (they tend to run large early in the season and fall before the end), so we're not seeing anyone finish a season in the 70s anytime soon, provided there are no radical changes to the F/+ formula.
You'll note on the forthcoming table that the top 20 teams have standard scores of 2.12 or higher (corresponding to adjusted ratings in excess of 48.30%). That's pretty rarefied air - this is the top 1.7% of teams during this era of college football. It's a real shame that we don't have this kind of data for years before 2005 - I would love to know how the mid-80s Oklahoma, mid-90s Nebraska, or the early-00s Miami teams would stack up against the Saban-era Crimson Tide. Or, if you could adjust this for truly large gaps in time, figure out which Tide team was better - 1961 or 2011?
Before the big reveal, let's revisit the three issues noted earlier to see how we did:
- This approach resolves the issue of the difference in formula between 2005/2006 and the subsequent years, and that's pretty well reflected in the presence of three new teams from those years in the chart, as well as the higher ranking of the 2005 title game participants.
- This approach resolves the issue of the increasing division size, at least to a degree. I noticed the standard deviations increased from year to year - from ~14% in 2005 and 2006 to ~17% in 2008 to over 20% last year. This fits with the assumption the newer teams are not very good and are dragging down the average a bit, mainly by lengthening the negative tail of the distribution. Since we're now dividing out by that larger standard deviation, however, we're accounting for the change somewhat.
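To make that effect concrete: dividing by a bigger spread shrinks the standard score, so the same raw rating counts for less in a more spread-out year (the rating here is illustrative; the spreads are in the ballpark of the ones noted above):

```python
def standard_score(rating, sd):
    """z-score for a rating, taking the season mean as 0."""
    return rating / sd

# The same hypothetical +35% F/+ rating against growing season spreads:
for sd in (14.0, 17.0, 20.0):
    print(f"sd={sd}% -> z={standard_score(35.0, sd):.2f}")
```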
- The one thing this does not address is the slight differences in the "average" team from year to year. Remember, the "average" team will have an F/+ of exactly 0, but what exactly 0 means from year to year is different. There's no way to adjust for this, as one of the major components of the formula (FEI) is already scaled to the average for that year when provided to the public. Without having some unadjusted version of that to serve as a baseline, we can't account for this across years.
That being said, I don't think the changes in the average team are very significant from year to year, at least in terms of on-field quality over the last 10 years or so of college football. Maybe the average team from 1955 can't hang with 2013 Penn State, but I'm thinking 2007 Kansas State or 2010 Ole Miss* probably could.
* These three teams were the closest to "average" for their respective years.
Now that we've calculated an adjusted F/+ rating, let's see what we've got:
THE TOP 20 TEAMS BY ADJUSTED F/+ RATING, 2005 - 2013
| Team | F/+ | F/+ Rk | Std. Score | Adj. F/+ | Adj. Rk |
| --- | --- | --- | --- | --- | --- |
| 2010 Boise State | 39.30% | 12 | 2.32 | 48.97% | 11 |
| 2005 Ohio State | 32.30% | 36 | 2.31 | 48.95% | 12 |
| 2011 Okla. State | 37.90% | 15 | 2.12 | 48.31% | 20 |
A Few Observations:
- There's not a huge change in representation, but the distribution of teams by year is a little more uniform than it was, which makes more sense to me:
- Hello, 2005! Notice that the absolutely loaded 2005 Texas team is now in the top 5 where they belong - their partner in the best** NCG I've ever seen, 2005 USC, is now in the top 10.
** More accurately, the best-contested NCG I've ever seen. Clearly 2009, 2011, and 2012 were more satisfying.
- 2007 is still not showing up. 2007 was a very weird year. LSU is lurking at #23. I think part of the problem here is the "average" team in 2007 was relatively good and that kinda mucked things up, but that's just an opinion.
- The biggest surprise to me was 2008 USC at #5 (where they rank on both my list and Bill Connelly's), the highest-ranked non-NCG participant (2009 Florida at #10 is the next-highest). After being ranked #1 in the polls early in the season, a 4th-quarter Mark Sanchez interception (shocking, I know) ruined a comeback at Oregon State for their only loss of the season. They proceeded to mostly obliterate the rest of their schedule to win the Pac-10 - including a 44-10 pantsing in LA of an Oregon team that went on to beat Oregon State by 27 in Corvallis, further proof the transitive property does not apply to college football - but didn't sniff the NCG at all.
- 2010 Boise State is sitting smack in the middle at #11, and was the highest-rated team that year despite a late-season overtime loss on the road to Colin Kaepernick's Nevada squad. I have to wonder if they could have hung with Auburn, who barely escaped an inferior Oregon team in the NCG that year.
- The SEC makes up half of this list, because ESS EEE SEE THAT'S WHY. The only B1G squad came from 2005, ~~back when they were still relevant~~ at the tail end of a stronger era for that venerable conference. The remaining 9 teams are a mixture from the other major conferences (and that one Boise State team).
- The 2011 Alabama Crimson Tide are still the best team of the F/+ era (and the best I've seen since the turn-of-the-century Miami teams), and they are joined in the top 20 by the 2009, 2010, and 2012 squads. 2013 Alabama, posting the 7th highest F/+ rating on record, sits at 22nd after the adjustment. As we get farther away from that season, I think it's clear that team was not on the level of some of the previous incarnations of the Tide, and this table supports that.
The ~~most abysmal rating system ever~~ BCS picked one-loss Oklahoma to tangle with the "you'll never see a team play harder than we will" Tebow Gators instead, and while the Sooners hung with that annoying Florida team late into the 2nd half, they eventually succumbed in the 4th quarter (sound familiar?), losing 24-14.
2008 Florida was outstanding in all phases of the game, ranking in the top 3 in each of the F/+ splits by the end of the season. However, their edge in the overall rating came primarily from special teams, which, as we all know, can be pretty volatile. I have to wonder if USC might have been a better matchup for Florida in that game.
So does this line up with how y'all saw the last few years in college football? Any other takeaways from the table you think should be noted?