Processing the Numbers, Basketball Edition | A Primer

A handy guide to understanding the Basketball edition of Processing the Numbers, Roll 'Bama Roll's advanced stats game preview for Alabama Crimson Tide men's basketball.

Jeremy Brevard-USA TODAY Sports

RPI information courtesy of ESPN.com.
All other statistics are courtesy of KenPom.com, Ken Pomeroy’s outstanding basketball analytics site.

There are advanced metrics for bouncy ball too?

Indeed there are! The revolution¹ that started in baseball and expanded to football has become pretty well-rooted in basketball as well, at both the professional and collegiate levels. The main man to know in this arena² is Ken Pomeroy, proprietor of the well-known KenPom.com and some unholy hybrid of Bill Connelly and Brian Fremeau for college basketball. Ken’s website has a freakishly diverse pile of things on it, most of which is behind a paywall these days. But some juicy tidbits are still freely available to the public, and that’s what we’ll be talking about this season.

1 | Ok, maybe more of a minor civil disturbance.

2 | Basketball joke! /slaps knee //guffaws heartily

Oh, like bracketology or something.

Nope, not quite. You will not find that here. Like my viewpoint on the college football playoff committee, I am unconcerned with what the selection committee thinks or which way they are leaning or bubbles or any of that nonsense until Selection Sunday in March. If you would like to obsess over such things throughout the season, I’ll happily direct you to SBNation’s Bracketology Blog, Blogging the Bracket.

Sheesh man, calm down. How is this going to work then?

It’ll look a whole lot like the Processing the Numbers series for football, just with subject matter more on the squeaky and bouncy side of things. The idea right now is to do a PTN for each basketball game this season. We’ll review the previous game, discuss the metrics for the next one, and offer a pick at the end, with plenty of irritating little footnotes³ along the way. This may eventually become a weekly column that discusses multiple games; we’ll just have to see how it flows. To start we’ll focus mostly on what KenPom has to offer⁴, but down the road metrics from other sources may get pulled in. Any new stuff will be added to this article, which will be linked in every PTN Basketball from here on out. Finally, I’m not exactly renowned for brevity, and this primer is going to run pretty long, since we have lots to discuss. But fear not, tl;dr nation! The game articles will be shorter than even their football equivalents, and correspondingly much less tedious to read. At any rate, let’s get started!

3 | Like this!

4 | With a hat tip to RPI, which comes from ESPN.

The Concepts

Football has plays and drives, but basketball boils down to possessions⁵

If you’re reading this right now, you’re either irate that we aren’t talking about football or you’re into basketball of the crimson variety. If you’re the latter, you’ve probably noticed the same stuff about basketball that I have. While lacking the complexity and chess match nature of football, basketball is a beautiful, nigh-artistic game performed by the second most impressive group of athletes on the planet⁶. The action rarely stops, with one element of the game flowing neatly into the next and into the next.

5 | Get ready for a lot of football analogies. I’m a football guy, it’s how I think.

6 | Offensive tackles are the best. Think about it sometime, you’ll come around.

That fluidity of movement is both an overt characteristic of the game’s athletes and, in my opinion, the defining characteristic of the game itself. Plays and formations are not discrete in basketball, and there is no real analogy to the concept of plays in football.

Drives are a different story, however. In football, they are chains of discrete units (plays) that define a segment of game action ending in points scored, the end of a time period, or a turnover of some sort (punt, unsuccessful fourth down attempt, lost fumble, or interception). In basketball, those discrete units are continuous, but everything else is the same, and we call it a possession.

Possessions are the basic building blocks of basketball, and form the bedrock of any advanced metric for the sport. We’ll talk in a bit about offensive and defensive efficiency ratings — these are numbers that fall in the 80-120 point range, and they are based on 100 possessions of basketball. Why 100? That’s the closest round number to the standard number of possessions in an NBA game, which fluctuates from year to year but is something on the order of 96 possessions.

The NBA game is 8 minutes longer, however, and has a much shorter shot clock (24 seconds vs. 35 seconds for the NCAA). As a result, there are significantly fewer possessions in an NCAA game, more on the order of 67 per game. They also aren’t an officially tracked statistic for the NCAA, so a formula is used to estimate the number of possessions in a game:

P = FGA - OR + TO + 0.475*FTA, where:

  • FGA is Field Goal Attempts
  • OR is Offensive Rebounds
  • TO is Turnovers
  • FTA is Free Throw Attempts

This makes sense, right? FGAs are the end of a possession, unless there is an offensive rebound extending that possession (thus the subtraction). TOs also end a possession. The only goofy part there is the FTA term. If a player goes to the line for two (or three!) free throws, including one-and-ones, that’s considered a possession for his team — that means he was fouled in the act of shooting (or prior to shooting, in bonus situations, etc.), nullifying the associated FGA. If he goes to the line for just one attempt, however, that means he made the previous shot, and the possession is already captured in the FGA term. The 0.475 coefficient estimates how many FTAs end up constituting a discrete possession, and is based on analysis of play-by-play data.
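
If you want to play along at home, here is a minimal Python sketch of that estimate. The function name, the argument names, and the box score numbers are all made up for illustration; only the formula itself comes from the text above.

```python
def estimate_possessions(fga, orb, to, fta, ft_weight=0.475):
    """Estimate possessions from box score totals.

    fga: field goal attempts, orb: offensive rebounds,
    to: turnovers, fta: free throw attempts.
    ft_weight is the 0.475 coefficient quoted above.
    """
    return fga - orb + to + ft_weight * fta


# Hypothetical box score, not taken from any real game.
possessions = estimate_possessions(fga=55, orb=10, to=12, fta=20)
print(f"Estimated possessions: {possessions:.1f}")
```

Keeping the free throw coefficient as a parameter makes it easy to see how sensitive the estimate is to that one fudge factor.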

The number of possessions in a game is greatly dependent on tempo

This is also pretty straightforward. Basketball has its Oregons and Philadelphias too, and they tend to score a lot more points than everybody else. The most famous example of recent times was the “7 seconds or less” Phoenix Suns of the mid-to-late 2000s, but the effect is definitely present in college basketball as well. Looking at scoring on a per-possession level accounts for this of course, but knowing which teams like to play fast is important as well. The faster team dictates pace in basketball more than it does in football, due to that continuous nature we just talked about. Part of the reason those Suns teams were so successful is they forced their opponents out of their comfort zone, making them play faster and play more possessions than they were used to playing. If team A is like the Suns and team B is doing a whole 1950s throwback kinda vibe, that’s something you’re going to want to look at.

Schedule — who you play and who they play — is the most important thing of all

Strength of schedule (SOS) is just as critical in basketball as it is in any true team sport. If you pile up offensive efficiencies in the 110s and defensive ones in the 80s, that’s great for you! If you did it against a bunch of Abilene Christians and Lamars, then it means nothing! If you play the Kentuckys and Kansases of the world and put up those same numbers, you’re probably winning a national title.

The Metrics (aka, The Goods)

To start, let’s check out where the Tide rank at the time this article was written on 20 November 2014:

Overall Quality | ALABAMA
RPI | 0.5984 (89)
PYTH | 0.7499 (62)
Luck | 0.144 (29)

Efficiency Ratings | ALABAMA
OE+ | 105.8 (75)
DE+ | 96.2 (51)
T+ | 69.7 (101)

Schedule Ratings | ALABAMA
Schedule PYTH | 0.2471 (281)
Opp. OE+ | 97.7 (260)
Opp. DE+ | 107.7 (340)
NCS PYTH | 0.2471 (281)

Numbers in parentheses indicate national ranking.

I put the date there because by the time you read this, the Tide will have played Southern Miss, and that will cause all of the numbers to change. Unlike the football PTN, expect to see a date attached to the metrics for these articles, as the stuff at KenPom.com is updated on a nightly basis.

In contrast to the Tide’s football metrics, the basketball team is… less good. Keep in mind though that there are 351 teams in Division I basketball vs. the 128 teams in FBS, so 62nd overall is still in the top 20% of teams in the country — it’s not like this team is a dumpster fire or something. Let’s go through each of these metrics and discuss what they mean and what they measure:

RPI: the Ratings Percentage Index

Most casual fans will have definitely heard of this one, as it’s a major criterion used by the selection committee and frequently gets shoved down your throat when seeing coverage on ESPN, CBS, etc. The RPI formula is pretty simple:

RPI = 0.25*WP + 0.5*OWP + 0.25*OOWP, where:

  • WP is a team’s Winning Percentage, simply wins over games played,
  • OWP is the Winning Percentage of a team’s Opponents,
  • and OOWP is the Winning Percentage of a team’s Opponents’ Opponents.

The latter two components combine to form a measure of schedule strength, so RPI is essentially 25% WP and 75% SOS. It combines how you perform with how your opponents perform in one easy-to-reference number. Simple, elegant, and useful, right?

Wrong! There are two big problems with RPI. First of all, there’s no statistical validity to the formula. It’s junk. The coefficients are totally arbitrary. The components are less arbitrary, but still pretty arbitrary. A good ranking metric has a strong theoretical basis, a decent correlation with winning percentage on a seasonal level, and considers as many aspects of play as is reasonable and feasible. RPI has none of those things.

Secondly, margin of victory is nowhere to be found. You beat Central Arkansas⁷ by 1 point? Gold star for you, good sir! Win at Duke by 25? Gold star for you as well, good sir, and the nation thanks you! Counting those two wins the same makes no sense, and while the SOS portion of the formula mitigates it somewhat, it’s not enough. To be fair, the absence of margin of victory is sometimes cited as a strength of the formula, as it discourages manipulation of the score for gambling and selection purposes.

7 | I don’t really have anything against the Southland Conference, they are just a convenient punching bag.

All that being said, RPI is a major component of selection for the tournament, and is frequently referenced as noted above, so it has a place here. The intent is certainly more insightful than points per game, so in that sense it’s an advanced metric. Currently, the Tide are 89th in the country in RPI.

Update: Completely forgot to mention weighting when I originally wrote this, and the shame of that omission will haunt me until the end of time. In 2004, the RPI formula was updated to account for the strength of a win based on where the game was played. Neutral site wins are not adjusted and still count as one win each, but a road win is worth 1.4 wins and a home win is worth 0.6 wins. Also, when calculating OWP and OOWP for a particular team, any game against that team is not included.
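
For the curious, the RPI formula and the location weighting can be sketched in a few lines of Python. Treat this as a rough illustration rather than the official calculation: the function names and the sample season are invented, OWP and OOWP are simply passed in as numbers, and the loss weights (1.4 for a home loss, 0.6 for a road loss) are an assumption that mirrors the win weights.

```python
# Win weights by game site, per the 2004 update described above.
WIN_WEIGHT = {"home": 0.6, "road": 1.4, "neutral": 1.0}
# ASSUMPTION: losses are weighted as the mirror image of wins.
LOSS_WEIGHT = {"home": 1.4, "road": 0.6, "neutral": 1.0}


def weighted_wp(results):
    """Location-weighted winning percentage.

    results is a list of (site, won) pairs, e.g. ("road", True).
    """
    wins = sum(WIN_WEIGHT[site] for site, won in results if won)
    losses = sum(LOSS_WEIGHT[site] for site, won in results if not won)
    total = wins + losses
    return wins / total if total else 0.0


def rpi(wp, owp, oowp):
    """RPI = 0.25*WP + 0.5*OWP + 0.25*OOWP."""
    return 0.25 * wp + 0.5 * owp + 0.25 * oowp


# Hypothetical season: 10-2 at home, 4-5 on the road, 2-1 at neutral sites.
results = (
    [("home", True)] * 10 + [("home", False)] * 2
    + [("road", True)] * 4 + [("road", False)] * 5
    + [("neutral", True)] * 2 + [("neutral", False)] * 1
)
wp = weighted_wp(results)
print(f"Weighted WP: {wp:.4f}")
print(f"RPI: {rpi(wp, owp=0.55, oowp=0.52):.4f}")
```

Notice that this team's raw record is 16-8, but the road losses count for less than the home wins do, so the weighted WP lands right around the raw one. A team that piled up its wins at home would take a hit.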

OE+ and DE+: the Adjusted Offensive and Defensive Efficiency Ratings

We talked a bit about this above, but the basic efficiency metric in basketball is points per possession. For the offensive rating, this is the number of points your offense scored over the number of times it possessed the ball, then multiplied by 100 to give a rating per 100 possessions. The same applies for the defense, except it measures the number of points given up.

This is normally done on a per game basis, and then averaged together to get a season rating. You can certainly bypass the per game part and use the entire data set for the season to arrive at a season rating, but this will end up giving more weight to games that had more possessions than the average.

The “adjusted” part involves scaling the ratings against the country’s average, a process known as Normalization. The resultant rating describes, for offense, the estimated efficiency against the average D-I defense, and for defense, the estimated efficiency against the average D-I offense; in both cases, some weighting is given to more recent games. These are somewhat analogous to F/+ and FEI components in football. Currently, the Tide rank 75th and 51st in adjusted offensive and defensive rating respectively.
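
To make the raw (unadjusted) rating concrete, here is a small Python sketch comparing the two ways of getting a season number: averaging per-game ratings versus pooling the whole season. The two games are hypothetical, and nothing here attempts the opponent adjustment or recency weighting, since the details of those aren't spelled out above.

```python
def efficiency(points, possessions):
    """Points per 100 possessions."""
    return 100.0 * points / possessions


# Hypothetical (points, possessions) for two games.
games = [(70, 60), (90, 80)]

# Method 1: rate each game, then average the ratings.
per_game_avg = sum(efficiency(pts, poss) for pts, poss in games) / len(games)

# Method 2: pool all points and possessions for the season.
pooled = efficiency(
    sum(pts for pts, _ in games),
    sum(poss for _, poss in games),
)

print(f"Per-game average: {per_game_avg:.2f}")
print(f"Pooled season:    {pooled:.2f}")
```

The pooled number gets pulled toward the 80-possession game, which is the weighting effect described above.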

T+: the Adjusted Tempo Rating

As discussed earlier, tempo is expressed in possessions per game. This is calculated on a per-game basis, and then averaged out to give a seasonal rating. When normalized against the country’s average, the metric produced measures a team’s expected tempo against a team that wants to play at an average tempo. The Tide play a bit faster than your typical team so far this season at 69.7 possessions a game, ranking 101st in the country in this metric.

PYTH: the Pythagorean Rating

And now we get to the star of the show. The concept of Pythagorean Expectation has been around for a while; sabermetrics godfather Bill James introduced it in his seminal The Bill James Historical Baseball Abstract back in 1985. It estimates a team’s win percentage based on offensive and defensive performance, and is so named for its resemblance to the Pythagorean Theorem you all used back in your geometry days. For baseball you use runs scored and allowed, and points scored and allowed have been used for basketball in the past. In this case, adjusted offensive and defensive ratings are used as follows:

PYTH = (OE+)^10.25 / [(OE+)^10.25 + (DE+)^10.25]

- or -

PYTH = 1 / [1 + (DE+ / OE+)^10.25]

When you see this concept employed, the structure of the formula is the same, but the exponents change from application to application. It’s dependent on what goes into the formula, and is adjusted to maximize correlation with winning percentage. Which, by the way, is why this is a superior rating system to something like RPI: direct, demonstrable correlations to win percentage. It also has a sound theoretical basis, which you can read about in detail here.
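
Here is a quick Python sketch of both forms of the formula, mostly to show they really are the same thing. The function names and the 110/100 inputs are hypothetical; the 10.25 exponent is the one quoted above. Plugging in rounded ratings from a table won't necessarily reproduce a published PYTH figure.

```python
import math

EXPONENT = 10.25  # exponent quoted in the formula above


def pyth(oe, de, exponent=EXPONENT):
    """First form: OE^x / (OE^x + DE^x)."""
    return oe ** exponent / (oe ** exponent + de ** exponent)


def pyth_ratio_form(oe, de, exponent=EXPONENT):
    """Second form: 1 / (1 + (DE / OE)^x)."""
    return 1.0 / (1.0 + (de / oe) ** exponent)


# Hypothetical adjusted efficiencies for an above-average team.
oe, de = 110.0, 100.0
first = pyth(oe, de)
second = pyth_ratio_form(oe, de)

print(f"PYTH (first form):  {first:.4f}")
print(f"PYTH (second form): {second:.4f}")
print("Forms agree:", math.isclose(first, second))
```

The second form is the friendlier one numerically, since it never has to raise a number over 100 to the tenth power.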

It should be noted that this is purely a predictive tool, and not a rating of how good a team’s season has been. Somewhat goofy perhaps, but perfectly suited for comparing Alabama with its opponent on a game-by-game basis. The Tide’s current PYTH rating of 0.7499 is good for 62nd in the country.

There are two other PYTH ratings we’ll use this season. One is Schedule PYTH, which uses the OE+ and DE+ of a team’s opponents in the formula as a strength of schedule rating. The Tide’s is currently at 0.2471, which is the 281st highest in the country. The other is a split for non-conference schedules, dubbed NCS PYTH above. The two schedule ratings will be the same for teams until conference play starts later in the season, but the split will be a very important consideration for both conference games and postseason tournament games.

Luck

The last metric to discuss is Luck. Essentially, this is a measurement of actual performance over that predicted by the PYTH rating, such that a positive Luck indicates a team outperforms expectations and vice versa. This gets down into close games: you would expect the average team to win some of these games and lose some of them. If you have a team that wins most or all of its close games and their actual record is a good bit higher than what it should be based on performance⁸, that would be an indicator of luck. The Tide are currently +0.144 in luck, indicating they are the 29th luckiest team in the nation.

8 | Generally referred to as an “Auburn.” Not sure why.
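
As a back-of-the-envelope illustration of the Luck idea, here is a tiny Python sketch. It assumes Luck is simply actual winning percentage minus the expected winning percentage; the site's real calculation may well differ in the details, and the record below is made up.

```python
def luck(wins, losses, expected_win_pct):
    """Actual winning percentage minus expected winning percentage.

    ASSUMPTION: a simplified reading of the Luck metric, not the
    site's exact method.
    """
    actual = wins / (wins + losses)
    return actual - expected_win_pct


# Hypothetical team: 20-10 record, but efficiency numbers that
# project to winning 60% of its games.
value = luck(wins=20, losses=10, expected_win_pct=0.60)
print(f"Luck: {value:+.3f}")
```

A positive number means the record is better than the underlying performance says it should be.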

A word about preseason projections

It’s kinda hard to do anything like this with only two or three games of data, right? KenPom calculates some preseason ratings based on the last two seasons’ offensive efficiencies and the previous season’s defensive efficiencies, with adjustments for returning players and top recruits (but NOT transfers) thrown in. I don’t want to get into the nitty gritty of that here — if you are interested in knowing more about it and how it compares with the preseason AP poll, check out this article.

The real question, of course, is how long do these preseason ratings have an effect on the actual ratings? At first, quite a bit — that opening win over Nicholls St. doesn’t mean a whole lot in the grand scheme, etc. As the season wears on, though, the preseason projection is given less and less weight, until it disappears entirely in late January.

The Four Factors

There’s a school of thought out there that all of the different elements of performance in a particular sport can really be traced back to a small number of “factors”, which can be thought of as core skills and abilities. The most famous example is Dean Oliver’s Four Factors of Basketball, which are shooting, ball security, offensive rebounding, and drawing fouls. Nothing too shocking there — score points (from shooting well and getting to the line a lot) and limit your opponent’s opportunities to score (by taking care of the ball and limiting possessions through offensive boards).

The linked article was written with an NBA bent, but the Factors can certainly be applied to the college game as well. To measure this, we'll look at the following four statistics:

EFFECTIVE FIELD GOAL PERCENTAGE (eFG%)

Basketball at all levels includes the big arc out beyond the paint for a reason — shots from out there are tough! In return, the shooter receives three points for making a shot from beyond that line, offsetting the lower percentage of makes from that distance with a higher point value. That really ought to be considered when assessing a shooter's effectiveness, right? A guy who shoots 50% from the field is great, but a guy who shoots only threes at a 45% clip is even better.

To do that, we use Effective Field Goal Percentage, which accounts for the fact that a three pointer is 1.5 times as valuable as an ordinary two pointer. The formula for it is pretty simple:

eFG% = (0.5*3PM + FGM) / FGA, where:
  • 3PM is three point field goals made,
  • FGM is field goals made, and
  • FGA is field goals attempted.
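
The two shooters from the example above can be run through the formula with a short Python sketch. The function name and the 100-shot samples are just for illustration.

```python
def effective_fg_pct(fgm, three_pm, fga):
    """eFG% = (0.5 * 3PM + FGM) / FGA, returned as a fraction."""
    return (0.5 * three_pm + fgm) / fga


# Shooter A: 50 of 100 from the field, no threes.
shooter_a = effective_fg_pct(fgm=50, three_pm=0, fga=100)

# Shooter B: takes only threes, 45 of 100.
shooter_b = effective_fg_pct(fgm=45, three_pm=45, fga=100)

print(f"Shooter A eFG%: {shooter_a:.1%}")
print(f"Shooter B eFG%: {shooter_b:.1%}")
```

Shooter B makes fewer shots but comes out well ahead, because every make is worth half again as much.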

TURNOVER RATE (TO%)

Turnovers have been tracked on basketball box scores for ages, but using them as a counting stat has a major flaw — it doesn't account for the number of possessions it took to generate those turnovers. In order to account for differences in pace between teams, Turnover Rate is simply the number of turnovers over the number of possessions it took to accrue those turnovers.

OFFENSIVE REBOUNDING RATE (OR%)

Second chance points, where an offense generates an additional possession off a missed shot through an offensive rebound, are extraordinarily valuable. Not only do you have an opportunity to make up for that miss, but you're shortening the game for your opponent at the same time. There are only so many rebounding opportunities in a game, and they end up as either an offensive rebound or a defensive one for the opponent. As such, Offensive Rebounding Rate is defined as follows:

OR% = OR / (OR + ODR), where:
  • OR is offensive rebounds and
  • ODR is opponent defensive rebounds.
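
In Python, that definition is a one-liner plus a guard against dividing by zero. The function name and the rebound totals are hypothetical.

```python
def offensive_rebound_rate(orb, opp_drb):
    """OR% = OR / (OR + ODR), returned as a fraction."""
    chances = orb + opp_drb
    return orb / chances if chances else 0.0


# Hypothetical game: 12 offensive boards, opponent grabs 24 defensive boards.
rate = offensive_rebound_rate(orb=12, opp_drb=24)
print(f"OR%: {rate:.1%}")
```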

FREE THROW RATE (FTR)

And-1s are great, aren't they? Not only did you make your shot, but you drew a foul along the way, affording you the opportunity to pick up a freebie. Turns out that's pretty efficient basketball, and the way we measure it is with Free Throw Rate, which is simply free throw attempts over field goal attempts.

PUTTING IT ALL TOGETHER

These factors are not all equally weighted, however. Shooting is the most important skill in basketball, as it is the only factor that has a direct impact on a team's ability to score. On the other hand, generating free throws is nice, but it's not usually a make-or-break proposition. Analyses of both NBA and college basketball have identified the following weights for the Four Factors:

  • eFG% — 40%
  • TO% — 25%
  • OR% — 20%
  • FTR — 15%

Using these weights, the Four Factors can be combined into a single metric assessing the overall quality of a team's performance in a game, which I'm going to refer to as Win Index. The version I'm using will be scaled to 100, so the maximum possible score in each factor (150% eFG%, 0% TO%, 100% OR%, and 100% FTR) will give a Win Index of exactly 100.
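
One way such an index could be put together is sketched below in Python. To be clear, this is a guess at the construction, not the actual Win Index formula: it assumes each factor is scaled to a 0-to-1 score against the maximums listed above (with turnover rate flipped so lower is better, and anything over the stated maximum capped), then combined with the 40/25/20/15 weights. All function names and inputs are hypothetical.

```python
def turnover_rate(turnovers, possessions):
    """TO% = turnovers / possessions, as a fraction."""
    return turnovers / possessions


def free_throw_rate(fta, fga):
    """FTR = free throw attempts / field goal attempts, as a fraction."""
    return fta / fga


def win_index(efg, to_rate, or_rate, ft_rate):
    """Weighted Four Factors score on a 0-100 scale.

    ASSUMPTION: each factor is scaled linearly against its stated
    maximum (1.5 eFG%, 0 TO%, 1.0 OR%, 1.0 FTR) and capped there.
    """
    shooting = min(efg / 1.5, 1.0)
    ball_security = 1.0 - to_rate      # fewer turnovers is better
    rebounding = or_rate
    free_throws = min(ft_rate, 1.0)
    return 100.0 * (
        0.40 * shooting
        + 0.25 * ball_security
        + 0.20 * rebounding
        + 0.15 * free_throws
    )


# Hypothetical game line: 50% eFG, 12 turnovers in 67 possessions,
# 30% offensive rebounding, 20 FTA on 55 FGA.
score = win_index(
    efg=0.50,
    to_rate=turnover_rate(12, 67),
    or_rate=0.30,
    ft_rate=free_throw_rate(20, 55),
)
print(f"Win Index: {score:.1f}")
```

Under these assumptions a perfect game in every factor scores 100, matching the scaling described above.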

ROLL TIDE