Wednesday, February 26, 2014

(To regular readers of this blog who are looking for basketball content -- this posting is in my occasional series on the tools I use, and may not be of interest. On the other hand, if you found this posting Googling for a workaround for an authenticating proxy in Quicklisp, you're in the right place!)

Quicklisp is a great tool for installing packages into Common Lisp. Unfortunately, it doesn't work with authenticating proxies, which is a bother in many environments. However, there's a fairly straightforward workaround to fix this.

The basic idea is to insert a new proxy between Quicklisp and the authenticating proxy. Quicklisp will talk to the new proxy without authentication, and the new proxy will take care of authenticating with the real proxy.

There are probably several different proxies you can use to do this, but I chose DeleGate. It's free for non-commercial use. Download Delegate, install it, and then start it with the following command line:

This says to start Delegate as an HTTP proxy on port 8091, forwarding all requests to port 8080 on realProxy, using the authentication provided in MYAUTH. (The ADMIN parameter is just an email address required by Delegate; you can put in whatever you'd like.) Obviously you'll need to fill in the address of your existing proxy and the correct username and password for that proxy. Once Delegate is running, you can install Quicklisp as follows:

(quicklisp-quickstart:install :proxy "http://localhost:8091/")

This directs Quicklisp to the new Delegate proxy. Delegate connects to the real proxy using the username and password you provided, and then just passes along the requests. The result is that Quicklisp doesn't have to talk directly to the authenticating proxy.

It's worth noting that whatever proxy specification you use when running quicklisp-quickstart:install will be used by Quicklisp from then on. If you ever need to change your proxy, you do it like so:

(setf (ql-config:config-value "proxy-url") "http://proxy.value.here")

Under Windows, you can set up Delegate as a service that will run every time Windows starts or just start it manually whenever you want to use Quicklisp. The same trick will work for other programs that don't know how to interact with an authenticating proxy.

The farther we get into the season, the harder it is for the ratings to move, so as an experiment I'm showing the change in ratings from the previous Top Twenty as a percentage. The big upward moves this week are Kansas (pounded #19 Texas), Arizona (pounded Colorado) and Wisconsin (beat #15 Michigan and #15 Iowa). The big losers are Syracuse and Duke, for obvious reasons.

Syracuse is barely favored at mediocre Maryland, which could lead to the spectacle of having the #1 team lose four straight games. How Boeheim will top his Duke meltdown if this happens is unknown.

BLOWOUT OF THE WEEK

Virginia Tech @ #5 Duke: Duke by 31

VT is getting worse much faster than Duke is getting worse.

#20 Michigan @ Purdue: Michigan by 3

A good chance for Purdue to steal a meaningless but satisfying upset.

#7 Cincinnati @ #21 Connecticut: UConn by 6

Cincinnati has popped onto the PM's Top Twenty this week but will probably be overmatched at Connecticut.

#8 Kansas @ Oklahoma State: OKSt by 5

The PM may be the last believer in OK State (although they had a good win this week against Texas Tech). If the PM is to be believed, they have a good chance to beat Kansas next Saturday.

#11 Louisville @ #22 Memphis: Louisville by 3

A good illustration of how powerful the home-court advantage (HCA) is in college basketball. This is #1 versus #34 in the PM's book, and the #1 is still favored by only 3 points because they're on the road. Speaking of which...

UCLA won by 20 (!). Good for my Bruins, although they blew it later against Stanford.

#5 Duke @ UNC: Duke +4

Duke lost by 8.

#1 Syracuse @ #5 Duke: Duke +10

Coming off the inexplicable loss to BC, Syracuse loses to Duke by 6, although that number was artificially inflated by Boeheim's meltdown. Not a good week for Duke, and they'll probably drop in the PM's rankings.

#11 Louisville @ #7 Cincinnati: Louisville +4

Louisville is underrated by the AP and Cincinnati is overrated.

Louisville squeaks out a 1-point win.

#19 Texas @ #8 Kansas: Kansas +10

Despite the scare at Texas Tech, Kansas should handle Texas without any problems.

Kansas by 31. I tuned in during the first half and wondered if there was a scoreboard malfunction -- it was 46-18 at the half.

#16 Wisconsin @ #15 Iowa: Iowa +7

Road records in the B10 have been crazy, so I'm hesitant to put much faith in this prediction.

Wisely, as Wisconsin wins by 5. Wisconsin has been steadily improving since the loss to Northwestern.

BLOWOUT OF THE WEEK

San Jose State @ SDSU: SDSU +29

VT @ Duke is also going to be a blowout.

I accidentally looked too far ahead for these blowouts -- both games will be this week.

Wednesday, February 19, 2014

A simple model of basketball is this: A team brings the ball up court, runs its offense until it gets a good shot opportunity, and then shoots the ball. So you might expect that good offenses (and poor defenses) lead to shorter times of possession, and vice versa. This suggests that knowing something about a team's time of possession can help us judge its offensive effectiveness.

(Of course, there are a lot of complicating factors. Some teams intentionally take long or short possessions. And possessions that start with an offensive rebound are likely to be shorter than ones that don't. And so on. But I did say this was a simple model.)

A "possession" is typically defined as a period during which one team or the other continuously controls the ball. The traditional way to calculate the number of possessions in a game is to use a formula that looks something like this (there are several widely used variants):

Possessions = FGA - Oreb + 0.475*FTA + TO

If you look at the first two terms, you'll see that if a team has the ball, misses a shot, gets the offensive rebound and takes another shot, that counts as one possession, because the offensive rebound "cancels out" the first shot attempt.
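To make the cancellation concrete, here's a minimal Python sketch of the traditional formula. It uses the 0.475 free-throw weight quoted above; other variants use slightly different weights. The function name is just illustrative.

```python
def traditional_possessions(fga, oreb, fta, to):
    """Traditional possession estimate: FGA - Oreb + 0.475*FTA + TO."""
    return fga - oreb + 0.475 * fta + to


# Miss, offensive rebound, second shot: 2 FGA minus 1 Oreb is one possession.
print(traditional_possessions(fga=2, oreb=1, fta=0, to=0))  # 1.0
```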

However, I'm interested in a different definition of possession -- one that corresponds to a team receiving a fresh shot clock. This equates to the number of times a team "runs its offense". With this definition, if a team has the ball, misses a shot, gets the offensive rebound and takes another shot, that counts as two possessions.

After some experimentation, a fairly good equation for estimating that number seems to be:

Possessions = FGA + 0.666*FTA + TO + 3

(This is based upon counting the possessions from play-by-play data and performing a linear regression for games from the 2013 season.)
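The post only reports the fitted result. As a sketch of how such a fit could be reproduced, here's an ordinary least squares fit using NumPy's `np.linalg.lstsq`. The function names are hypothetical. The "counted" possessions are synthetic: they are generated from the formula above plus random noise so the example runs on its own. They are not the 2013 data.

```python
import numpy as np


def shot_clock_possessions(fga, fta, to):
    """Shot-clock possession estimate: FGA + 0.666*FTA + TO + 3."""
    return fga + 0.666 * fta + to + 3


def fit_possession_formula(fga, fta, to, counted):
    """Ordinary least squares fit of counted ~ a*FGA + b*FTA + c*TO + d.

    `counted` holds possessions counted directly from play-by-play data
    (one per fresh shot clock). Returns the coefficients (a, b, c, d).
    """
    X = np.column_stack([fga, fta, to, np.ones(len(fga))])
    coef, *_ = np.linalg.lstsq(X, counted, rcond=None)
    return tuple(float(c) for c in coef)


# Synthetic stand-in for real games: box-score totals drawn at random and
# "counted" possessions generated from the formula above plus noise.
rng = np.random.default_rng(0)
n = 500
fga = rng.integers(45, 75, n)
fta = rng.integers(5, 35, n)
to = rng.integers(6, 22, n)
counted = shot_clock_possessions(fga, fta, to) + rng.normal(0.0, 1.0, n)

print([round(c, 2) for c in fit_possession_formula(fga, fta, to, counted)])
```

The synthetic counts were built from the formula, so the fit just recovers roughly (1, 0.666, 1, 3). With real play-by-play counts, the coefficients are whatever the data support.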

If you calculate the number of possessions for each team, you can then calculate the average length of possession for the game (the game time divided by both teams' total possessions), but you cannot determine the average length of possession for each team. To do that, you need to analyze the play-by-play data. Fortunately, the ESPN Scoreboard provides play-by-play data for the majority of games, and the format is fairly standardized.

I've spent the last couple of weeks figuring out how to scrape the play-by-play data and analyze it to determine changes of possession and average length of possession. Whether these statistics are useful for prediction remains to be seen.
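Here's a minimal sketch of the per-team calculation. It assumes the play-by-play has already been reduced to a time-ordered list of possession starts (one per fresh shot clock), with times as elapsed game-clock seconds. Parsing ESPN's format is not shown. The data layout and function names are assumptions for illustration.

```python
from collections import defaultdict


def game_average_length(game_seconds, total_possessions):
    """Game-level average: the game time divided by both teams' possessions."""
    return game_seconds / total_possessions


def average_possession_lengths(starts, game_end):
    """Per-team average possession length in seconds.

    starts: time-ordered list of (elapsed_seconds, team), one entry per
    possession (fresh shot clock). Each possession is assumed to last until
    the next one starts; the final possession ends at game_end.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    ends = [t for t, _ in starts[1:]] + [game_end]
    for (start, team), end in zip(starts, ends):
        totals[team] += end - start
        counts[team] += 1
    return {team: totals[team] / counts[team] for team in totals}


# Toy 80-second stretch: A misses and grabs the offensive rebound at 50 s,
# which counts as a new possession under the shot-clock definition.
starts = [(0, "A"), (20, "B"), (35, "A"), (50, "A"), (60, "B")]
print(average_possession_lengths(starts, game_end=80))  # {'A': 15.0, 'B': 17.5}
print(game_average_length(80, len(starts)))             # 16.0
```

The game-level figure (16 seconds here) hides the split between the teams. Only the per-possession timestamps reveal that A averaged 15 seconds and B averaged 17.5.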

The Top Twenty was delayed a bit this week so that I could finish up some data collection (more on that shortly).

Rank  Team           Rating  Change
1     Louisville     31.09   NC
2     Iowa           30.41   NC
3     Arizona        30.03   NC
4     Duke           30.01   NC
5     Creighton      29.65   (+1)
6     Oklahoma St.   29.55   (-1)
7     Michigan       29.42   NC
8     Ohio St.       29.34   NC
9     Villanova      29.29   NC
10    Kentucky       29.26   NC
11    Michigan St.   29.07   NC
12    UCLA           28.98   (+1)
13    Florida        28.95   (+1)
14    Iowa St.       28.9    (-2)
15    Kansas         28.76   (+1)
16    Arizona St.    28.7    (+2)
17    Connecticut    28.69   NEW
18    Pittsburgh     28.64   (-3)
19    Syracuse       28.64   (-2)
20    Wisconsin      28.58   NEW

The top ten is remarkably stable this week. There's some shifting around in the bottom half: Pittsburgh's loss to Syracuse brought the Panthers down, and Syracuse's near miss at NC State brought the Orange down. Meanwhile ASU moves up thanks to the nice (but not completely improbable) win against Arizona.

PREDICTIONS

TOSS-UP OF THE WEEK

#23 UCLA @ Cal

Cal is a bare favorite at home.

#5 Duke @ UNC: Duke +4

Assuming no more freak snowstorms. A win here would prepare Duke for...

#1 Syracuse @ #5 Duke: Duke +10

It's an almost certain end to Syracuse's unbeaten streak. A team that needs a last-second basket to escape (#100) NC State is going to have to play a lot better to win at Cameron.

#11 Louisville @ #7 Cincinnati: Louisville +4

Louisville is underrated by the AP and Cincinnati is overrated.

#19 Texas @ #8 Kansas: Kansas +10

Despite the scare at Texas Tech, Kansas should handle Texas without any problems.

#16 Wisconsin @ #15 Iowa: Iowa +7

Road records in the B10 have been crazy, so I'm hesitant to put much faith in this prediction.

Oklahoma State continues to drop, Villanova jumps up on strong wins over Seton Hall and Xavier, and Florida makes a big jump.

PREDICTIONS

#10 Michigan @ tOSU: tOSU +3

tOSU is coming off two great road wins, so maybe they continue the streak with an upset of Michigan.

TOSS-UP OF THE WEEK

#19 Oklahoma State @ #15 Texas

Oklahoma State has four straight losses and Marcus Smart is out. Texas is coming off an embarrassing loss to KSU. Anything could happen, but with the absence of Smart I'll give Texas the nod.

#1 Syracuse @ #25 Pittsburgh: Pittsburgh +3

The PM strongly disagrees with the AP's reverence for unbeaten teams. Pittsburgh is marginally the better team and at home, so they should win this game.

#11 Duke @ UNC: Duke +5

Anything can happen in this rivalry, but Duke's good enough that they should win this comfortably.

#1 Arizona @ ASU: Arizona +5

ASU is undoubtedly the best 18-6 team in the country, but they have a big job to beat Arizona even at home.

#3 Florida @ #18 Kentucky: Kentucky +5

If Kentucky wins, it will be seen as a big upset, but these teams are actually pretty comparable and Kentucky should win at home. Conversely, if Florida can win this game, it will cement them as a tournament favorite down the stretch.

Saturday, February 8, 2014

Last time I talked about one approach towards building a metric that could be used to sensibly compare two teams in offensive rebounding. That approach was to use average percentages adjusted for difficulty of schedule. In this post I'll talk about another approach.

The problem of comparing teams has been addressed in many different ways for team (scoring) strength, and that problem has many of the same difficulties as comparing teams for offensive rebounding strength. So we can adopt a good measure for comparing team scoring strength and use it for comparing team offensive rebounding strength. It's easiest to do this with a model that is driven by team scores (rather than wins & losses), such as the PMM, Massey or MOV-based Bradley-Terry. The basic idea is to use offensive rebounding numbers instead of scores to drive the ratings.
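The post doesn't say which model's equations were reused, so here is one possibility as a sketch. It is a Massey-style least-squares fit with a separate offensive and defensive term for each team, driven by offensive rebound counts instead of points. The model form, the function name and the toy numbers are all assumptions for illustration. This is not the PM's actual code.

```python
import numpy as np


def rebounding_ratings(games, teams):
    """Massey-style least-squares ratings driven by offensive rebounds.

    games: list of (team, opponent, oreb), one row per team per game, where
    oreb is how many offensive rebounds `team` grabbed against `opponent`.
    Assumed model: oreb ~ mu + off[team] - dfn[opponent].
    Returns (mu, off, dfn), with off and dfn each summing to zero.
    """
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    rows, y = [], []
    for team, opp, oreb in games:
        row = np.zeros(2 * n + 1)
        row[0] = 1.0                   # league-average level (mu)
        row[1 + idx[team]] = 1.0       # team's offensive rebounding strength
        row[1 + n + idx[opp]] = -1.0   # opponent's defensive rebounding strength
        rows.append(row)
        y.append(oreb)
    # The model only pins down differences, so add sum-to-zero constraints.
    for start in (1, 1 + n):
        row = np.zeros(2 * n + 1)
        row[start:start + n] = 1.0
        rows.append(row)
        y.append(0.0)
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y, dtype=float), rcond=None)
    off = dict(zip(teams, coef[1:1 + n]))
    dfn = dict(zip(teams, coef[1 + n:]))
    return coef[0], off, dfn


# Toy round robin (made-up numbers); each game contributes one row per team.
results = [("A", "B", 14), ("B", "A", 9), ("A", "C", 11), ("C", "A", 8),
           ("B", "C", 10), ("C", "B", 12)]
mu, off, dfn = rebounding_ratings(results, ["A", "B", "C"])
for team in sorted(off, key=off.get, reverse=True):
    print(team, round(float(off[team]), 2))
```

A team's `off` value credits it for rebounds beyond what its opponents' `dfn` values predict. So piling up rebounds against poor defensive-rebounding teams earns less credit than the raw total suggests.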

I did this and calculated the current top ten offensive rebounding teams:

Rank  Team            Score
1     Quinnipiac      19.26
2     UAB             18.09
3     Purdue          17.36
4     North Carolina  17.14
5     VCU             16.85
6     Morehead St.    16.66
7     Tennessee       16.5
8     San Diego St.   16.37
9     LSU             16.16
10    Morgan St.      16.15

Compare this to the table from the last posting:

Rank  Team                       Rating  Total
1     North Carolina (15-7)      1.36    295
2     Quinnipiac (13-8)          1.35    357
3     San Diego St. (20-1)       1.31    217
4     UAB (14-8)                 1.28    284
5     Northern Illinois (10-11)  1.26    257
6     Tennessee (14-8)           1.26    312
7     Purdue (14-9)              1.24    289
8     Arizona (22-1)             1.24    277
9     Indiana (14-8)             1.23    268
10    Long Beach St. (9-13)      1.23    212

There's quite a bit of overlap. Some teams have moved up or down, but note again that in this measure just grabbing lots of offensive rebounds isn't sufficient. With this approach we can also easily rank teams by offensive rebounding advantage, that is, by how many more offensive rebounds a team grabs than its opponents do:

Rank  Team               Score
1     Quinnipiac         20.28
2     Tennessee          19.33
3     Boise St.          18.55
4     Stephen F. Austin  18.49
5     Morehead St.       18.33
6     Southern Miss      17.79
7     West. Kentucky     17.64
8     Providence         17.59
9     UAB                17.44
10    Syracuse           17.36

There's some overlap here, but a number of new teams appear. These are teams that are decent offensive rebounders, but are also pretty good at keeping the other team from grabbing offensive rebounds.
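The post doesn't spell out the advantage calculation. One plausible version, sketched here as an assumption, is a classic Massey fit on the per-game offensive rebound margin (your offensive rebounds minus your opponent's). Each team then gets a single rating, and the difference between two ratings predicts the margin. The function name and toy games are hypothetical.

```python
import numpy as np


def advantage_ratings(games, teams):
    """Massey-style ratings on offensive rebounding margin.

    games: list of (team_a, team_b, margin), where margin is team_a's
    offensive rebounds minus team_b's in that game.
    Assumed model: margin ~ r[team_a] - r[team_b], with the ratings summing to zero.
    """
    idx = {t: k for k, t in enumerate(teams)}
    rows, y = [], []
    for a, b, margin in games:
        row = np.zeros(len(teams))
        row[idx[a]] = 1.0
        row[idx[b]] = -1.0
        rows.append(row)
        y.append(margin)
    rows.append(np.ones(len(teams)))  # only differences are determined; fix the sum
    y.append(0.0)
    r, *_ = np.linalg.lstsq(np.array(rows), np.array(y, dtype=float), rcond=None)
    return dict(zip(teams, r))


# Toy example: A out-rebounds B on the offensive glass by 2 and C by 7;
# B out-rebounds C by 5.
ratings = advantage_ratings([("A", "B", 2), ("A", "C", 7), ("B", "C", 5)],
                            ["A", "B", "C"])
print({t: round(float(v), 2) for t, v in ratings.items()})  # {'A': 3.0, 'B': 1.0, 'C': -4.0}
```

A margin-based rating rewards both halves of the glass. A team can climb by grabbing offensive boards or by keeping opponents off theirs, which is why new names show up in this list.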

Any of these measures is better for understanding offensive rebounding performance than just looking at the raw numbers of offensive rebounds (as commentators are wont to do).

Friday, February 7, 2014

I've been working a lot lately on refactoring the Prediction Machine code for improved performance. Previously, it took about 15 minutes to process a single season of data, and due to memory issues, I couldn't process all of my data in one pass; I had to stop and restart my processing for each season. So I spent a week or so profiling the code, removing memory leaks, speeding up the slowest processing, and so on. The results were pretty remarkable -- about a 25x speedup overall. This makes it much easier to try out new ideas on the entire data set.

Speaking of which, I was watching a college basketball game the other night and the announcer claimed that one of the teams was "a good offensive rebounding team". He said this based upon nothing more than the team grabbing two offensive rebounds in a row, but it made me wonder exactly how one could tell if a team was a good offensive rebounding team, or not.

For any particular game we know the number of offensive rebounds each team grabbed. But that tells us very little. If Duke and UCLA played each other and Duke grabbed 13 offensive rebounds and UCLA grabbed 4, we'd be tempted to say that Duke is the better offensive rebounding team. But are they?

Well, it's obvious that one game could be just a fluke. So we need to look at performance over a number of games. So let's suppose that Duke averages 13 offensive rebounds a game, while UCLA only averages 4. Now can we say that Duke is a better offensive rebounding team? Maybe not.

Suppose we found that Duke is shooting 28% from the field while UCLA is shooting 87% from the field. The difference in offensive rebounds might simply reflect a difference in opportunities.

Let's correct for that by expressing offensive rebounding as a percentage of available opportunities (e.g., offensive rebounds / missed shots). So now suppose that Duke is grabbing 35% of its offensive rebound opportunities while UCLA is grabbing only 27%. Now can we say that Duke is a better offensive rebounding team? Maybe not.
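A minimal sketch of the percentage step, using the offensive rebounds / missed shots definition above. The function names and game numbers are hypothetical.

```python
def oreb_pct(oreb, missed_shots):
    """One game's offensive rebounding percentage: Oreb / missed shots."""
    return oreb / missed_shots


def season_oreb_pct(games):
    """Average the per-game percentages over a list of (oreb, missed_shots)."""
    pcts = [oreb_pct(oreb, missed) for oreb, missed in games]
    return sum(pcts) / len(pcts)


# 13 offensive rebounds on 37 misses is a lower rate than 4 on 10 misses.
print(oreb_pct(13, 37), oreb_pct(4, 10))
print(season_oreb_pct([(7, 20), (9, 30)]))
```

The text says to average over games, so this averages per-game rates. Pooling (total Oreb / total misses) is a reasonable alternative that weights games with more misses more heavily.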

If Duke and UCLA didn't play all the same opponents, then those aren't apples-to-apples numbers. Suppose we found that Duke's opponents had held their opponents to a 45% offensive rebounding rate, while UCLA's opponents had held their opponents to a 15% offensive rebounding rate. Now it appears that Duke is a comparatively weak offensive rebounding team, while UCLA is comparatively strong.

Did you follow all that? Express offensive rebounding as a percentage of the available rebounds, average it over all the games, and then adjust it for opponents. And then you -- maybe -- have a number that you can use to compare teams.
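The post doesn't give the exact adjustment formula, so the following is an assumption. One simple version divides a team's average percentage by the average percentage its opponents have allowed. A value of 1.0 then means "exactly what these opponents usually give up." The function name is hypothetical.

```python
def opponent_adjusted(team_pct, opponents_allowed):
    """Ratio of a team's average Oreb% to the average Oreb% its opponents allow.

    opponents_allowed: for each opponent, the Oreb% that opponent has allowed
    to all of its opponents. Above 1.0 means the team rebounds better than
    its opponents usually let teams rebound.
    """
    baseline = sum(opponents_allowed) / len(opponents_allowed)
    return team_pct / baseline


# The hypothetical Duke/UCLA numbers from the text.
print(round(opponent_adjusted(0.35, [0.45]), 2))  # Duke: 0.78
print(round(opponent_adjusted(0.27, [0.15]), 2))  # UCLA: 1.8
```

One refinement not shown here would drop the games against the team being rated from each opponent's "allowed" rate, so a team isn't partly compared against itself.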

If we run this statistic for the current season, here are the top offensive rebounding teams:

Rank  Team                       Rating  Total
1     North Carolina (15-7)      1.36    295
2     Quinnipiac (13-8)          1.35    357
3     San Diego St. (20-1)       1.31    217
4     UAB (14-8)                 1.28    284
5     Northern Illinois (10-11)  1.26    257
6     Tennessee (14-8)           1.26    312
7     Purdue (14-9)              1.24    289
8     Arizona (22-1)             1.24    277
9     Indiana (14-8)             1.23    268
10    Long Beach St. (9-13)      1.23    212

You might be surprised to see Quinnipiac at #2, but they lead the nation in total number of offensive rebounds (as shown in the last column). (And Tennessee, #2 in total offensive rebounds, has 45 (!) fewer.) What's interesting here is the obvious disparity between the raw number of offensive rebounds and the rankings. San Diego St., with only 217 offensive rebounds, is #3 largely because they've played teams that are tough to rebound against. It's also interesting to note that there are some very good teams in this list.

In my model this statistic doesn't have a lot of predictive value -- but then, I have a variety of other statistics that characterize offensive rebounding performance. One interesting thing about this statistic is that it is about ten times more important to the Away team than to the Home team. This suggests that good offensive rebounding teams might play a little better on the road.

Next time we'll look at another way to measure offensive rebounding performance and see how the two measures compare.

The TopTwenty gets jumbled around in mysterious ways thanks to all the random upsets, as well as an update to the rating algorithm. MSU loses on a neutral court to Georgetown (PM #64) and somehow manages to jump up 8 spots. Hmm. They did have a nice win over Iowa, and their loss to Michigan looks better in retrospect. Meanwhile Kansas beats #11 ISU handily at home, loses badly to #50 Texas, and plunges 9 spots.

The PM is undergoing a significant overhaul at the moment, so there may be rather more uncertainty in the ratings than usual.

Monday

#16 Iowa State @ #19 Oklahoma State: OK State +1

Just the barest advantage to the home team.

Tuesday

Ohio State @ #17 Iowa: Iowa +4

I wouldn't trust tOSU to live up to this number. But given the way the B10 has been going, they might win by double digits.

Wednesday

#13 Saint Louis @ St. Joseph's: St. Louis +5

The Billikens almost lost at home to dreadful GMU, so this will be an important road test.

Oklahoma beats OK State by 12 (!) for their 3rd win over a ranked opponent in the last six games, and then Baylor beats OK State by 6 (!). So apparently the PM's assessment of OK State was *way* off.

#18 Duke @ #20 Pittsburgh: Pittsburgh +4

#18 Duke @ #2 Syracuse: Syracuse +3

Duke gets a resume win (+15) at Pittsburgh, and loses by 2 in OT at Syracuse. (The PM had Duke as a slight favorite in the Syracuse game after the win at Pitt.) Two really solid performances for Duke that should boost them in the standings.

#3 MSU @ #10 Iowa: Iowa +6

Iowa took MSU to OT before losing by 2.

UPSET ALERT

#14 Kentucky @ LSU: Toss-up

This could be a "trap" game for Kentucky's freshmen.

I hate to tell you so, but I told you so. LSU by 5.

#16 Iowa St. @ #8 Kansas: Kansas +6

Kansas +9. But ISU came back on Saturday to get a nice win over Oklahoma.

#15 Cincy @ #11 Louisville: Louisville +15

Cincinnati with the big upset +3. Not a good home loss for Louisville!

#17 tOSU @ #9 Wisconsin: Wisconsin +6

These two teams have been racing each other to be the first to drop off the TopTwenty, but one of them will have to win. Probably the home team.

Or not. tOSU escapes with a 1-point victory. They're both 2-5 over the last 7 games.

BLOWOUT OF THE WEEK

UCF @ #11 Louisville: Louisville by 25

Louisville by 17.

TOSS-UP OF THE WEEK

Houston Baptist @ Lamar

Lamar won by 2 points on a layup with 8 seconds to go... about as close a toss-up as you could ask!