With enough data points, it becomes possible to calculate the expected effect of a given play on goal outcomes. We can argue about how many occurrences you need for a model to converge on an expected value, but fundamentally, if a certain type of play more often results in that team scoring a goal than their opponent, we can say that it has a positive expected value.

In the simplest example, if you assume that Maryland commits a penalty, and Notre Dame has a 60 second man up opportunity, then within the next 1 minute, would you expect Maryland or Notre Dame to score more goals on average? Notre Dame obviously, so we can say that the penalty committed by Maryland has a negative expected value. I’ll go out on a limb and say that this is not a stretch for those familiar with lacrosse…or averages.

In the list below, assume that Maryland is the one initiating each play. The Diff_per column shows the net goals Maryland would be expected to gain over the next 60 seconds: its own expected goals (GoalsFor_per) minus Notre Dame’s (GoalsAgainst_per). (For example, a saved shot here means that Maryland shot on goal and Notre Dame saved it; we would expect Maryland to score .17 goals in the next 60 seconds to .12 for Notre Dame, a net of about .05.)

| PlayType | GoalsFor_per | GoalsAgainst_per | Diff_per |
| --- | --- | --- | --- |
| Missed Shot | 0.287 | 0.072 | 0.215 |
| Shot Clock On | 0.301 | 0.090 | 0.211 |
| Good Clear | 0.234 | 0.055 | 0.179 |
| Ground Ball | 0.255 | 0.083 | 0.172 |
| Pipe Shot | 0.258 | 0.088 | 0.170 |
| Faceoff Win | 0.211 | 0.053 | 0.158 |
| Blocked Shot | 0.228 | 0.098 | 0.129 |
| Timeout | 0.191 | 0.126 | 0.066 |
| Saved Shot | 0.173 | 0.125 | 0.048 |
| Assisted Goal | 0.151 | 0.114 | 0.037 |
| Unassisted Goal | 0.145 | 0.120 | 0.025 |
| Goalie Change | 0.137 | 0.131 | 0.006 |
| Substitution | 0.148 | 0.156 | -0.008 |
| Penalty – 0 sec | 0.029 | 0.143 | -0.114 |
| Unforced Turnover | 0.067 | 0.193 | -0.126 |
| Forced Turnover | 0.065 | 0.200 | -0.135 |
| Penalty – 3 min | 0.107 | 0.250 | -0.143 |
| Shot Clock Violation | 0.040 | 0.184 | -0.144 |
| Failed Clear | 0.062 | 0.252 | -0.190 |
| Penalty – 2 min | 0.106 | 0.394 | -0.288 |
| Penalty – 30 sec | 0.052 | 0.396 | -0.344 |
| Penalty – 1 min | 0.053 | 0.427 | -0.374 |

A few things stand out, some expected (and helpful for confirming the validity of the model) and some not. First, if you commit a penalty, then you should expect to give up more goals than you score over the next 60 seconds. Makes sense. Same goes for turnovers and shot clock violations: if you give up possession, good luck scoring more than you give up.

On the flip side, the two goal entries are interesting. It’s minimal, but both assisted and unassisted goals have a positive expected value, meaning in addition to the goal scored, the model says you should expect .03 to .04 goals in the next 60 seconds. It’s tiny, but this speaks to the value of momentum? Or perhaps it is less causal: just an implication of better faceoff men leading to more goals as well as more runs. Most likely, it’s just the fact that a team scoring a given goal is minutely more likely to be the better team, and the better team should score more goals. (Note to self: worth another post.)

I was surprised to see Missed Shot as the highest net goal getter in this list, although it makes more sense after thinking about it. (Mind you, our model differentiates between blocked shots, pipe shots, saved shots, and missed shots.) If you have a completely missed shot, it’s probably getting backed up, so you still have possession. Contrast this with a pipe shot, which could go anywhere. A missed shot also means you are in attacking position with, presumably, enough offensive action to warrant a shot. Contrast that with a ground ball, which means you have possession, but it may not be in the attacking end.

So what do we do with this information? Well, for one, we don’t start whipping shots over the goal from midfield. This is a purely descriptive analysis; if someone in a wacko-universe adjusted strategy on account of this, it would be like the snake eating its own tail.

For starters, it would be very interesting to see whether there are certain teams for whom these values are starkly different. Are there certain styles for which a missed shot is not in fact a good thing? Are certain teams better able to capitalize quickly on the change in momentum from a turnover? All interesting questions that speak to style, I suspect.

In addition, we posted earlier this month on our win-expectancy model, and these play-specific values are integral to that model. Time, score, and possession create a very effective framework for determining win expectancy. By looking at historical games, you can create a pretty accurate model based on the percentage of historical games in each situation that have gone to either team. In fact, our model borrows this framework almost exactly.
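To make the time-score-possession idea concrete, here is a minimal Python sketch of a cohort lookup. It is an illustration under our own assumptions, not the production model: the `snapshots` input format, the function names, and the 60-second time bucket are all hypothetical.

```python
from collections import defaultdict


def build_win_table(snapshots, bucket_seconds=60):
    """Share of historical situations that ended in a win.

    `snapshots` is a hypothetical iterable of tuples
    (seconds_remaining, score_diff, has_possession, won),
    each taken from one team's perspective in a historical game.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [wins, total]
    for secs, diff, poss, won in snapshots:
        # Group situations by time bucket, score margin, and possession.
        key = (secs // bucket_seconds, diff, poss)
        counts[key][0] += int(won)
        counts[key][1] += 1
    return {key: wins / total for key, (wins, total) in counts.items()}


def win_expectancy(table, secs, diff, poss, bucket_seconds=60):
    """Look up the matching cohort; returns None if no history exists."""
    return table.get((secs // bucket_seconds, diff, poss))
```

A lookup like `win_expectancy(table, 45, -1, True)` would then return the fraction of historical teams that won when trailing by one with the ball and under a minute left, or `None` if that situation never occurred in the data.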

The small tweak, made possible by our play-by-play data, is to adjust those numbers based on the flow of the game. For example, if a team is down by one goal with 1 minute left, I’d be immensely more confident in my team’s chances if they’ve been whipping high-quality shots just over the upper corners for the last 90 seconds than if they’d just recovered the ball in their own end and had to complete a clear and then get into the offense. In other words, the most recent plays have some predictive power for what comes next. We think that this is very useful if you are publishing a real-time win expectancy score.

Think of it this way: time-score-possession allows you to identify which historical cohort of games your current situation is most like. And again, that is an incredibly useful benchmark when dealing with who is more likely to win a given game. But in our estimation, that approach does not take into account the teams and game flow of the specific game you are looking at. Time-score-possession gives a plodding team the same odds of coming back from 4 goals down as a lightning-fast team. But we know intuitively that a team with a quick-strike style of play is more likely to come back from that deficit. (They are also more likely to lose by even more than the current score, but if you are just trying to predict who wins, this is irrelevant.)

In the future, this approach also opens the door to more sophisticated tweaks to the model. Currently, we don’t look at the plays a team has used to get to where they are. For example, a team that just expended a ton of energy to cut a 6-goal deficit to 2 is probably less likely to win than a team that was up 1 and just gave up a quick three-goal run to go down 2. Knowing the plays that a team executed to get to where they are now could really help to quantify that disparity. An analysis for another day though.

Notes:

As of November 2016, our model contains 888 NCAA DI Men’s lacrosse games from the 2015 and 2016 seasons. Play values are calculated by counting the number of times that each play occurs, then counting the number of goals scored within the next 60 seconds, for and against, and then dividing each goal count by the number of times the play occurred. Full data is here:

| PlayType | Occurrences | GoalsFor | GoalsAgainst | Diff | GoalsFor_per | GoalsAgainst_per | Diff_per |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Missed Shot | 19824 | 5693 | 1431 | 4262 | 0.287 | 0.072 | 0.215 |
| Shot Clock On | 1493 | 449 | 134 | 315 | 0.301 | 0.090 | 0.211 |
| Good Clear | 25753 | 6026 | 1418 | 4608 | 0.234 | 0.055 | 0.179 |
| Ground Ball | 33438 | 8536 | 2771 | 5765 | 0.255 | 0.083 | 0.172 |
| Pipe Shot | 1951 | 504 | 172 | 332 | 0.258 | 0.088 | 0.170 |
| Faceoff Win | 20178 | 4259 | 1073 | 3186 | 0.211 | 0.053 | 0.158 |
| Blocked Shot | 3225 | 734 | 317 | 417 | 0.228 | 0.098 | 0.129 |
| Timeout | 4569 | 874 | 574 | 300 | 0.191 | 0.126 | 0.066 |
| Saved Shot | 18520 | 3199 | 2308 | 891 | 0.173 | 0.125 | 0.048 |
| Assisted Goal | 9819 | 1479 | 1115 | 364 | 0.151 | 0.114 | 0.037 |
| Unassisted Goal | 7565 | 1097 | 908 | 189 | 0.145 | 0.120 | 0.025 |
| Goalie Change | 2473 | 338 | 323 | 15 | 0.137 | 0.131 | 0.006 |
| Substitution | 256 | 38 | 40 | -2 | 0.148 | 0.156 | -0.008 |
| Penalty – 0 sec | 35 | 1 | 5 | -4 | 0.029 | 0.143 | -0.114 |
| Unforced Turnover | 7730 | 515 | 1489 | -974 | 0.067 | 0.193 | -0.126 |
| Forced Turnover | 7494 | 485 | 1499 | -1014 | 0.065 | 0.200 | -0.135 |
| Penalty – 3 min | 28 | 3 | 7 | -4 | 0.107 | 0.250 | -0.143 |
| Shot Clock Violation | 174 | 7 | 32 | -25 | 0.040 | 0.184 | -0.144 |
| Failed Clear | 4444 | 275 | 1118 | -843 | 0.062 | 0.252 | -0.190 |
| Penalty – 2 min | 66 | 7 | 26 | -19 | 0.106 | 0.394 | -0.288 |
| Penalty – 30 sec | 2443 | 127 | 967 | -840 | 0.052 | 0.396 | -0.344 |
| Penalty – 1 min | 3151 | 167 | 1347 | -1180 | 0.053 | 0.427 | -0.374 |
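For readers who want to reproduce this kind of table from their own play-by-play data, the counting procedure described in these notes can be sketched roughly as follows. This is a minimal sketch, not our actual pipeline: the event format (dicts with `game`, `t`, `team`, and `play` keys, sorted by time within each game) is a hypothetical one, goals are assumed to be the play types ending in "Goal", and period boundaries are ignored.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # look-ahead window described in the notes


def play_values(events):
    """Per-play goals for/against in the 60 seconds after each play.

    `events` is a hypothetical list of dicts, sorted by time within each
    game: {"game": id, "t": seconds elapsed, "team": name, "play": type}.
    """
    by_game = defaultdict(list)
    for e in events:
        by_game[e["game"]].append(e)

    totals = defaultdict(lambda: {"n": 0, "gf": 0, "ga": 0})
    for plays in by_game.values():
        for i, p in enumerate(plays):
            row = totals[p["play"]]
            row["n"] += 1
            # Scan forward until we leave the 60-second window.
            for q in plays[i + 1:]:
                if q["t"] - p["t"] > WINDOW_SECONDS:
                    break
                if q["play"].endswith("Goal"):
                    if q["team"] == p["team"]:
                        row["gf"] += 1  # goal by the team that made the play
                    else:
                        row["ga"] += 1  # goal by the opponent

    return {
        play: {
            "Occurrences": r["n"],
            "GoalsFor_per": r["gf"] / r["n"],
            "GoalsAgainst_per": r["ga"] / r["n"],
            "Diff_per": (r["gf"] - r["ga"]) / r["n"],
        }
        for play, r in totals.items()
    }
```

The division at the end is the same one behind the table: for Missed Shot, 5693 goals for over 19824 occurrences gives the 0.287 shown in the GoalsFor_per column.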

For the sake of double checking, we ran the data on the 408 games from 2015 against the 480 games from 2016 to make sure that the values were consistent. The R² value was .99, so yes, these stand up across years.
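A year-over-year consistency check of this kind could be sketched as below. We are assuming here that the comparison is a squared Pearson correlation between the per-play values from each season; the function name is ours, and the per-season values themselves are not listed in this post, so none are filled in.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return (sxy * sxy) / (sxx * syy)
```

You would pass in two lists in the same play-type order, for example the Diff_per values computed from 2015 games only and from 2016 games only.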

3 Comments

CU77, March 23, 2017 @ 8:55 pm

“but both assisted and unassisted goals have a positive expected value, meaning in addition to the goal scored, the model says you should expect .03 to .04 goals in the next 60 seconds. It’s tiny, but this speaks to the value of momentum?”

Or it’s a statistical fluctuation? What’s the estimated error on this number?

What kind of error metric did you have in mind? In terms of the raw data (table also included at the bottom of the post), there were 9,819 assisted goals and 7,565 unassisted goals in the data set, so the sample size is substantial.

Let’s take the 9819 assisted goals. If I’m understanding your table correctly, the team that scored went on to score another goal in the next minute 1479 times, and the team that was scored on went on to score a goal in the next minute 1115 times. Assuming goal-scoring is a Poisson process, the standard-deviation error on each number is the square-root of each: 38.46 for 1479, and 33.39 for 1115, and these errors are independent. The difference 1479-1115=364 then has an error which is the square root of the sum of the squares of the errors on each number, which is then the square root of sum 1479+1115=2594; this is 50.93. The difference 364 is 7.15 times the standard deviation 50.93, and is therefore very unlikely to be a statistical fluctuation.

This is the sort of error analysis I wish you would do. Merely saying “the sample size is substantial” is not enough to draw any actual conclusions, you have to work through the numbers.
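For anyone who wants to rerun the error calculation from the comment above on other rows of the table, here is a minimal Python sketch. It relies on the same assumption the commenter states, namely that the two goal counts are independent Poisson counts, so each count's variance equals the count itself; the function name is ours.

```python
import math


def poisson_diff_significance(goals_for, goals_against):
    """Return (difference, standard error, z-score) for two Poisson counts."""
    diff = goals_for - goals_against
    # Variance of a Poisson count equals the count; independent variances add.
    std_err = math.sqrt(goals_for + goals_against)
    return diff, std_err, diff / std_err


# Assisted Goal row: 1479 goals for, 1115 goals against.
diff, err, z = poisson_diff_significance(1479, 1115)
print(f"diff={diff}, std error={err:.2f}, z={z:.2f}")
# prints: diff=364, std error=50.93, z=7.15
```

Applied to the Unassisted Goal row (1097 for, 908 against), the same function gives a difference of 189 at roughly 4.2 standard deviations.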