Monday, January 25, 2016

Big NYCM Update and Cutoff Prediction

I finally got the New York City Marathon data downloaded and parsed. Finally. Granted, part of the reason it took me so long was just day-to-day life and work commitments, but the downloading part was a bit tricky. I didn't want to get myself flagged for issuing too many requests in rapid succession, so I put in a lot of "sleep" time between HTTP requests. It didn't help that the endpoint only returned results in chunks of 100, so I had to download more than 1,000 files to get both years' data.
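For anyone curious about the "sleep" approach, here is a minimal Python sketch of paginated, throttled downloading. It is an illustration, not the script used for this post: `fetch_page(offset, limit)` is a hypothetical stand-in for the real HTTP request, and the 2-second delay is an assumed value. Only the chunk size of 100 comes from the endpoint described above.

```python
import math
import time
from typing import Callable, Dict, List

PAGE_SIZE = 100  # the results endpoint returned at most 100 finishers per request


def pages_needed(total_finishers: int) -> int:
    """Number of requests needed to pull every finisher in PAGE_SIZE chunks."""
    return math.ceil(total_finishers / PAGE_SIZE)


def download_all(
    fetch_page: Callable[[int, int], List[Dict]],
    total_finishers: int,
    delay_seconds: float = 2.0,  # assumed delay; tune to what the site tolerates
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Fetch every finisher record, pausing between consecutive requests.

    fetch_page(offset, limit) is a hypothetical callable that performs one
    HTTP request and returns a list of finisher records (possibly empty).
    """
    records: List[Dict] = []
    for page in range(pages_needed(total_finishers)):
        if page > 0:
            sleep(delay_seconds)  # space out requests to avoid being flagged
        chunk = fetch_page(page * PAGE_SIZE, PAGE_SIZE)
        if not chunk:
            break  # the endpoint ran out of results early
        records.extend(chunk)
    return records


if __name__ == "__main__":
    # Finisher counts from the tables in this post: 51,585 (2014) and 50,479 (2015).
    print(pages_needed(51585) + pages_needed(50479))  # 1021
```

With 100-record chunks, 51,585 finishers in 2014 and 50,479 in 2015 work out to 516 + 505 = 1,021 requests, which lines up with the "more than 1,000 files" above. Passing `sleep` in as a parameter also makes it easy to test the loop without actually waiting.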

This race is on my bucket list (I went to college in The City, so it has a special place in my heart). I see they have made the qualifying standards less ridiculously hard, but I still feel I have a rat's chance of making them. So I need to remember to enter the lottery every year and hope I get in. At some point, after however many years of lottery attempts, I will probably throw in the towel and fundraise. By then, my toddler will be at least in elementary school, which will hopefully leave me a little more free time to do something challenging, like raise a ton of money for charity.

Since it's been a bazillion years since I posted something, I decided to make this a combo post: race results analysis plus cutoff prediction.

And, folks? The data doesn't look pretty. I sort of did a double take when I ran my queries and thought to myself: "Well, this can't be right..."

But it is.

| Age Group | 2014 Qualifiers | 2014 AG Total | 2014 Percentage | 2015 Qualifiers | 2015 AG Total | 2015 Percentage |
|---|---|---|---|---|---|---|
| F18-34 | 436 | 7775 | 5.61% | 405 | 8049 | 5.03% |
| F35-39 | 210 | 3504 | 5.99% | 201 | 3366 | 5.97% |
| F40-44 | 269 | 3661 | 7.35% | 257 | 3499 | 7.34% |
| F45-49 | 298 | 2748 | 10.84% | 306 | 2730 | 11.21% |
| F50-54 | 221 | 1905 | 11.60% | 223 | 1974 | 11.30% |
| F55-59 | 130 | 931 | 13.96% | 124 | 929 | 13.35% |
| F60-64 | 45 | 414 | 10.87% | 61 | 417 | 14.63% |
| F65-69 | 12 | 122 | 9.84% | 11 | 125 | 8.80% |
| F70-74 | 2 | 43 | 4.65% | 5 | 45 | 11.11% |
| F75-79 | 1 | 9 | 11.11% | 0 | 8 | 0.00% |
| F80+ | 1 | 3 | 33.33% | 1 | 5 | 20.00% |
| M18-34 | 395 | 7284 | 5.42% | 452 | 7265 | 6.22% |
| M35-39 | 255 | 4982 | 5.12% | 271 | 4657 | 5.82% |
| M40-44 | 337 | 5992 | 5.62% | 357 | 5291 | 6.75% |
| M45-49 | 401 | 4700 | 8.53% | 390 | 4540 | 8.59% |
| M50-54 | 330 | 3858 | 8.55% | 319 | 3752 | 8.50% |
| M55-59 | 176 | 1949 | 9.03% | 196 | 2126 | 9.22% |
| M60-64 | 120 | 1071 | 11.20% | 154 | 1060 | 14.53% |
| M65-69 | 54 | 430 | 12.56% | 52 | 430 | 12.09% |
| M70-74 | 15 | 156 | 9.62% | 16 | 148 | 10.81% |
| M75-79 | 3 | 38 | 7.89% | 3 | 53 | 5.66% |
| M80+ | 0 | 10 | 0.00% | 0 | 10 | 0.00% |
| Totals | 3711 | 51585 | 7.19% | 3804 | 50479 | 7.54% |
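Each percentage in the table is simply qualifiers divided by the age group's finishers. The short Python sketch below recomputes the rates and the Totals row from the table's own counts; the dictionary layout and the function names are illustrative, not part of any existing tool.

```python
from typing import Dict, Tuple

# (2014 qualifiers, 2014 finishers, 2015 qualifiers, 2015 finishers), from the NYCM table above
Row = Tuple[int, int, int, int]
NYCM: Dict[str, Row] = {
    "F18-34": (436, 7775, 405, 8049),
    "F35-39": (210, 3504, 201, 3366),
    "F40-44": (269, 3661, 257, 3499),
    "F45-49": (298, 2748, 306, 2730),
    "F50-54": (221, 1905, 223, 1974),
    "F55-59": (130, 931, 124, 929),
    "F60-64": (45, 414, 61, 417),
    "F65-69": (12, 122, 11, 125),
    "F70-74": (2, 43, 5, 45),
    "F75-79": (1, 9, 0, 8),
    "F80+": (1, 3, 1, 5),
    "M18-34": (395, 7284, 452, 7265),
    "M35-39": (255, 4982, 271, 4657),
    "M40-44": (337, 5992, 357, 5291),
    "M45-49": (401, 4700, 390, 4540),
    "M50-54": (330, 3858, 319, 3752),
    "M55-59": (176, 1949, 196, 2126),
    "M60-64": (120, 1071, 154, 1060),
    "M65-69": (54, 430, 52, 430),
    "M70-74": (15, 156, 16, 148),
    "M75-79": (3, 38, 3, 53),
    "M80+": (0, 10, 0, 10),
}


def qualifier_rate(qualifiers: int, finishers: int) -> float:
    """Percentage of finishers who qualified (0.0 for an empty age group)."""
    return 100.0 * qualifiers / finishers if finishers else 0.0


def column_totals(table: Dict[str, Row]) -> Tuple[int, ...]:
    """Sum each of the four columns across all age groups."""
    return tuple(sum(column) for column in zip(*table.values()))


if __name__ == "__main__":
    q14, n14, q15, n15 = column_totals(NYCM)
    print(f"2014: {q14}/{n14} = {qualifier_rate(q14, n14):.2f}%")  # 2014: 3711/51585 = 7.19%
    print(f"2015: {q15}/{n15} = {qualifier_rate(q15, n15):.2f}%")  # 2015: 3804/50479 = 7.54%
```

Note that the Totals percentage is computed from the summed counts, not by averaging the age-group percentages; the small age groups (F80+, M80+) would otherwise distort it badly.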

As you can see, the qualifiers are up: with fewer finishers in 2015, we have more qualifiers (3,804 vs. 3,711). I guess the crazy wind in 2014 must have been worse than the warmer conditions of 2015.

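| Margin | 2014 Qualifiers | 2014 Percentage | 2015 Qualifiers | 2015 Percentage |
|---|---|---|---|---|
| <1 minute | 229 | 6.17% | 212 | 5.57% |
| 1-2 minutes | 195 | 5.25% | 191 | 5.02% |
| 2-3 minutes | 194 | 5.23% | 173 | 4.55% |
| 3-4 minutes | 188 | 5.07% | 159 | 4.18% |
| 4-5 minutes | 168 | 4.53% | 145 | 3.81% |
| 5-10 minutes | 749 | 20.18% | 779 | 20.48% |
| 10-20 minutes | 1025 | 27.62% | 1126 | 29.60% |
| >20 minutes | 963 | 25.95% | 1019 | 26.79% |
| Totals | 3711 | | 3804 | |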
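The margin rows are buckets of each qualifier's margin under their standard. Here is a minimal Python sketch of that bucketing, assuming margins are stored as whole seconds and that each bin includes its lower edge but not its upper edge (the post doesn't say how exact boundaries such as 60 seconds were handled). `margin_bin` and `bin_table` are hypothetical names.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable

# Upper edges (in seconds) of each margin bin. Assumption: bins are half-open,
# so a margin of exactly 60 s counts as "1-2 minutes", not "<1 minute".
EDGES = [60, 120, 180, 240, 300, 600, 1200]
LABELS = ["<1 minute", "1-2 minutes", "2-3 minutes", "3-4 minutes",
          "4-5 minutes", "5-10 minutes", "10-20 minutes", ">20 minutes"]


def margin_bin(margin_seconds: int) -> str:
    """Map a qualifier's margin under their standard to a table row label."""
    if margin_seconds < 0:
        raise ValueError("negative margin: this finisher did not qualify")
    # bisect_right counts how many edges are <= the margin, which is the bin index
    return LABELS[bisect_right(EDGES, margin_seconds)]


def bin_table(margins: Iterable[int]) -> Dict[str, float]:
    """Percentage of qualifiers falling in each margin bin, in table order."""
    counts = Counter(margin_bin(m) for m in margins)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}
```

Using `bisect_right` over a sorted list of edges keeps the bin boundaries in one place, so changing the half-open assumption (or adding a bin) only means editing `EDGES` and `LABELS`.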

The Squeaker Pack (qualifiers with a margin under 5 minutes) looks a lot different too.

2014: 26.25%
2015: 23.13%

In 2015, qualifiers skewed more toward margins of 5+ minutes.
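The Squeaker Pack figure is the combined share of the five bins under 5 minutes: 974 of 3,711 NYCM qualifiers in 2014 and 880 of 3,804 in 2015. A quick Python check using the counts from the margin table (the function and variable names are illustrative):

```python
from typing import Dict

# Margin bins under 5 minutes; together they make up the Squeaker Pack.
SQUEAKER_BINS = ("<1 minute", "1-2 minutes", "2-3 minutes", "3-4 minutes", "4-5 minutes")

# NYCM qualifier counts per margin bin, copied from the margin table above.
NYCM_MARGINS_2014 = {"<1 minute": 229, "1-2 minutes": 195, "2-3 minutes": 194,
                     "3-4 minutes": 188, "4-5 minutes": 168, "5-10 minutes": 749,
                     "10-20 minutes": 1025, ">20 minutes": 963}
NYCM_MARGINS_2015 = {"<1 minute": 212, "1-2 minutes": 191, "2-3 minutes": 173,
                     "3-4 minutes": 159, "4-5 minutes": 145, "5-10 minutes": 779,
                     "10-20 minutes": 1126, ">20 minutes": 1019}


def squeaker_share(counts: Dict[str, int]) -> float:
    """Percentage of all qualifiers whose margin falls in a sub-5-minute bin."""
    total = sum(counts.values())
    squeakers = sum(counts.get(label, 0) for label in SQUEAKER_BINS)
    return 100.0 * squeakers / total


if __name__ == "__main__":
    print(f"2014: {squeaker_share(NYCM_MARGINS_2014):.2f}%")  # 2014: 26.25%
    print(f"2015: {squeaker_share(NYCM_MARGINS_2015):.2f}%")  # 2015: 23.13%
```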

Because this is a table/data-heavy post as it is, I'm going to skip the comprehensive breakdown of age group percentages. If you really want this data, let me know; I can email it to you.

The running totals across all the races analyzed so far:

| Age Group | 2014 Qualifiers | 2014 AG Total | 2014 % Qualifiers | 2015 Qualifiers | 2015 AG Total | 2015 % Qualifiers |
|---|---|---|---|---|---|---|
| F18-34 | 2025 | 27420 | 7.39% | 1874 | 26440 | 7.09% |
| F35-39 | 909 | 10617 | 8.56% | 883 | 10582 | 8.34% |
| F40-44 | 908 | 10082 | 9.01% | 880 | 9813 | 8.97% |
| F45-49 | 918 | 6990 | 13.13% | 877 | 7251 | 12.09% |
| F50-54 | 546 | 4598 | 11.87% | 597 | 4874 | 12.25% |
| F55-59 | 314 | 2307 | 13.61% | 282 | 2287 | 12.33% |
| F60-64 | 131 | 956 | 13.70% | 141 | 1009 | 13.97% |
| F65-69 | 40 | 307 | 13.03% | 44 | 331 | 13.29% |
| F70-74 | 8 | 94 | 8.51% | 8 | 101 | 7.92% |
| F75-79 | 1 | 22 | 4.55% | 1 | 17 | 5.88% |
| F80+ | 1 | 3 | 33.33% | 1 | 6 | 16.67% |
| M18-34 | 1779 | 24926 | 7.14% | 1654 | 23482 | 7.04% |
| M35-39 | 897 | 13314 | 6.74% | 912 | 12726 | 7.17% |
| M40-44 | 1053 | 14336 | 7.35% | 1017 | 13427 | 7.57% |
| M45-49 | 1225 | 11322 | 10.82% | 1157 | 11272 | 10.26% |
| M50-54 | 991 | 8876 | 11.16% | 901 | 8846 | 10.19% |
| M55-59 | 590 | 5113 | 11.54% | 590 | 5296 | 11.14% |
| M60-64 | 373 | 2660 | 14.02% | 398 | 2684 | 14.83% |
| M65-69 | 168 | 1088 | 15.44% | 152 | 1127 | 13.49% |
| M70-74 | 50 | 380 | 13.16% | 45 | 385 | 11.69% |
| M75-79 | 8 | 97 | 8.25% | 11 | 108 | 10.19% |
| M80+ | 0 | 19 | 0.00% | 2 | 27 | 7.41% |
| Totals | 12935 | 145527 | 8.89% | 12427 | 142091 | 8.75% |

We still have fewer qualifiers and a lower qualification rate, though the rate is only about 1.6% lower year over year (8.89% vs. 8.75%).
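The 1.6% figure matches the relative change in the qualification rate, which is easy to confuse with a change in percentage points or in the raw qualifier count. A small Python sketch separating the three, using the totals from the table above (`pct_change` is an illustrative helper):

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return 100.0 * (new - old) / old


# Totals across all races analyzed so far, from the table above.
QUALIFIERS_2014, FINISHERS_2014 = 12935, 145527
QUALIFIERS_2015, FINISHERS_2015 = 12427, 142091

rate_2014 = 100.0 * QUALIFIERS_2014 / FINISHERS_2014  # about 8.89%
rate_2015 = 100.0 * QUALIFIERS_2015 / FINISHERS_2015  # about 8.75%

if __name__ == "__main__":
    print(f"qualifier count: {pct_change(QUALIFIERS_2014, QUALIFIERS_2015):+.1f}%")  # -3.9%
    print(f"qualification rate (relative): {pct_change(rate_2014, rate_2015):+.1f}%")  # -1.6%
    print(f"qualification rate (points): {rate_2015 - rate_2014:+.2f}")  # -0.14
```

So the rate dropped about 0.14 percentage points (roughly 1.6% in relative terms), while the number of qualifiers fell about 3.9%, partly because there were also fewer finishers.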

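| Margin | 2014 Qualifiers | 2014 Percentage | 2015 Qualifiers | 2015 Percentage |
|---|---|---|---|---|
| <1 minute | 810 | 6.26% | 705 | 5.67% |
| 1-2 minutes | 767 | 5.93% | 715 | 5.75% |
| 2-3 minutes | 718 | 5.55% | 628 | 5.05% |
| 3-4 minutes | 654 | 5.06% | 618 | 4.97% |
| 4-5 minutes | 613 | 4.74% | 508 | 4.09% |
| 5-10 minutes | 2677 | 20.70% | 2584 | 20.79% |
| 10-20 minutes | 3456 | 26.72% | 3452 | 27.78% |
| >20 minutes | 3240 | 25.05% | 3217 | 25.89% |
| Totals | 12935 | | 12427 | |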

Squeaker pack:

2014: 27.54%
2015: 25.54%

And now, the information you probably care about most... what the calculation predicts.

Across the 10 races we've analyzed so far, the number of 2014 finishers achieving 147 seconds of margin was 11,014.

Taking the same 10 races in 2015 and sorting by margin descending, the 11,014th finisher has a margin of...

119 seconds or 1:59
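The prediction method described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: margins are whole seconds, "achieving 147 seconds of margin" is read as a margin of at least 147 seconds, and the inputs are toy lists rather than the actual finisher data from the 10 races. `predict_cutoff` and `as_min_sec` are hypothetical helper names.

```python
from typing import Sequence


def predict_cutoff(margins_prev: Sequence[int], margins_curr: Sequence[int],
                   min_margin_prev: int) -> int:
    """Predict a cutoff (in seconds) by matching the previous year's count.

    1. N = number of previous-year qualifiers whose margin reached min_margin_prev
       (assumption: "reached" means >=).
    2. Sort current-year margins from largest to smallest.
    3. The margin of the N-th current-year qualifier is the predicted cutoff.
    """
    n = sum(1 for m in margins_prev if m >= min_margin_prev)
    ranked = sorted(margins_curr, reverse=True)
    if not 0 < n <= len(ranked):
        raise ValueError("cannot match the previous year's count with current-year data")
    return ranked[n - 1]


def as_min_sec(seconds: int) -> str:
    """Format a margin in seconds as m:ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


if __name__ == "__main__":
    # Toy margins in seconds, NOT real race data.
    prev = [400, 300, 200, 150, 100, 50]
    curr = [500, 350, 250, 119, 90, 30]
    cutoff = predict_cutoff(prev, curr, min_margin_prev=147)
    print(cutoff, as_min_sec(cutoff))  # 119 1:59
```

Ties at the N-th position aren't treated specially here; with real data, several qualifiers can share the same margin, so the N-th value is just one reasonable reading of "the 11,014th finisher."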

All I can say is "wow." We're still not at 2:28, but, damn, this is a lot closer than I expected given the reports of warmer conditions at NYCM this past year. I guess strong winds are worse than upper 60s/low 70s.

Hopefully the next races I have on tap don't give me the same download-and-parse pain. Worst case, I can go back to this one aggregator, but I'd rather not if I don't have to. I don't want to get my IP blocked and be left with zero good options.