Soon I’m going to try adding some normalization between years for my Elo ratings. Presently, I find start of season Elos by taking 70% of a team’s end of season Elo from the previous season plus 30% of its end of season Elo from two seasons ago. I then revert this sum toward 1550 by 20%. My concern with this method is that I don’t think it’s fair to directly sum Elos from different seasons, since the Elo distributions vary so greatly year to year based on the game. If we had the same game every year, this wouldn’t be a problem.
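
In code, that carryover looks roughly like this (function and variable names are just illustrative):

```python
def start_of_season_elo(prev_end, end_two_ago, mean_elo=1550, reversion=0.20):
    """Blend the last two end-of-season Elos, then revert toward the mean."""
    carryover = 0.7 * prev_end + 0.3 * end_two_ago
    return (1 - reversion) * carryover + reversion * mean_elo
```

For example, a team that ended 2018 at 1700 and 2017 at 1650 would start 2019 at 0.8 * 1685 + 0.2 * 1550 = 1658.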

To start, I measured the average, stdev, skew, and kurtosis for the end of season Elo distributions in each year. The results are shown in this table:

The average hovers right around 1500 each year, but this is due to how I designed my Elo ratings, and doesn’t actually tell us much. Some actual measure of “true skill” would probably have higher averages in recent years, since most would agree the average robot in 2018 is much better than the average robot was in 2002.
Stdevs move around each year, likely due to the game structure. 2018 had the highest stdev on record by a pretty solid margin. I have previously speculated that this could be due to the snowballing scoring structure of Power Up.
The skewness is interesting. For those of you unfamiliar with skewness, a positive skew indicates a larger positive “tail” on the distribution than negative “tail”. Every year on record has had a positive skew, which indicates that there are always more “outlier” good teams than “outlier” bad teams. Some years have had much higher skews than others though. For example, 2015 had an incredibly positive skew, which means there were a large number of very dominant teams. 2017, in contrast, had one of the smallest skews on record. This is probably due to the severely limited scoring opportunities for strong teams after the climb and 3 rotors, as well as the fact that teams that lacked climbing ability were a severe hindrance to their alliances. The difference in skews between 2015 and 2017 can be seen in histograms of their Elo distributions. Notice how much longer the 2015 positive tail is than the 2017 one.

I also threw in kurtosis, which is a rough measure of how “outlier-y” or “taily” a distribution is. Kurtosis tracks very closely with skew every year. This means that the “outlier” teams driving the high kurtosis in some years are “good team” outliers and not “bad team” outliers. A high kurtosis with a low skew would instead indicate lots of both good team and bad team outliers. Plots of stdev vs skew and skew vs kurtosis can be seen below.
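
If you want to compute these moments yourself, here’s a minimal sketch (note that scipy reports excess kurtosis by default, which may not match other tools’ conventions):

```python
import statistics
from scipy.stats import skew, kurtosis

def elo_distribution_stats(elos):
    """Return the four moments tabulated above for one season's end-of-season Elos."""
    return {
        "average": statistics.mean(elos),
        "stdev": statistics.stdev(elos),
        "skew": skew(elos),
        "kurtosis": kurtosis(elos),  # scipy's default is excess (Fisher) kurtosis
    }
```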

Next, I’ll be trying to normalize end of season Elos so that I can get better start of season Elos. We’ve now had two years in a row of games with low skew/kurtosis, which means that without adjustment the 2019 start of season Elos will also have low skew/kurtosis, even though the 2019 game likely will not. It’ll all come down to predictive power though: if I can get enough of a predictive power increase I’ll add it in, otherwise I won’t.

I’m currently working on analyzing the awesome timeseries data from TBA. I’ll have plenty more to come, but I’m at a point where I got some really sweet graphs, so I thought I’d share, and describe a rough outline of my live model at the same time.

I am currently analyzing the ~1500 matches that have the best timeseries data. It’s possible that I’ll go back later and clean up the messier data, but I wanted to focus my early analysis on data I could have high trust in. What I’m currently working on is a way to predict the match winner in real-time based on this data. Here is a Brier score graph of my current model:

The first five seconds of the match just use my pre-match Elo win probability, but from then on, I begin incorporating the real-time scoring (in conjunction with the pre-match Elo prediction) to create win probabilities. The Brier score is basically steady for the first 5 seconds (when I’m not incorporating match data) but also from ~19 to ~23 seconds, which is probably because teams have by this point scored their first set of cubes and are picking up the second set. Also, note that even at t = 150 seconds, the Brier score is not zero because the actual final score can differ from the last score shown on the screen.
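
For anyone unfamiliar with Brier scores, at each second I’m just scoring the win probabilities against the actual results; a minimal sketch (hypothetical names):

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error of predicted red win probabilities vs. actual results.

    predicted_probs: predicted red win probability for each match at a given second
    outcomes:        1 if red actually won that match, 0 otherwise
    """
    return sum((p - o) ** 2 for p, o in zip(predicted_probs, outcomes)) / len(outcomes)
```

Lower is better; always guessing 50/50 would score 0.25.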

I mentioned that I incorporate Elo ratings into the predictions, here is a graph showing how much weight I give to Elo versus live match data at each second:

This graph and the following ones were created by tuning my prediction model, so the values you see are the most predictive ones I found. After the first 5 seconds, the importance of Elo drops sharply down to ~65% by the end of auto, then holds roughly steady over the same 19-23 second interval described above. This makes sense: if there isn’t much scoring, we wouldn’t expect the live scoring to increase in importance much. After that, the Elo weight decays roughly exponentially down to 0.

The general form of my model (excluding Elo) is: red win probability = 1 − 1/(1 + 10^((current red winning margin)/scale)), where “scale” is how much of a lead red would need to have a 10/11 ≈ 91% chance of winning the match at that point in the match. Let’s call this “scale” the “big lead” amount so as not to confuse it with the scale on the field. If a team is up by 40 at a point in the match where the “big lead” value is 40, that team has a 91% chance of winning, but if they are up by 80 (two big leads), that team has a 99% chance of winning. Obviously, what is considered a big lead will vary over the match, so here is a graph showing that change over time:
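
Written as code, the live portion looks like this, along with the Elo weighting from the previous graph (the simple linear blend is just illustrative of how the Elo weight gets applied; my exact combination may differ a bit):

```python
def live_red_win_probability(red_margin, big_lead):
    """Win probability from the current (adjusted) red winning margin.

    Being up by exactly one "big lead" gives 10/11 ~= 91%; two big leads ~= 99%.
    """
    return 1 - 1 / (1 + 10 ** (red_margin / big_lead))


def blended_red_win_probability(elo_prob, live_prob, elo_weight):
    """Combine the pre-match Elo prediction with the live-scoring prediction."""
    return elo_weight * elo_prob + (1 - elo_weight) * live_prob
```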

I excluded the first 5 seconds since the values there are indeterminate (I don’t incorporate match data then). The next few seconds of auto are also a bit weird, probably because not much happens at this point in most matches, and even in matches where things do happen, a “big lead” of 30+ points is not very intuitive since there is no way a team could have that much of a lead this early (excluding penalties). By the end of auto though, we see the big lead value settle at around 20, which sounds about right: teams who are up by 20 after auto are probably feeling pretty good, since they probably have control of both the switch and the scale. After auto, what is considered a big lead increases steadily until peaking at around 60 points roughly 60 seconds in. This seems to make sense, because a team up by 20 after auto should be up by 60 points 40 seconds later if they control the scale the whole time and nothing else changes. After this, the “big lead” holds steady until 110 seconds, when it sharply drops before recovering at ~122 seconds. I don’t know the explanation for this, but my gut tells me it has to do with climbing positioning. After that, the “big lead” drops until ending at 29 points. This means that if you are up by 30 points on the screen at the end of the match, you are about 90% likely to win in the final score, and if you are up by 60 points, you are about 99% likely to win.

I mentioned that the form of my model uses the red winning margin, but that’s not precisely true. In fact, I use an adjusted red winning margin that accounts for ownership of the scale and of the switch. Basically, I found how much “value” to give to switch and scale ownership at each point in the match. What I mean by “value” is this: if red is down by X points but controls the scale, what is the value of X such that red and blue have an equal chance of winning the match? Here is a graph of value versus time for the scale:
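
Roughly, the adjustment can be sketched like this, encoding ownership as +1 for red, -1 for blue, and 0 for neither (a simplification, particularly for the switches):

```python
def adjusted_red_margin(raw_margin, scale_owner, switch_owner,
                        scale_value, switch_value):
    """Fold the current point value of scale/switch ownership into the raw margin.

    scale_owner, switch_owner: +1 if red owns it, -1 if blue does, 0 otherwise
    scale_value, switch_value: tuned ownership values at this second of the match
    """
    return raw_margin + scale_owner * scale_value + switch_owner * switch_value
```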

Again, skipping the first 10 seconds of auto, we see scale ownership to be worth ~30 points after auto. It then drops in value to a min of 25 points at 23 seconds. This drop might be due to the initial scuffle for the scale. By 45 seconds, the scale has peaked in value at 49 points, and then it has a jittery drop until the end of the match. The same dip seen in “big lead” also appears in scale value at around 120 seconds. Interestingly, scale value does not go to 0 at the end of the match, but rather ends at 8 points. Perhaps scale ownership provides some indication of climb success?

Here is a similar graph for switch value:

Most of the same trends as in the previous graph also appear in this one. The biggest difference though is that switch value actually does go to 0 by the end of the match.

Let me know if you have any questions. I’ll have more to come soon, including win probability graphs and match “excitement” and “comeback/upset” scores.

I’ve just uploaded a sheet called “live_win_probabilities” which has win probabilities at every second of the match for each match in my data set. Only about 10% of matches are in this data set; I still might go back and clean up the messier ones if people are interested, but idk. I also calculated a few other metrics which could be used to determine how “good” a match is. Here’s a summary of them:
team quality: this is just the pre-match average Elo of the competing teams
stakes: This is just a simple combination of the match type and the match number: qual matches have a value of 0, the first matches in the quarterfinals have a value of 1, the second matches in the quarterfinals have a value of 2, and so on up to the highest stakes of finals match 3, which has a value of 9. This is obviously just a rough measurement.
upset/comeback score: This is one minus the winning alliance’s win probability at their weakest point in the match. A value of 98% indicates the winning team had, at their worst point in the match, a 2% chance of winning.
excitement score: This looks at how much the win probability changes for an alliance over the course of a match. The units are a bit hard to grasp, but essentially a value of 1 indicates that, during the match, there was a full swing from a red win to a blue win or vice versa. So a value of 5 indicates 5 full swings of the expected match outcome. (A rough code sketch of these last two metrics follows this list.)
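
Here’s a rough sketch of how those last two metrics fall out of one match’s win probability series, using the eventual winner’s win probability at each second:

```python
def comeback_and_excitement(winner_probs):
    """winner_probs: the winning alliance's win probability at each second of the match.

    Returns (upset/comeback score, excitement score)."""
    comeback = 1 - min(winner_probs)  # e.g. a low point of 2% gives a score of 98%
    excitement = sum(abs(b - a) for a, b in zip(winner_probs, winner_probs[1:]))
    return comeback, excitement
```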

I also added a “lookup” sheet to the workbook which can be used to generate win probability graphs. For example, here’s the win probability graph for the last comeback I mentioned, Minnesota North Star Regional qm 1 41:

I had a lot of fun looking at this data. Live win probabilities are something I’ve been dreaming about for FRC ever since I saw them for other sports. I hope to be able to analyze the 2019 data before the season ends, but no promises.

For anyone that already wants to start thinking about 2019, I’ve uploaded a book which shows the 2019 start of season Elo ratings for all teams. I’m not currently planning to change my Elo model again before next season (although note that I’m bad at evaluating myself). Results are copied here:

After a break, I’m back at it again looking at schedule strengths. I’ll have a more thorough post soon, but I thought I would just share this quickly. Here is a summary of the best and worst schedules of 2018. I decided just to look at when partners and opponents were picked in order to validate my model, and I think my model’s doing a pretty good job.

I have 2096’s Hopper schedule as the worst schedule of 2018:

They had to play against 6 of the top 10 teams (top 5 captains and top 5 picks) and got to play with 0 of them. Tough luck guys.

On the flipside, we have 2220’s schedule on Archimedes as the best schedule:

They got to play with all of the top 4 robots and didn’t have to face a single one. The with/against lists are the same size though, which doesn’t sound great until you realize you play against 50% more teams than you play with.

I haven’t read everything entirely in-depth, but in a game like this year’s, averaging scores isn’t as accurate a measurement of how good a robot is. Do you account for this in any way? To me it seems that better data would be required to accurately predict matches.

Are you referring specifically to any of my books or just speaking generally?

I’m assuming the latter here. My general match prediction algorithm is a raw average of predicted contribution win probability and Elo win probability. Neither of these methods “average scores” like you seem to be saying. Both of them only incorporate raw match scores though, not any scouting data or detailed score breakdowns. I’m looking to make a second more advanced Elo model soon which incorporates some aspects of the published score breakdowns, that will hopefully be noticeably more predictive than my current Elo model.
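
In code, that combination is literally just a raw average:

```python
def match_win_probability(contribution_prob, elo_prob):
    """Raw average of the predicted-contribution and Elo win probabilities for red."""
    return (contribution_prob + elo_prob) / 2
```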

Not sure I answered your question though, so let me know if you were asking something else.

I’ve uploaded a new book, called “CC Ranking Comparison.xlsx”. I’m trying something new though, instead of making a long post here, for my next few projects I’m going to be explaining them on a site I just made called https://frcstats.blog/. We’ll see how this goes, feel free to give me feedback either here or there on which format you prefer or what I could do differently in either medium to improve my content.

Thanks for the shout out in the blog post! I’m glad my predictions were actually helpful for something other than placating my boredom.

I’m looking forward to seeing where this takes you next, and please let me know if there’s anything else I can do to help.

I’ve got another post up, this one regarding finding the best way to “measure” strength of schedule. I’ve decided to use a metric I’m calling “average rank difference” moving forward, which is found by:
Averaging the sum of all opponent ranks minus the sum of all partner ranks, then subtracting ((# of teams + 1)/2), and then dividing by the number of teams.
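
To make that concrete, here’s roughly what that computation looks like for one team (a simplified sketch; the exact details in the book may differ slightly):

```python
def average_rank_difference(matches, num_teams):
    """Schedule strength for one team at one event.

    matches: list of (opponent_ranks, partner_ranks) tuples, one per quals match
    """
    per_match = [sum(opps) - sum(partners) for opps, partners in matches]
    avg = sum(per_match) / len(per_match)
    # A randomly drawn schedule averages out to roughly (num_teams + 1) / 2, so
    # subtracting that centers the metric near zero; dividing by the team count
    # puts events of different sizes on a comparable scale.
    return (avg - (num_teams + 1) / 2) / num_teams
```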

I decided to take another look at the hypothetical “serpentine valley” I investigated last year. Back then, I was more interested in whether teams going into an event had an incentive to perform worse than their ability. I found that if there was a so-called serpentine valley, it was very small and centered around rank 10.

Here, I did a similar investigation using end of event rank instead of start of event Elo. A serpentine valley here might indicate that teams in matches near the end of the quals might be incentivized to get fewer RPs if they want to maximize their chances of winning the event. Here is a book with that data as well as a summary sheet: serpentine_valley_v2.xlsx (1.4 MB)

I pulled rankings and results from all events since 2008, which gave me a sample size of about a thousand events. I found how many event wins and how many wildcards were achieved from each ranking position; dividing those by the number of events where a team got that rank gives a win probability and a win/wildcard probability for each rank. Below are graphs summarizing that data:
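
The counting behind those graphs is simple enough to sketch (field names are just illustrative):

```python
from collections import Counter

def probabilities_by_rank(event_results):
    """event_results: (final_quals_rank, won_event, won_or_got_wildcard) per team per event.

    Returns {rank: (win_probability, win_or_wildcard_probability)}."""
    appearances, wins, wins_or_wildcards = Counter(), Counter(), Counter()
    for rank, won, won_or_wildcard in event_results:
        appearances[rank] += 1
        wins[rank] += won
        wins_or_wildcards[rank] += won_or_wildcard
    return {rank: (wins[rank] / n, wins_or_wildcards[rank] / n)
            for rank, n in appearances.items()}
```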

These graphs actually look remarkably similar to the pre-event Elo graphs from my earlier work. The “serpentine valley”, if it exists at all, is centered around rank 8 or 9. For reference, the gap between rank 8 (bottom of the valley) and rank 10 (top of the valley) is 1.1%, jumping from 3.8% to 4.9%, or about 12 out of 1000 events. This is much smaller than I would have expected, and that is comparing the valley’s lowest point against its highest point. Comparing rank 7 or rank 9 to rank 10 gives nearly identical results. The “valley” we are seeing could feasibly be largely noise, as there are larger “valleys” at ranks 15 and 23, and I have no reasonable explanation for why there would be dips around those ranks.

My takeaway is that this methodology really doesn’t provide evidence that any reasonable number of teams are incentivized to throw matches. That doesn’t mean those incentives don’t exist, just that you’d really have to dive much deeper into the data to prove that they actually do. Maybe someday when I add alliance selections into my event simulator I’ll revisit.

So roughly rank 11 gives the worst average alliance assuming that you play in playoffs. I clipped the graphs at about rank 30 since the averages start to go crazy around there due to smaller sample sizes, but people can play around with the data as they want.

One more fun graph, this shows the most common alliance a team at each rank plays on, again assuming that they participate in playoffs:

It’s probably all noise at ranks 20ish+; the lower ranks are more interesting.

Man, this is such a fun data set. Just when I think I’m done, I keep finding other cool ways to break down the data.

I tried here to take the correlation between alliance probabilities for teams at adjacent ranks. Before correlating, I shifted the data for the lower rank forward one alliance (to simulate the forward draft direction) as well as backward one alliance (to simulate the reverse, aka “serpentine”, draft direction). The correlations between these can give us a sense of the relative impacts the “forward” and “reverse” draft directions have on teams at each rank. Here is a graph showing these results:
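
Roughly, the shifting and correlating works like this (a simplified sketch of the idea, not my exact code):

```python
import numpy as np

def draft_direction_correlations(alliance_probs):
    """alliance_probs[r]: probability that a team finishing at rank r+1 lands on each
    of alliances 1-8, given that it plays in the playoffs (a length-8 array).

    For each pair of adjacent ranks, correlate the better rank's distribution with
    the worse rank's distribution shifted one alliance forward and one backward.
    Returns a list of (forward_corr, reverse_corr) tuples."""
    results = []
    for r in range(len(alliance_probs) - 1):
        better, worse = alliance_probs[r], alliance_probs[r + 1]
        forward = np.corrcoef(better[:-1], worse[1:])[0, 1]   # worse rank lands one alliance later
        reverse = np.corrcoef(better[1:], worse[:-1])[0, 1]   # worse rank lands one alliance earlier
        results.append((forward, reverse))
    return results
```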

As we would expect, the serpentine direction has negligible effect on teams ranked 1-9 in terms of affecting which alliance they will be a part of. However, the effect of the serpentine draft quickly jumps at rank 10 to be slightly less important than the forward direction. This holds true until about rank 15, at which point, both directions have roughly equal impact until about rank 19. From ranks 19-22, these teams are slightly more impacted by the serpentine draft direction than the forward draft direction.

I had to clip the data around here because it starts to get really noisy at rank 23. I’m not sure if there is any real practical use for this data, but it is fascinating to see the direct impacts of the serpentine draft represented in data. The relative importance of the first versus second draft round was such a nebulous idea to me prior to doing this; I just kind of stumbled into what I think is a reasonable way to quantify it.

One potential implication of this is that if you are rank ~17 or lower, it might be best to sell yourself to the higher seeded teams as a first round pick, and if you are at a higher rank than this, your time might be better spent trying to sell yourself as a second round pick.

Well, I’m not sure exactly how best to approach this, but here’s an Excel book that will be expanded upon more in a blog post tomorrow. I’ll post a link to the blog post after it comes out. The data may look nonsensical to you until then.