Data Sharing: Fact and Fiction

New for 2018: Data Sharing

NASCAR announced (in a somewhat roundabout way) they would share the data collected by each car during races, qualifying and practice with all teams in the 2018 season. This produced a mixed reaction in the garage. Newer drivers and those driving for smaller teams seemed to like the idea while more experienced drivers didn’t seem so happy.

“I’ve spent 13 years in this sport to figure out how to drive a racecar, make it go fast, do the things I do to win races and championships… Now you’re going to hand all that on a piece of paper to a young driver, they’re going to figure it out, as long as they know how to read it.”

“Any resource that you can have at this level, no matter what it is or how small, you have to be so perfect at everything at the Cup level that anything that we can get our hands on is going to benefit us, for sure,”

While there was a surfeit of driver opinions in the press, there was much less information about what data was being shared, why the decision was made to share it, and what the impact was likely to be.

What’s Being Shared and Why?

Let’s start with why.

NASCAR has long made information about the car available to their television partners in real time. Back in 2012, I asked folks from SportVision (the company that collected the data for TV and Raceview) whether teams could intercept that data. They said that they acquire such a huge amount of data that analyzing it in real time is impossible.

That was six years ago.

Raceview provides the same real-time information as television shows: steering angle, speed, throttle and brake, all keyed to the car’s on-track position. Six years ago, it might have been impossible for a team to intercept and use the data from Raceview. Today, it’s not only possible, it’s common. They call it data scraping: Intercepting data intended for display and funneling it into a database or analysis program. Wikipedia calls data scraping ‘inelegant’ and ‘ad hoc’, because you aren’t getting a nice clean stream of data, but if that’s the only way to get the data, that’s what you do.

All the larger teams have been data scraping Raceview for some time. It’s not that complicated a task, but it takes time and money — which puts smaller teams at a disadvantage.

Some combination of NASCAR and the manufacturers decided it was silly for everyone to spend large amounts of money and time scraping data. Now NASCAR provides the information directly.

What Information is Being Shared and How?

The information being shared comes in real time — during practices, qualifying and the races, but only a small subset of data is shared. The shared data comes from two places: The Engine Control Unit (ECU) and the GPS unit on the car.

Charlie Sullivan, who handles ECU management for Earnhardt-Childress Racing Engines, explained that the McLaren ECU can record about a thousand channels. A channel is one piece of data; for example, brake pressure would be one channel, throttle another channel.

Out of a thousand possible channels, NASCAR limits the teams to collecting 200 channels. NASCAR mandates what you measure with 60 channels and you can measure whatever you want on the other 140 channels. Let’s look at this graphically:

Four channels out of 200 from the ECU are being shared. Nine channels come from the GPS data and locate the car’s position, speed and velocity. The number is a little misleading: It takes three channels to locate the car in space: latitude, longitude and altitude. Position is a vector, after all.

The data being shared is important, but it is only a small fraction of everything the teams collect. It’s also pretty much exactly the data people were scraping off Raceview. Now everyone has access to it, and it broadens the available data because Raceview isn’t active during qualifying or practice.

Data Precision

NASCAR teams take a lot of data, especially in the ECU. You want to measure some things frequently and some things not so frequently. Logging rate is the frequency at which measurements are made. It’s measured in Hertz (Hz), which you can read as ‘per second’.

1 Hz means once per second.

5 Hz means five times per second or once every 0.2 seconds.

50 Hz means 50 times per second or once every 0.02 seconds.

500 Hz means 500 times per second or once every 0.002 seconds.
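The conversion in the list above is just one divided by the logging rate. A minimal sketch (the function name is made up for illustration):

```python
def sample_interval_seconds(rate_hz: float) -> float:
    """Time between measurements for a given logging rate in Hz."""
    if rate_hz <= 0:
        raise ValueError("logging rate must be positive")
    return 1.0 / rate_hz


# The four rates listed in the text.
for rate in (1, 5, 50, 500):
    print(f"{rate} Hz -> one measurement every {sample_interval_seconds(rate):g} seconds")
```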

The simple thing to do is just measure everything at the highest possible logging rate — except the McLaren ECU only has 64MB of storage. (For specialists: this is 16-bit info. We’re not storing integers here.) Each team has to decide which data they really want. Something like temperature may be measured at 5 Hz and something like steering or brake or throttle might be measured much more frequently.
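To see why the storage limit forces a choice, here is a rough back-of-the-envelope sketch. It assumes 2 bytes per measurement (16-bit), that 64MB means 64 x 1024 x 1024 bytes, and that there is no overhead or compression. The channel mixes are invented for illustration and are not any team's actual configuration.

```python
BYTES_PER_SAMPLE = 2               # 16-bit measurements, per the text
STORAGE_BYTES = 64 * 1024 * 1024   # 64MB, assuming binary megabytes


def seconds_until_full(channel_rates_hz):
    """How long logging can run before storage fills, ignoring any overhead."""
    bytes_per_second = sum(channel_rates_hz) * BYTES_PER_SAMPLE
    if bytes_per_second <= 0:
        raise ValueError("need at least one channel with a positive rate")
    return STORAGE_BYTES / bytes_per_second


# Hypothetical mix 1: all 200 permitted channels at 500 Hz.
everything_fast = [500] * 200
# Hypothetical mix 2: 150 slow channels at 5 Hz, 50 fast channels at 500 Hz.
mixed = [5] * 150 + [500] * 50

for label, rates in (("everything at 500 Hz", everything_fast), ("mixed rates", mixed)):
    minutes = seconds_until_full(rates) / 60
    print(f"{label}: storage lasts about {minutes:.1f} minutes")
```

Whatever the real numbers are, the trade-off has the same shape: every channel logged faster than it needs to be shortens how long everything else can be recorded.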

But all the data NASCAR provides to the teams is at 5 Hz, just like Raceview. This means that I can have a much more detailed trace of my own brake and throttle than the other teams are seeing.

So What’s the Impact?

There are two areas in which data sharing could affect competition: letting drivers and teams see what other drivers are doing, and determining race strategy.

Drivers/Teams

Andy Randolph pointed out that the data sharing program means that a team now has access to ITS OWN DATA in real time. Before this change, the team couldn’t access the data until they could physically plug a laptop into the ECU and download it.

Everything is cloud connected. During practice, someone can be analyzing the driver’s performance. When the driver pulls into the garage after a practice run, the crew chief will have a tablet with the relevant graphs all ready to show the driver. This information might be his performance, or might be a comparison between his throttle trace and another driver’s.

As Charlie Sullivan pointed out, not only can people at the track access all of the information immediately, so can everyone on their network. The people at the shop can focus on analysis and forward observations to the crew chief, who doesn’t have a lot of time during practice to look over data. You’re basically bringing more brainpower to the problem by distributing different performance questions to different people. (“Sometimes we’re looking at the data while they’re still drinking champagne in Victory Lane”, he said.)

BUT…

Let’s say your competitor is beating you back to the throttle by 0.4 seconds coming out of turn 2. The driver probably already knows that, just from following the competitor around the track and it doesn’t help him to know that the differential is 0.413 seconds. He’s not waiting to get back on the throttle because he thinks it’s better strategy, or he doesn’t understand how racing works. He’s slower on the throttle because the car won’t let him be any faster.

Teams don’t have any information on other teams’ setups: suspensions, steering geometries or anything else. Before shared data, you knew the other guy was faster. The shared data lets you figure out where the other guy is faster. But nothing tells you how to make your guy faster. If winning were as easy as choosing the right line, there wouldn’t be much to it.

BUT…

Josh Browne, former Crew Chief for Elliott Sadler, pointed out that you can’t find speed if you aren’t looking for it. The data will tell the team that it is possible to be faster in a particular section of the track and allow the team to focus efforts. For example, if you were 4/10 of a second slower in turns 1/2 and 2/10 of a second slower in turns 3/4, the crew chief might prioritize changes that will affect turns 1/2 because the potential gain is greater. While the data doesn’t tell you how to fix it, it does allow you to prioritize the search for speed.
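The prioritization Josh describes amounts to ranking track sections by how much time you are giving up in each. A minimal sketch follows. The section times are made-up numbers chosen to match the 4/10 and 2/10 example, and the function name is hypothetical.

```python
def rank_sections_by_deficit(our_times, rival_times):
    """Return (section, seconds lost) pairs, largest deficit first.

    Sections where we are already as fast or faster are left out.
    """
    deficits = {
        section: our_times[section] - rival_times[section]
        for section in our_times
        if section in rival_times
    }
    losing = [(section, gap) for section, gap in deficits.items() if gap > 0]
    return sorted(losing, key=lambda pair: pair[1], reverse=True)


# Hypothetical section times in seconds for one lap.
ours = {"turns 1/2": 8.9, "turns 3/4": 8.6, "backstretch": 6.1}
rival = {"turns 1/2": 8.5, "turns 3/4": 8.4, "backstretch": 6.1}

for section, gap in rank_sections_by_deficit(ours, rival):
    print(f"{section}: losing about {gap:.1f} seconds")
```

The ranking only says where to look first. It says nothing about which setup change will actually recover the time.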

Josh also points out that even without knowing setup details, you can still learn things. Steering input, for example, depends on the steering configuration, but you can compare different laps for the same driver to look for changes and you can also learn how loose or tight another car is based on how much wheel-sawing is happening.

Can’t You Already Get All This from Dartfish?

Dartfish is a video tool that allows you to overlay your car with another car. As the video plays, you can see exactly where one car has the advantage relative to the other. As Andy Randolph pointed out, this is good for comparing a small number of laps. Sometimes, you don’t have the video you need to do the comparison. Also, you’re not going to go through 500 laps of a race comparing your driver with 39 other drivers. Dartfish is still a useful tool (especially for those uncomfortable with graphs and charts), but it provides different information than the shared data.

Is It Fair?

Senior drivers’ complaints seem to center on rookies having an advantage they didn’t have. Back in the day, they had to figure out all this for themselves. (I sympathize. I walked to school uphill both ways.) But let’s look at what else has changed since Kyle Busch and Ryan Newman were rookies.

In 2003, teams could pick five 2-day tests and four 1-day tests at tracks on the Cup schedule, plus Daytona pre-season testing and unlimited testing at tracks not on the Cup schedule.

In 2015, there were 12 1-day NASCAR-sponsored tests and NO private testing at any track.

In 2018, there are four NASCAR-sponsored tests.

I suspect if you asked today’s rookie drivers whether they would prefer being able to test on a real track, test on a simulator, or be able to see their competitor’s steering input traces, they would all say ‘test on a real track’. Just because you see what your competitor is doing doesn’t mean you can do it. So maybe the rookies ought to be pointing out that it’s not fair they’re having to learn how to race at the Cup level without getting as much seat time as the older drivers had.

Race Strategy

Back in the day, the crew chief had a much smaller number of variables to monitor. As cars have gotten more complex, the number of things the crew chief must consider has increased. Teams compile historical data like the probability of there being a caution in the last 25 laps of a race at a given track. They look at their own records: how many times did taking two tires instead of four give us a better finishing position? They subscribe to weather services so that they can make the right call when it looks like rain may wipe out a race.

They gathered data in real-time, too. They listened in on other teams’ radios to try to anticipate their pit strategy and adjust their own accordingly.

This is a lot of stuff for one person to keep straight and it’s just getting worse. In 2018, teams now have the ability to not just monitor their own driver, but to monitor every other car on the track. We have 13 channels, each at 5 Hz. That’s 65 numbers per second. Over the course of a three-hour race, we’re talking roughly 700,000 pieces of information for each car on the track, and those pieces of information aren’t exactly in immediately usable form.
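A quick check of that arithmetic, assuming a three-hour race is 10,800 seconds of data and a 40-car field (the field size is an assumption based on the "39 other drivers" mentioned earlier):

```python
CHANNELS = 13             # 4 ECU channels + 9 GPS channels
RATE_HZ = 5               # shared data rate
RACE_SECONDS = 3 * 60 * 60
FIELD_SIZE = 40           # assumption: your car plus 39 others

values_per_second = CHANNELS * RATE_HZ
values_per_car = values_per_second * RACE_SECONDS
values_for_field = values_per_car * FIELD_SIZE

print(f"{values_per_second} numbers per second per car")
print(f"{values_per_car:,} numbers per car over the race")
print(f"{values_for_field:,} numbers for the whole field")
```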

NASCAR Data Science

Data Science is hot. There is high demand for people with the skills to extract information from large quantities of complex data and output actionable information in a way people can use to quickly make decisions. The average salary for a Data Scientist is about $120,000/year.

Rho AI is a Data Science company that includes Josh Browne (aforementioned former NASCAR Crew Chief), and more MIT Ph.D.s than should be allowed in any one place that isn’t MIT. If you follow this blog, you probably also are familiar with another team member: Andrew Maness a.k.a. NASCARnomics. Autoweek had a nice article on the Rho AI crew’s work in NASCAR.

Josh knows the problems Crew Chiefs face — and which ones might be solved using Data Science. Let’s take one of the most common: How many tires do we take on the next pit stop? A train of thought might go something like this:

Note the second bullet under the second point. In order to make this decision, I need to know what every other team on track is likely to do (or at least the ones ahead of and near me).

Once you’ve decided the question you want to answer and what data you have available, you can apply the standard Data Science process. The graph below sweeps a lot under the rug in the ‘analyze’ step, but I’ll return to that.

The system constantly evolves because it evaluates every prediction against what actually happens and feeds all that back into the model so that the next time the question comes up, every bit of relevant data can be used in making the next (and hopefully more accurate) prediction.
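That predict, compare, update loop can be illustrated with a toy example. This is not Rho AI's system, which isn't described in any detail here. It is a deliberately simple stand-in: a single running estimate of lap time that gets nudged toward each observed lap. The class name, learning rate and lap times are all invented.

```python
class OnlineLapTimePredictor:
    """Toy predictor that corrects itself after every observed lap."""

    def __init__(self, initial_guess: float, learning_rate: float = 0.5):
        self.estimate = initial_guess
        self.learning_rate = learning_rate

    def predict(self) -> float:
        return self.estimate

    def update(self, actual: float) -> float:
        """Compare the prediction with what happened and feed the error back."""
        error = actual - self.estimate
        self.estimate += self.learning_rate * error
        return error


predictor = OnlineLapTimePredictor(initial_guess=32.0)
for observed_lap in (30.0, 30.0, 30.0):  # hypothetical lap times in seconds
    predicted = predictor.predict()
    error = predictor.update(observed_lap)
    print(f"predicted {predicted:.2f}, actual {observed_lap:.2f}, error {error:+.2f}")
```

Each pass through the loop shrinks the error, which is the basic idea behind a model that gets more accurate the more it sees. A real system would use far more inputs than one number.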

To give you an idea of what the system they developed is capable of, they predict lap times for each car based on all the currently available information. When it’s time for a pit stop, they can provide a prediction of what position you’ll end up in over the course of the next fuel run based on taking four, two or no tires. That prediction takes into account what everyone else on track is likely to do based on historical precedent.

The systems they use belong to the same general family as the programs that Amazon and Netflix use to suggest movies or books you might like based on all the information they have about you and everyone else in the world. And if you’ve ever seen one of their recommendations and thought “what the heck…?”, you know that the predictions aren’t always on the mark. The program is only as accurate as the data and the model.

The programs rely on something called Machine Learning, which is a subset of Artificial Intelligence (AI). The key to Machine Learning is that the program uses the new information to modify its models and make itself more accurate. The more the program learns about NASCAR, the more accurate it gets. Interestingly, Josh tells me that the most challenging type of race to predict is the restrictor-plate race.

Incidentally, I met the Rho AI guys at an ARPA-E (Advanced Research Projects Agency – Energy) conference. NASCAR is far from the only thing they do. They work with companies involved in water, waste and energy to save the companies money while helping the environment. They laughed that some of the other problems they work on are easy compared to NASCAR.

Is This The Future of NASCAR?

No. It’s the present. Rho AI have been working with Richard Childress Racing for some time now to develop the system. It’ll be in action this weekend. As far as I know, there isn’t any other Data Science company working in the NASCAR space, although some teams have started developing their own programs. The Rho AI guys emphasize that their work is a true collaboration. It’s not like buying a copy of Office or Adobe Photoshop. Because the program is constantly evolving, it is literally a continual work in progress.

“Decisions during a race have to be made in seconds. Our strategy tools have played a key role in our wins in 2017 and have shown the power of analytics in making real-time decisions.”

You might think a program like this would threaten crew chiefs, but the ones using it view it as one more tool they can draw upon to give them an advantage over the other teams. Steve Letarte told me once that his job as Crew Chief was not to have all the answers, but to know where to find the answers and then make decisions.

The program isn’t always right because it must face situations for which it doesn’t have enough data to make a prediction with high confidence. It must be right much more often than it’s wrong because RCR wouldn’t maintain a partnership that wasn’t working.

The crew chief doesn’t have to follow the program’s recommendation. There are things the program doesn’t consider, like whether driver A and driver B who ended up next to each other on the restart had an incident in the last race or whether driver C’s contract is up for renewal and he’s in wreckers or checkers mode.

And the program isn’t going to tell them the answer to all their problems is a spring rubber and one and a half rounds of wedge in the car.