Three-point shooting, Steph Curry, and coming up with stories. If you feel like doing your own analysis to investigate hypotheses or discover insights at any level, RDF graph's got your back. Case in point: The NBA.

The NBA has taken to organizing analytics hackathons, asking participants to propose novel ideas about both the game itself and its business side. Projecting the impact of hypothetical rule changes or predicting the entertainment value of games are some examples of ideas investigated in this context.

You don't have to be the NBA, or professional media, or a sports organization with a dedicated analytics team to do some analysis of your own. But some problems that are hard to tackle at any level may be approachable via flexible graph data modeling.

Does data-driven optimization -- and three-pointers -- help the game?

Basketball has traditionally been somewhat different in the east versus the west. This year's Western Conference finals featured two of the most representative NBA teams in terms of the evolution of the game these days: the Golden State Warriors and the Houston Rockets.

It was the Warriors that made it to the finals, same as the last three years. Both teams' offense, however, seems to be focused on pursuing three-point shots and layups. The reason, it has been argued, is that analytics show these to be the most effective types of shot.

The Warriors proceeded to the NBA finals to face the Cleveland Cavaliers for the fourth time in a row, which should say something about the effectiveness of their game. Effective or not, however, some people like it and others don't.

While there's a lot of subjectivity in this discussion, it may serve as a case study to check whether perceptions correspond to reality as can be seen through the lens of data. It can also help highlight some of the fine print when working with evolving and cross-cutting datasets.

Three-pointers are a key element of the game of the Warriors and Steph Curry. It's a conscious decision driven by analytics. (Image: ESPN)

The Warriors are considered a passing team. But an analysis using offense duration as a proxy for passing shows that the average time of their plays ending with a made two-point shot is almost a standard deviation (σ) shorter than average. The average time for made three-pointers is more than 1σ below average.

Their opponents in the finals, the Cavaliers, on the other hand, are almost exactly average for both. Incidentally, the Warriors have almost identical numbers to the Philadelphia 76ers in this analysis. These two teams have the lowest average seconds per three-point shot. This could point to an advantage for a style of play that favors smart passes to quickly get the ball to a three-point shooter.
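
For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of how "standard deviations from average" can be computed. It is not the analysis described above: the team names and numbers are made up, and the choice of the population standard deviation is an assumption, since the exact formula used is not stated.

```python
from statistics import mean, pstdev

# Hypothetical average seconds per made three-pointer, per team.
# These numbers are invented for illustration, not real NBA data.
avg_seconds = {
    "TEAM_A": 10.0,
    "TEAM_B": 12.0,
    "TEAM_C": 14.0,
    "TEAM_D": 16.0,
    "TEAM_E": 18.0,
}

def z_scores(values):
    """Return how many standard deviations each team sits from the league mean."""
    mu = mean(values.values())
    sigma = pstdev(values.values())  # assumption: population standard deviation
    return {team: (v - mu) / sigma for team, v in values.items()}

scores = z_scores(avg_seconds)
for team, z in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{team}: {z:+.2f} sigma")
```

A team with a score below -1 gets to its made three-pointers more than one standard deviation faster than the league average in this toy data set.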

Interestingly, not only do the Warriors shoot later in the shot clock than any other team in the NBA, they also force their opponents to shoot earlier in the shot clock on average. Most of the teams they do not force into early shots are teams that, on average, take shots later in the shot clock.

You're probably wondering where all that is coming from, and what it all means. So let's get to that.

Keeping track of evolving and cross-cutting data and metadata with RDF

Stellman is also a consultant and a writer, and he's just finishing his latest book, the fourth edition of Head First PMP (O'Reilly Media). He did, however, make some time to discuss, research, test analytics hypotheses, and share his analytics tools and methodology in time for the NBA finals.

Stellman has been working on NBA analytics as a side project. As it turns out, however, his approach can help deal with issues common in professional sports leagues and beyond. Stellman wanted to use NBA data that's free, readily available from multiple sources, and as raw and complete as possible. His goal was to turn it into something usable for real analytics.

He says he considered using SQL or object repositories like Hadoop, but adds that having done a lot of work with RDF over the last five years, it quickly became obvious that RDF was the right choice. The reason may not be obvious for everyone, though.

Play-by-play data for the NBA is the most complete, but also hard to manage. Going from paper to a flexible way to model it helps a lot. (Image: Kristen Hewitt)

It looked like working with that data, and running the queries that he did, could just as well have been done using a relational database for example.

"That would be easy, but only if your RDBMS already has the data. One big advantage that RDF has over relational databases is that it's much easier to update the structure of the data, which is really valuable for doing hypothesis-driven analytics," said Stellman.

Stellman shared discussions he has had with members of the Minnesota Timberwolves analytics team over the last couple of summers:

"Many NBA teams have been trying to crack the play-by-play nut for a while. The problem with play-by-plays is that they contain all of the raw data, but the structure is difficult to work with.

Suppose I was using an RDBMS, and wanted to do an analysis for which I ran into a problem: If I'm not keeping track of the number of seconds for each play-by-play line, I'd have to go and modify the table that stores the plays.

Sure, it makes sense to add this for each individual piece of new data. But as you add more and more data, you keep having to update tables. You either end up with a huge number of tiny, denormalized tables, or a really wide, sparse table. You need to store who assisted, different types of shots, a ton of metadata. RDF is built for metadata."

From a side project to the NBA

Stellman refers to the time he spent on a consulting job working with Dean Allemang, RDF expert and author of The Semantic Web for the Working Ontologist.

Allemang compared adding RDF data to using slide transparencies that you might see in a university class: "With RDF, you can overlay new data over the old data, and the existing data is not affected at all."

So, to add his data, all Stellman had to do was add extra triples for each play. He updated his ontology for working with RDF data to add some metadata about the new prefixes, but didn't have to do any modeling at all: "For a small change like this, it doesn't make too much of a difference, but for a large change it saves a HUGE amount of modeling headache," he said.
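
To make the transparency analogy concrete, here is a minimal Python sketch. It is not Stellman's code and it does not use a real RDF library: a plain set of (subject, predicate, object) tuples stands in for a triplestore, and every identifier in it (play:1, nba:seconds, and so on) is a hypothetical name chosen for illustration.

```python
# Toy stand-in for a triplestore: a set of (subject, predicate, object) tuples.
# Every identifier here is hypothetical and chosen only for illustration.
store = {
    ("play:1", "nba:team", "team:A"),
    ("play:1", "nba:shotType", "3PT"),
    ("play:1", "nba:made", True),
    ("play:2", "nba:team", "team:B"),
    ("play:2", "nba:shotType", "2PT"),
    ("play:2", "nba:made", False),
}
before = frozenset(store)  # snapshot of the existing data

# The "overlay": extra triples recording the number of seconds for each play.
seconds_overlay = {
    ("play:1", "nba:seconds", 7),
    ("play:2", "nba:seconds", 19),
}
store |= seconds_overlay

def objects(triples, subject, predicate):
    """Return all objects recorded for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# The old triples are untouched; the new ones are queryable alongside them.
unchanged = before <= store
play1_seconds = objects(store, "play:1", "nba:seconds")
print(unchanged, play1_seconds)
```

The point of the sketch is that nothing resembling a table definition had to change: the new fact about each play is just more tuples sharing the same subject.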

Flexible data modeling helps a lot when dealing with evolving and cross-cutting data and metadata such as the ones in NBA analytics. (Image: Derivative based on original from Michael Myers)

Another reason RDF lends itself so well to analytics, according to Stellman, is that statistics is based in discrete math:

"At its root, it's basically counting. Almost all common stats are just count of one thing as a percentage of count of another thing. Even really complex formulas boil down to counts; specifically, finding the right subset of things to count. RDF is really useful for this. Everything in a play-by-play is an event, so the key is to attach the right metadata to each event."

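The "count of one thing as a percentage of count of another thing" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stellman's method: events are modeled as plain dictionaries of metadata, and the field names, player identifiers, and data are all made up.

```python
# Hypothetical play-by-play events, each one a bag of metadata.
# Field names, player identifiers, and values are invented for illustration.
events = [
    {"player": "player:A", "shot": "3PT", "made": True},
    {"player": "player:A", "shot": "3PT", "made": False},
    {"player": "player:A", "shot": "3PT", "made": True},
    {"player": "player:A", "shot": "3PT", "made": False},
    {"player": "player:A", "shot": "2PT", "made": True},
    {"player": "player:B", "shot": "3PT", "made": True},
]

def count(events, **criteria):
    """Count the events whose metadata matches every given key/value pair."""
    return sum(
        1 for e in events
        if all(e.get(key) == value for key, value in criteria.items())
    )

def percentage(events, numerator, denominator):
    """Count of one subset as a percentage of the count of another subset."""
    total = count(events, **denominator)
    if total == 0:
        return None  # no matching events, so the stat is undefined
    return 100.0 * count(events, **numerator) / total

attempts = {"player": "player:A", "shot": "3PT"}
makes = dict(attempts, made=True)
three_point_pct = percentage(events, makes, attempts)
print(three_point_pct)
```

A new stat is then mostly a matter of choosing which metadata defines the numerator and the denominator, which is why attaching the right metadata to each event matters so much.
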
Stellman refers, for example, to NBA teams trying to figure out how to use the data from the overhead cameras that track player movements. He mentions different ways to do this: Tracking individual movement coordinates, tracking passes to players, creating metadata tags based on machine learning, etc.

"If you wanted to attach that data to a play, you'd have to do a bunch of RDBMS modeling, and your database diagrams would start getting huge and unmanageable. But with RDF, you could create a different context for each of those kinds of data. They wouldn't have to know about each other.

And you can start by generating RDF triples, which is a lot more satisfying and productive than starting by trying to create a table model. So, you could do one analysis of individual plays and create triples with the play IRI as the subject.

Then, you add triples to your triplestore in their own context, so you can query them but also isolate them. All without having to touch any of the existing data, or do any modeling at all. It's really convenient," Stellman said.
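
The idea of keeping each kind of data in its own context can be sketched in Python as well. This is a toy model, not a real triplestore and not Stellman's setup: the ContextStore class, the context names, and the predicates are all hypothetical.

```python
from collections import defaultdict

class ContextStore:
    """Toy store that keeps each kind of data in its own named context."""

    def __init__(self):
        self._contexts = defaultdict(set)

    def add(self, context, subject, predicate, obj):
        """Record one triple inside the given context."""
        self._contexts[context].add((subject, predicate, obj))

    def triples(self, context=None):
        """Triples from one context, or from all contexts if none is given."""
        if context is not None:
            return set(self._contexts.get(context, set()))
        return set().union(*self._contexts.values())

store = ContextStore()

# Existing play-by-play data (hypothetical identifiers throughout).
store.add("ctx:playbyplay", "play:1", "nba:shotType", "3PT")
store.add("ctx:playbyplay", "play:1", "nba:made", True)
playbyplay_before = store.triples("ctx:playbyplay")

# Two new kinds of data about the same play, each in its own context.
store.add("ctx:tracking", "play:1", "trk:passCount", 3)
store.add("ctx:ml-tags", "play:1", "tag:label", "catch-and-shoot")

tracking_only = store.triples("ctx:tracking")  # isolated view
everything = store.triples()                   # combined view
print(len(tracking_only), len(everything))
```

Each context can be queried on its own or together with the rest, and adding the tracking or tagging data leaves the play-by-play context exactly as it was.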

Coming up with stories, and Occam's razor

Stellman and I worked on a number of hypotheses, trying to come up with interesting findings and plausible explanations.

But, as Stellman noted, the standard deviation of three-point percentage after a player on the other team makes a shot is more than twice as high as the standard deviation of three-point percentage overall. He pointed out that this means the metric is extremely player-specific:

"Some players have a huge 'anything you can do, I can do better' motivation, and it shows up in their stats. So, if a player on the other team just made a three, you definitely want the ball in the hands of Karl-Anthony Towns or Kevin Durant."

And then, there's Steph Curry. What does it mean that Steph Curry actually has a lower three-point percentage after a player on the other team makes a three?

Stellman came up with some explanations for why this is not necessarily a bad thing. He suggested Curry is a team player, so maybe he knows that his team gets especially fired up when a three is answered with a three.

Or maybe there's a chance that the momentum can shift, and a three goes unanswered, so it's worth taking the shot. This reminded me of another analysis I've seen on Steph Curry. That one was about his comeback from a bad streak.

Coming up with stories to explain observations is part of what people do. But that does not necessarily mean the stories correspond to reality.

Colson's point is that, despite being imaginative, those stories do not necessarily hold true. Colson's explanation was that this was a statistical anomaly that was soon restored. Curry's explanation was that he just kept shooting -- that's all.

And that's something to keep in mind when coming up with stories or doing analytics. Choosing tools and coming up with plausible explanations is important, but not more than keeping things simple and sticking to the facts.

There is, after all, a theory for that too. It's called Occam's razor, and it says that the simplest possible explanation is usually true. Sometimes, it's just about keeping shooting.
