Starter Kit: Data Science for Cricket Analytics

"To me, cricket is a simple game. Keep it simple and just go out and play." - Shane Warne

Cricket has a huge following in the subcontinent, with the IPL last valued at 5.3 billion USD. This bat-and-ball game, played largely in Commonwealth nations, is not just interesting to watch but also has a growing number of analytical use cases.

Given the discrete nature of the game and the growth of the IPL, the demand for analytics as an edge is all the rage, both for on-field performance and for ancillary services such as growing and engaging the fan base.

Given that my previous startup experience was around trying to cash in on this growing niche, I wanted to recap and document my learnings about the ecosystem in general. Broadly, as mentioned above, the opportunity lies in two directions:

Performance Analysis

Fan Engagement & Branding Services

The fan engagement aspect largely involves the fantasy gaming sites, IPL teams and celebrity imports, primarily from Bollywood. Given the private nature of fan engagement data, the bulk of this post is about performance analysis data.

Data

Before you can play around with the data, the first question is where to get it. Despite the game's similarity to baseball, which has a whole branch of analytics called Sabermetrics, analytics for cricket is still in its very early stages.

The only open source data set available was at Cricsheet. Unfortunately, it stopped updating from July 2017 onwards, but an updated version was recently released at White Ball Analytics.

Courses

There are a couple of Sabermetrics courses online that should give an idea of, and impetus for, getting started with cricket analytics: how to define KPIs and how to think about performance analysis in general.

The first key to doing any good analytics is the quality and breadth of the data available. Given the early days of the space, the only freely available data sets are maintained by an Irish and an English gentleman, which is ironic given that India is home to the IPL.

This free historical dataset is limited to a ball-by-ball catalogue of events. In my experience, though, there are a couple of paid vendors with much richer data, including sensor information. Access to a greater diversity of data should make it possible to do a broader range of analytics beyond the obvious metrics.
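To make "the obvious metrics" concrete, here is a minimal sketch of deriving batting strike rate and bowling economy from ball-by-ball records. The field names and sample values are hypothetical and do not follow the schema of any particular dataset. As a simplification, all extras are charged to the bowler, which is right for wides and no-balls but not for byes or leg byes.

```python
from collections import defaultdict

# Hypothetical ball-by-ball records; field names are illustrative only.
balls = [
    {"batter": "A", "bowler": "X", "runs_batter": 4, "extras": 0, "wide": False},
    {"batter": "A", "bowler": "X", "runs_batter": 0, "extras": 1, "wide": True},
    {"batter": "A", "bowler": "X", "runs_batter": 1, "extras": 0, "wide": False},
    {"batter": "B", "bowler": "X", "runs_batter": 6, "extras": 0, "wide": False},
    {"batter": "B", "bowler": "Y", "runs_batter": 0, "extras": 0, "wide": False},
    {"batter": "B", "bowler": "Y", "runs_batter": 2, "extras": 0, "wide": False},
]


def batting_strike_rates(records):
    """Runs scored per 100 legal balls faced, per batter."""
    runs = defaultdict(int)
    faced = defaultdict(int)
    for b in records:
        runs[b["batter"]] += b["runs_batter"]
        if not b["wide"]:  # a wide does not count as a ball faced
            faced[b["batter"]] += 1
    return {p: 100.0 * runs[p] / faced[p] for p in faced}


def bowling_economy(records):
    """Runs conceded per over (6 legal balls), per bowler."""
    conceded = defaultdict(int)
    legal = defaultdict(int)
    for b in records:
        # Simplification: every extra is charged to the bowler.
        conceded[b["bowler"]] += b["runs_batter"] + b["extras"]
        if not b["wide"]:
            legal[b["bowler"]] += 1
    return {p: 6.0 * conceded[p] / legal[p] for p in legal}


strike_rates = batting_strike_rates(balls)
economy = bowling_economy(balls)
```

Richer vendor data would add fields to each record rather than change this shape, so the same aggregation pattern extends to less obvious metrics.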

Streaming data, or a live feed, is used by fantasy sites to run their games and update scores. This kind of service involves hitting a specified API endpoint that returns updated match information ball by ball.

Tools & Resources

As can be inferred from the two courses mentioned, SQL for data storage and R for basic statistical analysis are more than enough for standalone reporting.
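As a rough illustration of the "SQL for storage, simple statistics for reporting" workflow, the sketch below stores a few deliveries in an in-memory SQLite database and aggregates them into a batting summary. It uses Python's built-in sqlite3 module as a stand-in for the SQL plus R setup described above. The table name, columns and rows are assumptions made for the example.

```python
import sqlite3

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE deliveries (
        match_id    INTEGER,
        innings     INTEGER,
        over_no     INTEGER,
        ball_no     INTEGER,
        batter      TEXT,
        bowler      TEXT,
        runs_batter INTEGER,
        extras      INTEGER
    )
    """
)

# Made-up sample rows, one per delivery.
rows = [
    (1, 1, 0, 1, "A", "X", 4, 0),
    (1, 1, 0, 2, "A", "X", 1, 0),
    (1, 1, 0, 3, "B", "X", 6, 0),
    (1, 1, 0, 4, "B", "X", 0, 0),
    (1, 1, 0, 5, "A", "X", 2, 0),
]
conn.executemany("INSERT INTO deliveries VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Standalone report: runs and balls faced per batter, highest scorer first.
query = """
    SELECT batter, SUM(runs_batter) AS runs, COUNT(*) AS balls
    FROM deliveries
    GROUP BY batter
    ORDER BY runs DESC
"""
report = conn.execute(query).fetchall()
conn.close()
```

The same query would run largely unchanged against a server database, and the resulting rows can be handed to R or any other statistics tool for further analysis.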

The typical fantasy game has a simple platform for choosing the 11-odd players; based on the points those players score, the top fantasy teams are deemed winners and become eligible for prizes.
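The mechanics of such a game can be sketched in a few lines. The point values, player names and team sizes below are hypothetical (teams are cut to two players for brevity), since every fantasy platform defines its own rules.

```python
# Hypothetical scoring rules; real platforms publish their own.
POINTS = {"runs": 1, "wickets": 25, "catches": 8}


def player_points(stats):
    """Fantasy points for one player from their match statistics."""
    return sum(stats.get(event, 0) * value for event, value in POINTS.items())


# Made-up match statistics per player.
match_stats = {
    "P1": {"runs": 50, "wickets": 0, "catches": 1},
    "P2": {"runs": 10, "wickets": 3, "catches": 0},
    "P3": {"runs": 0, "wickets": 1, "catches": 2},
    "P4": {"runs": 30, "wickets": 0, "catches": 0},
}

# Each fantasy team is a list of selected players.
fantasy_teams = {
    "team_alpha": ["P1", "P2"],
    "team_beta": ["P2", "P3"],
    "team_gamma": ["P1", "P4"],
}


def leaderboard(teams, stats):
    """Rank fantasy teams by total points, highest first."""
    totals = {
        name: sum(player_points(stats[p]) for p in players)
        for name, players in teams.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = leaderboard(fantasy_teams, match_stats)
```

Because the score is a pure function of the match statistics, the leaderboard can be recomputed after every ball when a live feed is available.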

Building any sophisticated or offbeat fantasy game or analytics on top of the streaming data came with several challenges:

The ball update typically had a delay of 5 seconds, which in rare cases would extend to 15 seconds or more. This delay was incredibly volatile and made building a live analytical engine difficult.

Data quality in streaming services has its own challenges, chiefly frequent errors that would only be corrected later.

There are no known fan engagement numbers available from streaming service providers.
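One way to cope with the first two challenges is to key every update by its ball, so that a later correction overwrites the earlier value, and to flag balls whose first update arrived unusually late. The sketch below is an assumption-laden illustration: the `BallUpdate` fields, the `LiveScore` class and the 10-second threshold are all hypothetical and do not reflect any provider's actual feed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BallUpdate:
    over: int
    ball: int
    runs: int
    bowled_at: float    # seconds, when the ball was bowled
    received_at: float  # seconds, when the update reached us


class LiveScore:
    """Keeps the latest version of each ball and tracks late arrivals."""

    def __init__(self, max_delay=10.0):
        self.max_delay = max_delay  # hypothetical lateness threshold
        self.balls = {}             # (over, ball) -> latest BallUpdate
        self.late = []              # balls whose first update was late

    def apply(self, update):
        """Store the update; return True if it corrected an earlier value."""
        key = (update.over, update.ball)
        previous = self.balls.get(key)
        if previous is None and update.received_at - update.bowled_at > self.max_delay:
            self.late.append(key)
        self.balls[key] = update
        return previous is not None and previous.runs != update.runs

    @property
    def total(self):
        return sum(u.runs for u in self.balls.values())


# Simulated feed: the third message corrects the first ball from 4 to 6 runs.
feed = [
    BallUpdate(over=0, ball=1, runs=4, bowled_at=0.0, received_at=5.0),
    BallUpdate(over=0, ball=2, runs=1, bowled_at=30.0, received_at=46.0),
    BallUpdate(over=0, ball=1, runs=6, bowled_at=0.0, received_at=60.0),
]

score = LiveScore()
corrections = [score.apply(u) for u in feed]
```

Any analytics built on top can then recompute from `score.balls` whenever `apply` reports a correction, rather than trusting a running total that may have absorbed an error.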

Use Cases & Stakeholders

The whole point of carrying out this analysis is to put it to use. The numbers crunched can be consumed by:

Fans: Analytical reports can be a source of engaging news and an alternative medium for fans to ponder. This is something along the lines of FiveThirtyEight.

League Teams: IPL franchises and other T20 leagues are ripe customers for such analytics. Though analytics is already prevalent there, it is largely driven by video analysts who are or were ex-cricketers with no statistical background, resulting in the same old domain knowledge being circulated around.

Media/Agencies: Fan engagement numbers and even player performance forecasts can be incredibly useful for advertising agencies and celebrity management firms. They can better price their associated players, and firms looking to advertise can make a more scientific assessment of their marketing spend.

Landscape & Challenges

Despite the growth in tech in recent times, the majority of stakeholders who run the show (BCCI, IPL teams) have been very slow to adopt it and less willing to bet on newer possibilities. It has to be mentioned, though, that both HotStar and Dream11 have made some serious strategic moves backed by sound technical expertise.

Fantasy and streaming services are the two primary fan endpoints, with Dream11 and HotStar going head to head in terms of their future goals.

Cricbuzz and Cricinfo dominate the content landscape. They have the largest volume of visits but suffer from poor engagement time and from the fact that their offering has no direct monetization.

Dream11 has the numbers in terms of a paying user base, and a very fast-growing one, but poor engagement numbers given the static nature of its game. Its next logical step is to go for some sort of streaming.

HotStar has the best of both worlds: as the official streaming partner it has high engagement numbers, and given its recent foray into fantasy, it might eat into Dream11's pie.

Given these dynamics, it looks like an open fight between Dream11 and HotStar, with both Cricbuzz and Cricinfo looking like potential acquisitions.