## Thursday, 22 November 2012

### Cricket and the Wasp: Shameless self promotion (Wonkish).

[UPDATE: January 2015. The post below dates from November 2012 when New Zealand's Sky TV first introduced the WASP in coverage of domestic limited overs cricket. For fans coming here as a result of its being used in the current NZ v SL series, please see here for an FAQ. For an explanation of what cricket has to do with Economics, see here; and for all the cricket posts on Offsetting Behaviour, see here.]

In their coverage of the Wellington-Auckland game in the HRV cup last Friday, Sky Sport introduced WASP—the “winning and score predictor” for use in limited-overs games, either 50-over or 20-20 format. In the first innings, the WASP gives a predicted score. In the second innings, it gives a probability of the batting team winning the match.

I am very happy about this as it is based on research by my former doctoral student, Scott Brooker, and me. Not surprisingly, the commentators didn’t go into any details about the way the predictions are calculated, so I thought I would explain the inner workings in a wonkish blog post.

The first thing to note is that the predictions are not forecasts that could be used to set TAB betting odds. Rather they are estimates about how well the average batting team would do against the average bowling team in the conditions under which the game is being played given the current state of the game. That is, the "predictions" are more a measure of how well the teams have done to that point, rather than forecasts of how well they will do from that point on. As an example, imagine that Zimbabwe were playing Australia and halfway through the second innings had done well enough to have their noses in front. WASP might give a winning probability for Zimbabwe of 55%, but, based on past performance, one would still favour Australia to win the game. That prediction, however, would be using prior information about the ability of the teams, and so is not interesting as a statement about how a specific match is unfolding. Also, the winning probabilities are rounded off to the nearest integer, so WASP will likely show a probability of winning of either 0% or 100% before the game actually finishes, even though the result is not literally certain at that point.

The models are based on a database of all non-shortened ODI and 20-20 games played between top-eight countries since late 2006 (slightly further back for 20-20 games). The first-innings model estimates the additional runs likely to be scored as a function of the number of balls and wickets remaining. The second innings model estimates the probability of winning as a function of balls and wickets remaining, runs scored to date, and the target score.

The estimates are constructed from a dynamic programme rather than just fitting curves through the data. To illustrate, in the first-innings model, to calculate the expected additional runs when a given number of balls and wickets remain, we could just average the additional runs scored in all matches when that situation arose. This would work fine for situations that have arisen a lot (such as 1 wicket down after 10 overs, or 5 wickets down after 40 overs), but for rare situations like 5 wickets down after 10 overs or 1 wicket down after 40 it would be problematic, partly because of a lack of precision when sample sizes are small but, more importantly, because those rare situations will be overpopulated with games where there was a mismatch in skills between the two teams.

Instead, what we do is estimate the expected runs and the probability of a wicket falling on the next ball only. Let V(b,w) be the expected additional runs for the rest of the innings when b (legitimate) balls have been bowled and w wickets have been lost, and let r(b,w) and p(b,w) be, respectively, the estimated expected runs and the probability of a wicket on the next ball in that situation. We can then write

$$V(b,w) = r(b,w) + p(b,w)\,V(b+1,w+1) + \bigl(1-p(b,w)\bigr)\,V(b+1,w)$$

Since V(b*,w)=0, where b* equals the maximum number of legitimate deliveries allowed in the innings (300 in a 50-over game), we can solve the model backwards. This means that the estimate of V(b,w) in a rare situation depends only slightly on the estimated runs and probability of a wicket on that ball, and mostly on the values of V(b+1,w) and V(b+1,w+1), which will mostly be determined by situations for which the data are thick. The second-innings model is a bit more complicated, but uses essentially the same logic.
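
For readers who like to see this sort of thing as code, here is a minimal Python sketch of the backward recursion. It is an illustration only, not the code behind WASP: the per-ball inputs `toy_r` and `toy_p` are made-up placeholder functions rather than our estimated values, the function name `expected_additional_runs` is hypothetical, and the extra boundary condition V(b,10)=0 (no further runs once the side is all out) is an assumption that the text above does not spell out.

```python
from typing import Callable, List


def expected_additional_runs(
    max_balls: int,
    max_wickets: int,
    r: Callable[[int, int], float],
    p: Callable[[int, int], float],
) -> List[List[float]]:
    """Solve V(b,w) = r(b,w) + p(b,w) V(b+1,w+1) + (1 - p(b,w)) V(b+1,w) backwards.

    Returns a table V where V[b][w] is the expected additional runs after
    b legitimate balls have been bowled and w wickets have been lost.
    """
    # Every entry starts at zero, which covers both boundaries:
    # V[max_balls][w] = 0 (overs used up) and, by assumption,
    # V[b][max_wickets] = 0 (side all out).
    V = [[0.0] * (max_wickets + 1) for _ in range(max_balls + 1)]

    # Work backwards from the last ball of the innings to the first.
    for b in range(max_balls - 1, -1, -1):
        for w in range(max_wickets):
            pw = p(b, w)
            V[b][w] = r(b, w) + pw * V[b + 1][w + 1] + (1.0 - pw) * V[b + 1][w]
    return V


# Placeholder inputs, invented purely for illustration (NOT estimated from data).
def toy_r(b: int, w: int) -> float:
    # Scoring rate rises through the innings and falls as wickets are lost.
    return max(0.0, 0.7 + 0.5 * b / 300 - 0.03 * w)


def toy_p(b: int, w: int) -> float:
    # Probability of a wicket on the next ball rises slowly through the innings.
    return 0.025 + 0.02 * b / 300


V = expected_additional_runs(300, 10, toy_r, toy_p)
print("Start of innings:", round(V[0][0], 1))
print("5 down after 10 overs:", round(V[60][5], 1))
```

The rare state of 5 down after 10 overs gets its value almost entirely from the states that follow it, which is the point of solving the model backwards instead of averaging over the few matches that actually reached that state.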

Now many authors have applied dynamic programming to analyse sporting events including limited overs cricket (see my previous post on this here), although I don’t know of any previous uses of such models in providing real-time information to the viewing public. Scott’s and my main contribution, however, is in including in our models an adjustment for the ease of batting conditions. I have previously blogged about our model for estimating ground conditions, here. Without that adjustment, the models would overstate the advantage or disadvantage a team would have if they made a good or bad start, respectively, since those occurrences in the data would be correlated with ground conditions that apply to both teams. Using a novel technique we have developed, we have been able to estimate ground conditions from historical games and so control for that confounding effect in our estimated models.

In the games on Sky, a judgement is made on what the average first innings score would be for the average batting team playing the average bowling team in those conditions, and the models’ predictions are normalised around this information. At this stage, I believe this judgement is just a recent historical average for that ground, but the method of determining par may evolve.

I gather that the intention is to unveil more graphics around the use of WASP throughout the season, with the system fully up and running by the time of the international matches against England. It’s going to be interesting listening to what the commentators make of the WASP. Last Friday’s game wasn’t the best showcase, since when Auckland came to bat in the second innings, their probability of winning was already at 92% and quickly rose higher. It was fun, though, hearing the commentators ask Wellington captain, Grant Elliott, who was wired for sound while fielding, what he thought their chances were given that WASP had the Auckland Aces at 96% at that point. Grant's reply was lovely: "Sometimes even pocket aces lose". This is worth remembering when (as will inevitably happen) a team has a probability of winning in the 90s but still goes on to lose.

#### 34 comments:

1. While watching the Wellington and Auckland game you referred to, I can recall the commentators wondering if the WASP would ever show 100% (this was when Auckland were in a particularly strong position and really couldn't lose the match). I quietly laughed at them as I figured it would be statistically impossible to show 100%. I guess I was wrong.

2. In the NZ-UK match last night, early in the second innings the win probability for NZ was zero (at least, that's what I saw on Twitter, I didn't see the probability myself) -- that seems very low, especially as NZ actually won.

The fact that the probability is based on 'average teams' seems to make this worse, since the prior information is that NZ is not as good as UK, so ignoring this prior information should make the probability more favorable to NZ.

3. Still trying to confirm what happened last night, but our understanding is that there was a data-entry mistake at Sky. The retired hurt complicated things a bit, but I don't think that was the problem. I'll do an update post when I know more.

4. Hello, am I able to purchase the software for my own use?
Kind Regards
Jim

5. The IP is not in the public domain.

6. Each and every game depends on two phase, result and production. Player plays his games but audience plays on his/her, prediction games and enjoy a lot thank you for sharing this post..

7. What does WASP stand for?

8. The Winning and Score Predictor. It is a score predictor in the first innings and a probability of winning in the second.

9. Hello, I am a mathematician and a serious cricket follower based in Houston. This post was very interesting to me from both points of view. Do you have an article or a pre-print with more of the mathematical details about the calculation of the various probabilities involved?

10. Anando. The paper showing how we can estimate the models taking into account pitch quality using only historical data is at: http://www.econ.canterbury.ac.nz/RePEc/cbt/econwp/1144.pdf.

The actual dynamic programmes used to model the first and second innings are in the University of Canterbury Ph.D. thesis of my student, Scott Brooker. The thesis should be available online at the U of C library. We have been slow in getting it carved up into separate academic papers, unfortunately.

11. I myself developed the same app. The app does not use a database but is rather based on the D/L table and ODI rankings of teams, which gives a very similar effect and good results. Been testing it for 6 months now. In fact, results are much better than WASP.

12. Watching NZ vs IND. Fall of single wicket took chances from 51% to 30%!! I myself developed the same app(never knew till today any such thing existed). Feeling awkward to say but results are much more consistent and real than WASP. The downside of my app is I am a student and had no database to use, but I overcame it using ODI rankings and points of team and results are surprisingly good :-)
Not a big mathematician, but probably biggest cricket fan.

13. The ODI match between India-New Zealand will see India winning the match comfortably due to its strong batting line up

14. Where can I find the complete algorithm? Your blog does not give a lot of details about the functioning of the algorithm. Thanks in anticipation.

15. "That is, the "predictions" are more a measure of how well the teams have done to that point, rather than forecasts of how well they will do from that point on." - well why do we need this stupid calculation for that ? If the team has scored 100-4 in 20 ov, targetting 300, it is common sense that they are not doing well. we don't need this kind of "WASP" to figure that out.. we need something more special.. we call it common sense !!

16. not so lucky today.. better lets hope for better luck next time.

17. why this pain ? kohli or dhoni or anderson has no clue about the dynamic programmes !! but they do "really" make their teams win !!

18. Isn't the structure similar to Gambler's Ruin problem? I think the formula needs some covariates to ensure it considers the batsmen still in the crease and yet to come.

19. If you are using ODI rankings, you are making a forecast rather than an assessment of performance to date, and so it is not asking the same question as WASP

20. Try http://ir.canterbury.ac.nz/handle/10092/5886

21. Anything that was both "special" and "common" at the same time would certainly be impressive!

22. Hope you can distinguish between a boolean value and an int value.
"Is team doing good?" No
"How good/bad is team doing?" ??
Another thing, implementing common sense in technology is really something worth praising. Computers can do a thing millions of time with 100% accuracy at amazing speeds, probably a human can't. Humans have common sense(mostly), machines don't.

23. I can't see the analogy with the Gambler's Ruin problem. If we wanted WASP to be a forecasting tool rather than an assessment of performance up to that point, we would certainly need to consider the quality of the batsmen not yet out. But that is not what we wanted it to do. For example, when India were 4 down and WASP was initially hovering around 10%, I would have put much higher odds on an Indian win, because of the very non-average quality of Kohli and Dhoni. Different question, different answer.

24. I thought this comment of mine didn't get published. Anyway, either I am unable to understand your point or you might be thinking the wrong way. ODI rankings tell a team's recent performance and capabilities (a fair idea); isn't that what you are trying to get out of the database? Just for clarification, I used it along with many other factors like the ratio of required rate to current rate, etc.

25. cricket is highly unpredictable...
dats y no WASP worked out. 34 to win 10 ov and 7 wickets remaining, den wot happen??? lost for 22 runs or So...

26. Hi Seamus,
Apart from being a huge cricket fan, I'm also a graduate student of data mining. I was wondering, have you used any mining techniques to build your prediction models? If so, where can I get the details of the design etc., and also some documents that you referred to for constructing this?
Congrats on the success of WASP you guys have received.

27. Mohammed. We haven't used data mining techniques, but one certainly could, although I would trust that more to a pure statistician than a pair of economists!

28. Lol.. You are right.

29. I have spent 5 years with Cricket predictions. My algorithm is simple : Take the scorecard of the previous match.....Using those numbers, Try to predict what will happen next....it Looks exciting...very soon, i can predict the Man of the match...i am PhD frm Indian Inst of Science, Bangalore.

30. Can you share the coding logic? I am doing a college project, so need in that.

31. Keep connected with us to enjoy all cricket games live stream tv online for warm-up matches, group stage games, quarterfinals, semi finals, and the grand final. Cricket is so entertaining form to sport around the world.

http://2015iccworldcupliveausnz.blogspot.com/

32. hey , did you get the coding logic,
i am doing a similar project for my college
any updates will be appreciated

33. i am doing prediction analytics for the upcoming wc matches
i was wondering if this platform has a similar features
i ll be using decision tree analysis in rapid miner to predict the winner
also i am in need of the desired datasets for the historical data of indian team
is anybody having it or done something similar
please respond at arush.thapar@yahoo.com

34. how can i contact you