1. Introduction

Cricket, one of the most popular sports worldwide, particularly in Asia, carries a rich history and passionate following that transcends borders. However, while fans and analysts of the sport alike often speculate about the factors that influence match outcomes, such as the toss result, home-ground advantage, or batting versus bowling strategies, there is surprisingly limited statistical evidence to validate these theories systematically. This report aims to provide deeper insights into these key questions by applying data-driven methods to cricket match outcomes in the Asia Cup.

For fans, coaches, and cricket analysts, this report serves to bridge the gap between anecdotal beliefs and objective evidence. Understanding how factors such as winning the toss, playing at home, or focusing on specific aspects of play like bowling or batting contribute to match outcomes can enhance strategies for players and teams alike. This report may also help mitigate the disparities caused by the vast differences in cricket teams’ budgets, as statistical analysis can be used to level the playing field and promote fair competition. Additionally, the findings can offer cricket enthusiasts a new perspective on the game, equipping them with empirical evidence to fuel discussions and debates about team performance.

2. Dataset Description

The dataset for this analysis concerns the Asian Cricket Council Asia Cup, a men’s international cricket tournament between Asian nations. It is intended to take place every two years (some editions were skipped due to extraordinary circumstances) and alternates in format between One Day International (ODI) and Twenty20 International (T20I).

The specific dataset used was taken from Kaggle and can be accessed through the following link: https://www.kaggle.com/datasets/hasibalmuzdadid/asia-cup-cricket-1984-to-2022.

This dataset contains information about the Asian Cricket Council Asia Cup from its creation in 1984 until its most recent edition in 2022. Furthermore, the dataset contains information on the following variables:

This dataset contains information on the following 7 teams: India, Sri Lanka, Bangladesh, Pakistan, Afghanistan, UAE, and Hong Kong. The graph presented here shows the total wins for India, Sri Lanka, Bangladesh, Pakistan, and Afghanistan; UAE and Hong Kong do not appear because they have no recorded wins in this dataset. Sri Lanka leads with 40 wins, closely followed by India with 39. Pakistan trails with 31 wins, while Bangladesh has 10 wins to Afghanistan’s 5.

This depiction of total wins across teams in the Asia Cup offers an interesting starting point for analyzing the factors that contribute to winning a cricket match. By examining the performance of these teams, we can identify patterns and inputs, such as winning the toss, home-turf advantage, and batting strength versus bowling efficiency, that may correlate with match outcomes. Teams can leverage such insights to make informed decisions, optimizing their strategies and focusing on areas where they need improvement, ultimately enhancing their chances of success in future matches.

3. Research Questions

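The focus of this analysis is gaining better insight into the determining factors for the outcomes of cricket matches in the Asia Cup. To address this, we aim to answer the following three questions: (1) Does winning the toss make a team more likely to win the match? (2) Does playing at home give a team an advantage? (3) Does batting or bowling make more of a difference in determining the match’s result?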

4. Addressing the Research Questions

4.1 Addressing: Does winning the toss make a team more likely to win the match?

The first question of interest was the following: “Does winning or losing the initial toss of a cricket match impact the likelihood of winning or losing the match overall, and does this relationship between toss result and match result differ by team?” As in many other sports, a cricket match begins with a coin toss: the captain who correctly calls the side on which the coin lands chooses whether their team will bat or bowl first. The winner of the toss may have a slight advantage because they can make this choice strategically, taking into account the pitch, the weather, and other factors, to optimize their chances of winning. To answer this question, we focused primarily on the match “Result” variable and the “Toss” result variable. As a preliminary visual assessment, we created a stacked bar plot, faceted by team, showing the result of each match given the result of its toss.

This bar plot shows how often each team went on to win a match given that it had won or lost the toss at the start of that match. The relationship between winning the toss and winning the match differs considerably by team. For some teams, such as Pakistan and Afghanistan, the number of matches won after losing the toss is almost the same as the number won after winning it, so the toss appears to have little bearing on the result for these teams. The same can be said of Hong Kong and UAE, which lost every match regardless of the toss. For India and Bangladesh, the gap between matches won after winning the toss and matches won after losing it is slightly larger, which may suggest that winning the toss puts these teams in a slightly better position (possibly because they make more effective strategic choices after winning it). Lastly, Sri Lanka shows the largest gap: it won almost twice as many matches after winning the toss as after losing it (around 14 more matches). This suggests more strongly that when Sri Lanka wins the toss, it is also more likely to win the match.

This plot is particularly informative for our question because its stacked bars show the distribution of match results conditional on the toss result. To make the plot easier to read, we recoded the result column so that results written as “win” or “Win D/L” become “Win” and results written as “Lose D/L” become “Lose”. This avoids overcrowding each bar with colors, since a match can ultimately end in only three ways: a win, a loss, or no result. Faceting by team is also informative because it shows that the relationship between toss result and match result differs by team, which gives further insight into each team’s strategic performance against other teams.
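The recoding step described above could be sketched roughly as follows. This is a minimal illustration, not the report's actual code: the helper name `recode_result` and the exact label spellings in the raw data are assumptions.

```r
# Hypothetical raw labels; the exact spellings in the Kaggle file are an assumption.
raw_result <- c("Win", "Lose", "Win D/L", "Lose D/L", "No Result", "win")

# Collapse Duckworth-Lewis (D/L) variants and case differences into three outcomes.
recode_result <- function(x) {
  x <- tolower(trimws(x))
  out <- rep(NA_character_, length(x))
  out[startsWith(x, "win")] <- "Win"
  out[startsWith(x, "lose")] <- "Lose"
  out[x == "no result"] <- "No Result"
  out
}

result <- recode_result(raw_result)
table(result)  # counts per collapsed category
```

Any label that matches none of the three patterns is left as `NA`, which makes unexpected spellings easy to spot before plotting.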

We can check these visual impressions with a chi-squared test of independence, which assesses whether our two categorical variables, “Toss” result and match “Result”, are independent (in other words, whether they in fact have no relationship with each other, contrary to what we initially hypothesized).

## 
##  Pearson's Chi-squared test
## 
## data:  tabla
## X-squared = 7.056, df = 2, p-value = 0.02936

From the chi-squared test, because our p-value of 0.029 is less than the significance level of 0.05, there is enough evidence to reject the null hypothesis that the two variables are independent. According to the test, the distribution of match results is not the same across toss results (win or lose). This supports the idea that there is some relationship between the toss result and the match result. However, in a mosaic plot of the same table all cells remain unshaded, which means that the Pearson residuals lie between -2 and 2. No individual cell count is therefore unusually large or small relative to what independence would predict, so the mosaic plot on its own gives no evidence against independence. Taken together, these two results indicate that there is no clear-cut relationship between winning the toss and winning the match. The test of independence may also mask team-level effects because it pools the entire dataset rather than analyzing each team separately (the association between these two variables may differ substantially by team, as the bar plot suggested).
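To see how a significant chi-squared test and an unshaded mosaic plot can coexist, the following sketch runs `chisq.test` on a small table and inspects its Pearson residuals. The counts are invented for illustration and are not the report's data; only the table shape (two toss results by three match results, hence 2 degrees of freedom) mirrors the output above.

```r
# Hypothetical counts for illustration only; these are NOT the Asia Cup data.
toss_result <- matrix(
  c(40, 25, 6,
    25, 40, 6),
  nrow = 2, byrow = TRUE,
  dimnames = list(Toss = c("Win", "Lose"),
                  Result = c("Win", "Lose", "No Result"))
)

toss_test <- chisq.test(toss_result)

toss_test$p.value    # overall test of independence
toss_test$residuals  # Pearson residuals: (observed - expected) / sqrt(expected)

# Cells that a shaded mosaic plot would highlight (|residual| > 2).
flagged <- abs(toss_test$residuals) > 2
any(flagged)
```

With these made-up counts the overall test rejects independence at the 0.05 level even though no single residual exceeds 2 in absolute value: several moderate deviations can add up to a significant statistic. Calling `mosaicplot(toss_result, shade = TRUE)` would draw the corresponding plot.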

4.2 Addressing: Does playing at home give a team an advantage?

The next research question that this report seeks to answer is: does playing at home give a team an advantage? There are two reasons to expect that it might. First, suppose India is playing Pakistan in a stadium in Chandigarh (located in northern India). One would naturally expect more Indian fans than Pakistani fans in the stadium, which can boost the Indian team’s morale and thereby increase its probability of winning. Second, because the match is played in India, the Indian team has easier access to the stadium in Chandigarh and is likely to be more familiar with its conditions than the Pakistan team, which can give India an edge.

The graph used to answer this question is a bar plot of the proportion of wins for each team (Bangladesh, India, Pakistan, Sri Lanka), faceted by venue (whether the team played at home or away). Note that UAE, Hong Kong, and Afghanistan have played only away games, so they have been excluded from this analysis (we included only teams that played both home and away games). This plot is particularly informative for the research question because it compares the proportion of wins for each team conditional on whether it played at home or away. For this question, we focused exclusively on wins and losses, so matches with the outcome ‘No Result’ were removed from the data set (there were only 2 such matches, so removing them has a negligible effect on the overall data set).

This graph suggests that India and Sri Lanka win more games away than at home, whereas Bangladesh and Pakistan win more games at home than away. The proportions do not add up to 1 because home wins and away wins are each expressed relative to the total number of matches played.
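The normalization mentioned above can be illustrated with a small sketch. The match records are invented, and dividing by all matches played is one reading of how the plotted proportions were computed, so treat this as an assumption rather than the report's exact procedure.

```r
# Hypothetical match records for a single team; not taken from the dataset.
matches <- data.frame(
  venue  = c("Home", "Home", "Home", "Away", "Away", "Away", "Away", "Away"),
  result = c("Win",  "Win",  "Lose", "Win",  "Lose", "Lose", "Win",  "Lose")
)

counts <- table(matches$venue, matches$result)

# Wins at each venue divided by ALL matches played:
# the two shares sum to the overall win rate, not to 1.
share_of_total <- counts[, "Win"] / sum(counts)

# Wins at each venue divided by matches played at THAT venue:
# a per-venue win rate, shown here for comparison.
win_rate_by_venue <- prop.table(counts, margin = 1)[, "Win"]

share_of_total
win_rate_by_venue
```

The per-venue win rate is included only as a contrast; it answers a slightly different question (how often the team wins when it plays at a given venue) and is less sensitive to a team playing far more away games than home games.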

To complement the plot, we used Fisher’s exact test in place of the chi-squared test, since some of the expected frequencies in the table are less than 5. The test assesses whether there is an association between playing venue (home/away stadium) and match result (win/loss).

## 
##  Fisher's Exact Test for Count Data
## 
## data:  table(asiacup.final.dataframe.2$Result, asiacup.final.dataframe.2$Status)
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   0.09574072 10.44487680
## sample estimates:
## odds ratio 
##          1

The test does not provide enough evidence to conclude that there is an association between playing venue and match result, since the p-value is 1, which is greater than 0.05. Additionally, the 95% confidence interval for the odds ratio, (0.09574, 10.44488), contains 1, the value that corresponds to no association. In a nutshell, based on the plot and the test, we cannot conclude that playing at one’s home ground is necessarily advantageous for that team.
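A minimal sketch of this kind of check is shown below: it computes the expected counts that motivate switching away from the chi-squared test and then runs `fisher.test`. The 2x2 counts are hypothetical and do not reproduce the output above.

```r
# Hypothetical 2x2 counts for illustration; NOT the report's data.
venue_result <- matrix(
  c(3, 2,
    9, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(Result = c("Win", "Lose"), Status = c("Home", "Away"))
)

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(venue_result), colSums(venue_result)) / sum(venue_result)
small_expected <- any(expected < 5)  # TRUE here, so an exact test is preferable

ft <- fisher.test(venue_result)
ft$p.value   # two-sided p-value
ft$estimate  # conditional maximum-likelihood estimate of the odds ratio
ft$conf.int  # 95% confidence interval for the odds ratio
```

An odds ratio near 1 together with a confidence interval that spans 1 is what "no detectable association" looks like in this output; a very wide interval, as in the report, also signals that the table contains too few matches to pin the odds ratio down.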

The second set of graphs used to answer this question consists of side-by-side boxplots, which compare each team’s batting and bowling performance in home and away games. The average batting strike rate measures batting performance, while the number of wickets taken measures bowling performance.

Here are the batting performance box-plots:

Here are the bowling performance box-plots:

Looking first at the batting performance boxplots, India is the only team whose batting performance is better away than at home (the median of the average batting strike rate for India’s away games is greater than for its home games). For Bangladesh, Sri Lanka, and Pakistan, batting performance is better at home than away. In a cross-team comparison, India has the best batting performance in away games, while Sri Lanka has the best batting performance in home games (only marginally ahead of its closest competitors, including Bangladesh).

Moving on to the bowling performance boxplots, Bangladesh is the only team whose bowling performance is better away than at home (the median number of wickets taken by Bangladesh in away games exceeds that in its home games). For India, Sri Lanka, and Pakistan, bowling performance is better at home than away. In a cross-team comparison, the plot interestingly suggests that Sri Lanka has the best bowling performance in both away and home games.

In a nutshell, three teams bat better at home and three teams bowl better at home, so it is possible that there is an association between venue (home/away) and batting/bowling performance.

4.3 Addressing: Does batting or bowling make more of a difference in determining the match’s result?

The last question we address in this report is whether batting (i.e., number of boundaries) or bowling (i.e., number of wickets) has a greater impact on match results. This question, whose primary intent is to establish a predictive rather than a causal relationship, was chosen to complement the first two and provide further insight into the key factors that influence the outcome of international cricket matches. More specifically, it aims to show how resources may be optimally allocated to maximize the number of wins (i.e., whether teams should focus on bowling or batting), thus supplementing the material already discussed. However, before comparing the magnitudes of the impacts of batting and bowling on winning, we conduct some exploratory analysis of their individual impacts (i.e., we analyze winning in light of batting and bowling separately).

We first examine how each team in the Asia Cup performs in terms of both batting and bowling, which helps us identify the strongest and weakest teams overall. We then explore batting and bowling separately, followed by a regression analysis to determine the magnitude of the impact each aspect of the game has on match outcomes.

This plot visualizes the average batting strike rate and average wickets taken for each team in the Asia Cup, where the size and the color of each bubble (on a color scale from pale turquoise to pale violet red) provide insights into the team’s performance. Larger bubbles and more saturated violet-red colors indicate a higher average strike rate and a greater number of wickets taken, respectively. India stands out as the powerhouse, with the highest average batting strike rate and the most wickets taken. Pakistan and Sri Lanka are positioned similarly, with comparable strengths in both batting and bowling. Afghanistan follows, with a notable edge in wickets taken over Bangladesh, which is positioned just behind it. Finally, UAE and Hong Kong are the weakest teams in both batting and bowling, though, with fewer matches played, their influence on the overall picture is limited. The UAE has a slightly stronger bowling performance, while Hong Kong marginally outperforms in batting.

The impact of bowling in cricket is at times overlooked in favor of batting performances, but it plays a pivotal role in determining the outcome of matches. In this section, we will focus on understanding how bowlers contribute to the team’s success by comparing the key metrics of wickets taken versus wickets lost. By examining these factors across various teams, we aim to identify patterns that highlight the importance of bowling strategies and their potential impact on match results.

This graph shows that taking wickets is an indicator of winning matches and, likewise, that losing wickets is an indicator of losing matches. Moreover, a team that consistently takes wickets even in matches it loses demonstrates the strength of its bowling attack and its ability to challenge other teams in any scenario. Specifically, India, Pakistan, and Sri Lanka still take quite a few wickets in matches they lose. This plot serves to answer the following questions: Does taking more wickets make a team more likely to win a match? Conversely, does losing more wickets make a team more likely to lose a match? A solid understanding of the association between wickets taken, wickets lost, and match result clarifies how bowling affects whether a team wins or loses.

In the graph above, when the match result is a win, the number of wickets taken is greater than the number of wickets lost. Although this holds for all teams, the gap is largest for India, Pakistan, and Sri Lanka, which appear to leverage their bowling strength to take significant numbers of wickets, particularly in matches they win. Afghanistan and Bangladesh do not take a large number of wickets overall, even in victories (although the wickets they take in winning matches still exceed the wickets they lose). This could suggest a reliance on their batting depth and that improvements in bowling could shift the balance further in their favor. Lastly, Afghanistan, UAE, and Hong Kong exhibit lower wicket totals overall, indicating fewer matches played or reduced impact in the tournament. Their matches tend to lack the intensity observed in encounters involving stronger teams.

We now turn to the impact of batting (boundaries) on wins. The graphs used to explore this relationship are two sets of density plots, faceted by team, showing the density of boundaries by match result. One set covers the number of Fours (a boundary worth four runs) and the other covers the number of Sixes (a boundary worth six runs). These graphs aim to show whether boundaries have a meaningful relationship with winning for a given team, on average.

This first set of density plots, showing the density of Sixes by match result for each team in the dataset, indicates a positive relationship between winning and the number of Sixes hit, on average. More specifically, the density of Sixes for matches won is shifted to the right of the density for matches lost for most teams (except those without any recorded wins in the dataset, i.e., Hong Kong and UAE). Visually, the blue-shaded regions (matches won) extend over the red-shaded ones (matches lost) at higher values of Sixes for most teams, while the red-shaded regions typically peak at lower values of Sixes. This indicates that higher numbers of Sixes are, on average, associated with winning matches, while lower numbers are associated with losing them.

The graph above, which displays the same information for the number of Fours instead of Sixes, shows a very similar trend. We again observe that the density of Fours for matches won is shifted to the right of the density for matches lost in most cases, with the latter peaking at lower values of Fours. Note that these relationships hold on average but not necessarily for every team in the dataset. These plots therefore further indicate that higher numbers of Fours are, on average, associated with winning matches, while lower numbers are associated with losing them.

Based on these two sets of density plots, we can establish that the number of boundaries is positively associated with a team’s odds of winning, since matches won usually exhibit higher numbers of Fours and Sixes.

Now that the individual analyses of how boundaries and wickets relate to winning are complete, we turn to the magnitude of their respective impacts on match outcomes. More specifically, we estimate the magnitude of the impact of wickets and of boundaries on winning and contrast the results to gain better insight into which factor, batting or bowling, has the greater impact on match results.

To do so, we fit a linear regression of the total number of wins for a given team on its average number of Fours (batting), average number of Sixes (batting), and average Wickets Taken (bowling). We use the average numbers of Fours and Sixes as separate explanatory variables, rather than combining them into a single variable such as the total number of boundaries, since this preserves any finer detail in the relationship between batting and winning. The model results are as follows:

## 
## Call:
## lm(formula = total_wins ~ avg_fours + avg_sixes + avg_wickets_taken, 
##     data = team_stats)
## 
## Residuals:
##        1        2        3        4        5        6        7 
## -0.89064  2.86695 -3.13695 -0.08137  0.93123 -0.66471  0.97549 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -111.9175     8.3119 -13.465 0.000886 ***
## avg_fours            9.9989     0.8099  12.346 0.001145 ** 
## avg_sixes          -11.2926     1.6432  -6.872 0.006310 ** 
## avg_wickets_taken    2.1837     1.6276   1.342 0.272236    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.653 on 3 degrees of freedom
## Multiple R-squared:  0.9892, Adjusted R-squared:  0.9783 
## F-statistic: 91.22 on 3 and 3 DF,  p-value: 0.001911

Overall, the model shows coefficients of greater magnitude for batting (that is, for Fours and Sixes) than for Wickets Taken, and thus indicates a stronger relationship between batting and match outcomes than between bowling and match outcomes. The coefficients for Fours and Sixes are also statistically significant at the 0.01 level, whereas the coefficient for Wickets Taken is not (p = 0.272). Note that while we find evidence that the relationship between batting and match outcomes is stronger than that for bowling, the regression does not give clear results on the direction of this relationship: it indicates a negative relationship between the average number of Sixes and total wins (estimate -11.29), while showing a positive relationship for the average number of Fours (estimate 10.00). While this would be troubling if we relied on the regression model alone as a predictive instrument, it is not a major concern for this analysis, since the primary objective of the model is to establish the magnitude of the association between match results and batting and bowling. Because the model is fit to only seven teams and has three residual degrees of freedom, these estimates should nonetheless be interpreted with caution.
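The model call shown in the output can be sketched as follows. The column names come from the `lm` call above, but every number in `team_stats` here is invented, so the fitted values will not match the report. The standardized refit at the end is an addition for illustration, not something the report itself does: raw coefficient magnitudes depend on each predictor's units and spread, and putting all variables on a common scale is one hedged way to compare them.

```r
# Hypothetical per-team summaries; all numbers are invented for illustration.
team_stats <- data.frame(
  total_wins        = c(12, 30, 8, 25, 3, 18, 21),
  avg_fours         = c(10.2, 14.1, 9.0, 13.5, 8.1, 11.9, 12.4),
  avg_sixes         = c(2.1, 3.9, 1.8, 3.0, 1.5, 2.9, 2.6),
  avg_wickets_taken = c(5.5, 7.8, 5.9, 7.0, 4.8, 6.1, 7.2)
)

model_formula <- total_wins ~ avg_fours + avg_sixes + avg_wickets_taken

# Same model structure as in the report: 7 teams, 4 estimated coefficients.
fit <- lm(model_formula, data = team_stats)
raw_magnitudes <- abs(coef(fit)[-1])  # drop the intercept
resid_df <- df.residual(fit)          # 7 observations - 4 coefficients

# Assumption, not the report's method: refit on z-scored variables so that
# coefficient magnitudes are comparable across predictors.
std_stats <- as.data.frame(scale(team_stats))
fit_std <- lm(model_formula, data = std_stats)
std_magnitudes <- abs(coef(fit_std)[-1])

raw_magnitudes
std_magnitudes
```

If the ranking of `std_magnitudes` agreed with the ranking of `raw_magnitudes` on the real data, that would lend some support to the batting-versus-bowling comparison; if not, the comparison would be sensitive to the variables' scales.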

In summary, the regression model establishes that batting has a bigger impact on match results when compared to bowling. Meanwhile, the individual analysis of batting and bowling impacts on match results established that both have, on average, a positive relationship with winning matches. Therefore, we establish that, on average, batting has a greater positive impact on match results when compared to bowling.

5. Conclusion

To summarize the research questions and results:

The first question examined whether winning the toss is associated with a higher likelihood of winning the match. For Afghanistan and Pakistan, we concluded that the result of the coin toss has little impact on the outcome of the match. The same inference can be made for Hong Kong and UAE, which have lost all their matches. For India and Bangladesh, however, the coin toss has played a slightly more decisive role in their victories, and for Sri Lanka, winning the toss is associated with an even higher chance of winning the game. Interestingly, the chi-squared test and the mosaic plot gave contrasting results, making it hard to conclude that winning the coin toss is associated with winning the match.

The second question tried to determine whether playing at one’s home ground gives a team an advantage. We saw that Bangladesh and Pakistan win more home games than away games, whereas the opposite is true for India and Sri Lanka. Fisher’s exact test did not give us sufficient evidence to claim that playing at the home stadium necessarily gives a team an advantage.

We also noticed that, in terms of batting and bowling performance, a majority of the teams (3 out of 4) performed better in home games than in away games. This gives some indication of a relationship between playing at home and batting/bowling performance.

The third question explored whether batting or bowling has the greater impact on the result of a match. India is the most well-rounded team, as evidenced by its batting and bowling proficiency, and India, Sri Lanka, and Pakistan took many wickets in their victories. We observed that a higher number of Fours is positively associated with winning matches, and the same can be said of Sixes. When we also incorporated wickets taken in a regression, however, we obtained some unusual results, such as a negative relationship between Sixes and match outcome. All in all, we conclude that batting has a larger impact on match results than bowling.

6. Future Discussion

In the future, we can expand the scope of our dataset to determine whether the answers to the questions above also apply to other formats and tournaments of cricket. For example, while the Asia Cup is a reputable tournament, the most prestigious ones by far are the ODI World Cup, the T20 World Cup, and the World Test Championship (which together represent the three different formats of the game). It would be interesting to see whether our findings generalize to all these formats and tournaments, with the aim of building a single predictive model of whether a team will win a cricket match. Beyond exploring whether the results of the above three questions differ between ODI, Test, and T20 cricket, future work could also determine the importance of other factors, such as the effect of pitch and weather conditions on win probabilities, as well as how significant the contribution of the “Man of the Match” player is (otherwise known as the most in-form player during the match). If this player is not on the roster, would it impact a team’s winning chances? This would open the discussion as to whether cricket is effectively a one-player game (dependent on the most in-form athletes) or a team game (in which everyone contributes equally, so that individual players do not affect win probabilities). These questions would continue to assist us in developing a more accurate cricket match result prediction model.