5 Differences between matches

5-1 Moroney's modified Poisson distribution
5-2 Maher: teams in matches
5-3 Strength and weakness
5-4 Estimation uncertainty
5-5 Extensions

Whether a simple Poisson model describes the distribution of goals across matches reasonably well depends in part on whether the dispersion in the data differs from the dispersion that the Poisson distribution assumes. If it does, one way to proceed is to assume that different matches are subject to different distributions of uncertainty across the possible numbers of goals. One example of this approach is the modified Poisson distribution of Moroney (1951). Another is Maher (1982), who proposed to include in the model the teams that play against each other. This chapter describes these two approaches.

5-1 Moroney's modified Poisson distribution

As mentioned in Chapter 4 the Poisson distribution assumes that the expectation of a variable with a Poisson distribution is equal to its variance. By calculating the mean and the variance across the observations you can check the plausibility of this assumption.

The results of his analysis using the Poisson distribution suggested to Moroney that over-dispersion could be an issue, and he modified the Poisson model accordingly. The variance of the goals between matches in Moroney's data set was 1.9, compared with an average of 1.7 goals per match. This indicates over-dispersion: the variance in the data is about twelve percent larger than the variance implied by a Poisson distribution with mean 1.7.
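The check described above takes only a few lines. The following is a minimal sketch; the list `goals` is hypothetical data made up for illustration (it is not Moroney's data set), and only the figures 1.9 and 1.7 come from the text.

```python
from statistics import mean, variance

# Hypothetical goals per match, for illustration only (not Moroney's data)
goals = [0, 1, 1, 2, 0, 3, 1, 2, 4, 1, 0, 2, 5, 1, 2, 0, 3, 1, 2, 3]

m = mean(goals)        # sample mean
s2 = variance(goals)   # sample variance (n - 1 in the denominator)
ratio = s2 / m         # close to 1 under a Poisson model, apart from sampling noise

print(f"mean = {m:.2f}, variance = {s2:.2f}, variance/mean = {ratio:.2f}")

# The same ratio for the variance and mean reported in the text
moroney_ratio = 1.9 / 1.7
print(f"Moroney: variance/mean = {moroney_ratio:.3f}")
```

A ratio clearly above one points to over-dispersion, a ratio clearly below one to under-dispersion.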

To adjust the Poisson distribution with mean 1.7 for the observed variation in goals, he proceeded as follows. With m and s² denoting the sample mean and the sample variance, respectively, Moroney chose parameters c and p to solve the equations ... and .... With m=1.7 and s²=1.9 he finds that c=8.5 and p=14.5 solve these two equations. This method is known as method-of-moments estimation.
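The two equations are not reproduced in this text. One pair of moment equations that is consistent with the reported numbers is m = p/c and s² = p/c + p/c², and the sketch below solves that pair. Treat these equations, and the function name `moments_to_c_p`, as assumptions made for illustration rather than as Moroney's exact formulation.

```python
def moments_to_c_p(m, s2):
    """Solve the ASSUMED moment equations m = p/c and s2 = p/c + p/c**2.

    Substituting p = m*c into the second equation gives s2 = m + m/c,
    so c = m / (s2 - m) and p = m * c.
    """
    if s2 <= m:
        raise ValueError("the sample variance must exceed the sample mean")
    c = m / (s2 - m)
    p = m * c
    return c, p

c, p = moments_to_c_p(1.7, 1.9)
print(f"c = {c:.2f}, p = {p:.2f}")
```

Under these assumed equations c equals 8.5 and p equals 14.45, which agrees with the rounded values reported above.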

Moroney's modified Poisson distribution is, in fact, a Negative Binomial distribution. It corresponds to assuming that matches distribute across the number of possible goals according to a Negative Binomial distribution with parameters p and .... Moroney found that this modification fitted the frequencies of goals in his sample better than the Poisson distribution (compare Figure 5-1 below with Figure 4-1 in Chapter 4).

The Negative Binomial distribution allows for adjusting a Poisson model to over-dispersion in the data. There is a particular characterization of the Negative Binomial distribution which shows that it allows the Poisson mean number of goals to vary from match to match.

An article in 1920 by Greenwood and Yule shows that we obtain the Negative Binomial distribution if we assume that a Gamma distribution describes the variation in the Poisson mean (Greenwood, M. and Yule, G. U., 1920, An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents, Journal of the Royal Statistical Society, Series A, vol. 83, p255-279). Different ways often exist to characterise a probability distribution, and this is just one characterisation of the Negative Binomial distribution. Under this interpretation of the Negative Binomial distribution we can show the variation in the Poisson mean as implied by Moroney's estimates. Figure 5-2 shows this variation.

Figure 5-2 shows that the Negative Binomial distribution underlying Figure 5-1 implies a "Gamma variation" in the Poisson mean within the approximate range of 0.6 to 3.2 goals per match. By contrast, the simple Poisson model assumes that every match has the same mean of 1.7 goals.
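The Gamma characterisation can be checked numerically. The sketch below assumes a Gamma distribution with shape p and rate c for the Poisson mean, using p = 14.45 and c = 8.5 as illustrative values. It averages the Poisson probabilities over that Gamma distribution and compares the result with the Negative Binomial probabilities. The parameterisation and the function names are assumptions made for this illustration.

```python
import math

def nb_pmf(k, p, c):
    """Negative Binomial probability of k goals, with shape p and rate c (mean p/c)."""
    log_pmf = (math.lgamma(k + p) - math.lgamma(p) - math.lgamma(k + 1)
               + p * math.log(c / (1 + c)) - k * math.log(1 + c))
    return math.exp(log_pmf)

def gamma_poisson_pmf(k, p, c, upper=15.0, steps=30000):
    """Poisson probability of k goals averaged over a Gamma(shape=p, rate=c) mean.

    The integral over the Poisson mean is approximated with the midpoint rule.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_gamma_pdf = (p * math.log(c) + (p - 1) * math.log(lam)
                         - c * lam - math.lgamma(p))
        log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
        total += math.exp(log_gamma_pdf + log_poisson) * h
    return total

p, c = 14.45, 8.5  # illustrative values, see the lead-in

for k in range(6):
    print(k, round(nb_pmf(k, p, c), 4), round(gamma_poisson_pmf(k, p, c), 4))

# Mean and variance implied by the Negative Binomial probabilities
ks = range(60)
nb_mean = sum(k * nb_pmf(k, p, c) for k in ks)
nb_var = sum(k * k * nb_pmf(k, p, c) for k in ks) - nb_mean ** 2
print(f"mean = {nb_mean:.2f}, variance = {nb_var:.2f}")
```

The two columns of probabilities agree, and the implied mean and variance are p/c and p/c + p/c², respectively.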

Visual comparison of Figures 4-1 and 5-1 suggests that the Negative Binomial distribution fits the match data better than the Poisson distribution. Moroney used a Pearson Chi-squared test, which is a goodness-of-fit test, to formally assess how well the predicted frequencies in Figures 4-1 and 5-1 compare with the observed frequencies in the data. He finds that this test indicates a better overall fit to the match data by the Negative Binomial distribution. However, he also notes that this test does not provide compelling evidence to prefer one of the two distributions over the other (see pages 103 and 261 in Moroney's book).
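The Pearson Chi-squared statistic itself is simple to compute. The following is a minimal sketch with hypothetical frequencies (not Moroney's data), in which the expected frequencies come from a Poisson distribution with an assumed mean of 1.7.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k goals."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def pearson_chi2(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical numbers of matches with 0, 1, 2, 3 and 4-or-more goals
observed = [30, 52, 45, 25, 18]
n = sum(observed)
mean_goals = 1.7  # hypothetical estimate of the mean from the same data

expected = [n * poisson_pmf(k, mean_goals) for k in range(4)]
expected.append(n - sum(expected))  # the last cell collects 4 or more goals

chi2 = pearson_chi2(observed, expected)
dof = len(observed) - 1 - 1  # cells, minus one, minus one estimated parameter
print(f"chi-squared = {chi2:.2f} on {dof} degrees of freedom")
```

The statistic is compared with a Chi-squared distribution with `dof` degrees of freedom. A smaller value for one distribution than for another, at the same degrees of freedom, indicates the better overall fit.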

5-2 Maher: teams in matches

In 1968 Reep and Benjamin applied the Negative Binomial distribution to investigate the role of skill and chance in association football (Reep, C. and B. Benjamin, 1968, Skill and chance in association football, Journal of the Royal Statistical Society, Series A (General), vol. 131, p581-585). Charles Reep was a pioneer in recording football events for statistical analysis. One of their conclusions was that chance dominates skill in a football match.

Fourteen years later Maher (1982) noted that chance may well dominate skill from match to match, but that over a season chance evens out and skill tops chance. He showed how to use the Poisson distribution to infer the relative qualities of teams during a season of association football.

He motivated his choice of the Poisson distribution as follows (see page 109 of Maher, M. J., 1982, Modelling association football scores, Statistica Neerlandica, vol. 36, no. 3). We show an example of this in Section 3-5.

"There are good reasons for thinking that the number of goals scored by a team in a match is likely to be a Poisson variable: possession is an important aspect of football, and each time a team has the ball it has the opportunity to attack and score. The probability p that an attack will result in a goal is, of course, small but the number of times a team has possession during a match is very large. If p is constant and attacks are independent, the number of goals will be Binomial and in these circumstances the Poisson approximation will apply very well."

Like Moroney in 1951, Maher also models the variation in the Poisson mean number of goals between matches. However, he does so through the variation between matches in the teams that play against each other. He assumes that in a match between home team i and visiting team j, the number of home goals is Poisson distributed with a mean equal to ...

Similarly, he assumes that the number of goals by the visiting team is Poisson distributed with a mean equal to .... It is easier to estimate these models in logarithmic form, that is, as models with the natural logarithm of the Poisson mean equal to ... and ... for goals by the home team and visiting team, respectively. Formulated this way they are Generalized Linear Models, and statistical software usually has a module available for estimating them. Compared to Maher's model this reduces the variance in the predicted number of goals somewhat.

Like the Negative Binomial distribution, this approach allows each match to have a different distribution of uncertainty across the possible goals in matches. Whereas the Negative Binomial distribution can only address over-dispersion relative to the simple Poisson model, this approach can also address under-dispersion. There is, however, another feature of this approach which makes it interesting. We can interpret the estimates of Maher's model as differences in strengths between teams.
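Maher's motivation quoted earlier in this section, that a Binomial number of goals is well approximated by a Poisson distribution, can be checked numerically. The sketch below uses a hypothetical 200 attacks per match and a hypothetical scoring probability of 0.0085 per attack. Both numbers are assumptions chosen only so that the mean is about 1.7 goals.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k goals from n independent attacks, each scoring with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """Poisson probability of k goals."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

n_attacks = 200    # hypothetical number of attacks per match
p_goal = 0.0085    # hypothetical probability that an attack results in a goal
mean = n_attacks * p_goal

for k in range(6):
    print(k, round(binomial_pmf(k, n_attacks, p_goal), 4), round(poisson_pmf(k, mean), 4))

# Largest difference between the two sets of probabilities for 0 to 10 goals
max_gap = max(abs(binomial_pmf(k, n_attacks, p_goal) - poisson_pmf(k, mean))
              for k in range(11))
print(f"largest difference: {max_gap:.5f}")
```

The approximation improves as the number of attacks grows and the scoring probability per attack shrinks while their product stays the same.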

5-3 Strength and weakness

Maher estimated both models with the Maximum Likelihood method. This estimation method selects the parameter values under which the observed data are most likely.
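As an illustration of Maximum Likelihood estimation for a model of this kind, the sketch below assumes that the mean number of home goals takes the multiplicative form alpha_i × beta_j. This form is an assumption here, chosen to match the "alpha" and "beta" labels used for Table 5-1. Under that assumption, setting the derivatives of the Poisson log-likelihood to zero gives two sets of equations that can be solved by alternating between them. The 4-team goal table is hypothetical, and `fit_home_goals_model` is an illustrative name.

```python
def fit_home_goals_model(goals, iterations=500):
    """Maximum Likelihood fit of mean[i][j] = alpha[i] * beta[j] for i != j.

    goals[i][j] holds the goals scored by home team i against visiting team j.
    The diagonal is ignored because a team does not play itself.
    """
    n = len(goals)
    alpha = [1.0] * n
    beta = [1.0] * n
    for _ in range(iterations):
        # Likelihood equation for alpha_i: goals scored at home / sum of opponents' betas
        alpha = [sum(goals[i][j] for j in range(n) if j != i)
                 / sum(beta[j] for j in range(n) if j != i)
                 for i in range(n)]
        # Likelihood equation for beta_j: goals conceded away / sum of opponents' alphas
        beta = [sum(goals[i][j] for i in range(n) if i != j)
                / sum(alpha[i] for i in range(n) if i != j)
                for j in range(n)]
        # Only the products alpha_i * beta_j are identified, so fix the scale
        # by requiring sum(alpha) == sum(beta)
        scale = (sum(alpha) / sum(beta)) ** 0.5
        alpha = [a / scale for a in alpha]
        beta = [b * scale for b in beta]
    return alpha, beta

# Hypothetical home goals in a 4-team round robin (rows: home team, columns: visitor)
goals = [
    [0, 2, 1, 3],
    [1, 0, 0, 2],
    [0, 1, 0, 1],
    [2, 1, 1, 0],
]

alpha, beta = fit_home_goals_model(goals)
print("alpha:", [round(a, 2) for a in alpha])
print("beta: ", [round(b, 2) for b in beta])
```

At the solution, the fitted number of goals scored by each home team and conceded by each visiting team equals the observed number. A Poisson Generalized Linear Model with a logarithmic link and team factors maximises the same likelihood.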

Table 5-1 shows our replication of the results of Maher's model. We ordered the teams according to their positions at the end of the 1971/1972 season of the English First Division.

		Home	Away	Home	Away
		Attack	Defence	Defence	Attack
		alpha	beta	gamma	delta
1	Derby County	1.62	0.89	0.50	1.24
2	Leeds United	2.02	0.82	0.49	0.91
3	Liverpool	1.78	0.54	0.78	0.78
4	Manchester City	1.82	1.17	0.75	1.40
5	Arsenal	1.36	1.03	0.64	1.06
6	Tottenham Hotspur	1.71	1.12	0.63	0.87
7	Chelsea	1.55	1.12	0.97	0.83
8	Manchester United	1.49	1.35	1.31	1.49
9	Wolverhampton Wanderers	1.34	1.30	1.15	1.48
10	Sheffield United	1.49	1.31	1.28	1.09
11	Newcastle United	1.14	1.29	0.88	0.93
12	Leicester City	0.69	1.31	0.54	1.10
13	Ipswich Town	0.72	1.27	0.93	0.98
14	West Ham United	1.18	1.22	0.92	0.78
15	Everton	1.06	1.17	0.81	0.44
16	West Bromwich Albion	0.84	1.16	1.13	0.99
17	Stoke City	0.99	1.17	1.20	0.64
18	Coventry City	1.05	1.66	1.12	0.84
19	Southampton	1.21	1.98	1.38	1.05
20	Crystal Palace	0.99	1.28	1.49	0.65
21	Nottingham Forest	0.98	1.96	1.43	1.10
22	Huddersfield Town	0.46	1.37	1.06	0.74

Table 5-1: Reordered replication of Table 1 in Maher (1982)

In Table 5-1, the "attack" columns ... and ... show the relative qualities of teams in attack at home and away, respectively. Similarly, the "defence" columns ... and ... show the relative weaknesses in defence at home and away, respectively.

Some background on this season helps. Derby County won the championship with 58 points, with Leeds United, Liverpool and Manchester City finishing close behind in second, third and fourth place, each with 57 points. The ranking of teams at the end of this season is available at Wikipedia. The variation in Table 5-1's estimates for these four teams suggests that they earned almost exactly the same number of points in different ways.

For example, Derby County appears with an estimated home attack parameter of 1.62. Only four teams appear with a higher estimated home attack parameter, Leeds United's 2.02 being the highest. The differences between Derby County and Leeds United in defence, at home or away, are not substantial. The last column in Table 5-1 shows an estimated strength in attack for Derby County in away matches that is about 36% higher than that of Leeds United (1.24 against 0.91).
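Comparisons like these can be computed directly from Table 5-1. The sketch below copies the estimates for Derby County and Leeds United from the table. The function `expected_goals` additionally assumes that the Poisson means are products of an attack and a defence parameter (alpha × beta for the home team, gamma × delta for the visiting team). This is an assumption about the model's form, made for illustration.

```python
# Estimates from Table 5-1: (home attack, away defence, home defence, away attack)
table = {
    "Derby County": (1.62, 0.89, 0.50, 1.24),
    "Leeds United": (2.02, 0.82, 0.49, 0.91),
}

def relative_difference(a, b):
    """How much larger a is than b, as a fraction of b."""
    return a / b - 1.0

away_attack_gap = relative_difference(table["Derby County"][3],
                                      table["Leeds United"][3])
print(f"Derby's away attack is {away_attack_gap:.0%} higher than Leeds'")

def expected_goals(home, away):
    """Expected goals under the ASSUMED multiplicative form of the means.

    Home goals: home attack of the home team * away defence of the visitor.
    Away goals: home defence of the home team * away attack of the visitor.
    """
    alpha, _, gamma, _ = table[home]
    _, beta, _, delta = table[away]
    return alpha * beta, gamma * delta

home_mean, away_mean = expected_goals("Derby County", "Leeds United")
print(f"Derby at home to Leeds: {home_mean:.2f} against {away_mean:.2f} expected goals")
```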

5-4 Estimation uncertainty

The values in Table 5-1 for the teams' strengths in attack and weaknesses in defence are estimates. We therefore cannot be sure about their values: they are subject to estimation uncertainty.

A confidence interval informs us about this uncertainty. For example, a 50% confidence interval indicates a range of values that the true strength of a team is as likely to lie inside as outside. Similarly, a 95% confidence interval indicates a range that the true strength is 19 times more likely to lie inside than outside. For a variable whose possible values follow the normal distribution, a 95% confidence interval is about three times as wide as a 50% confidence interval.
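Both numbers in this paragraph follow from a short calculation. The sketch below uses the standard normal distribution from Python's standard library. The function name `normal_interval_width` is illustrative.

```python
from statistics import NormalDist

def normal_interval_width(level):
    """Width, in standard errors, of a central normal interval with the given coverage."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z

width_ratio = normal_interval_width(0.95) / normal_interval_width(0.50)
odds_95 = 0.95 / 0.05  # odds of lying inside rather than outside a 95% interval

print(f"width of 95% interval / width of 50% interval = {width_ratio:.2f}")
print(f"odds inside versus outside for 95% = {odds_95:.0f} to 1")
```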

Figure 5-1 shows the estimates for the teams' strength in attack and their corresponding 95% confidence intervals. We ranked the teams according to their positions in the league table at the end of the season. The dots in Figure 5-1's intervals correspond to the "alpha" values in the first column of Table 5-1.

A technical note on how we obtained these confidence intervals: the number of parameters may be too high relative to the number of observations in the data for the estimates to be normally distributed (they are asymptotically, but we do not seem to have enough observations for that to hold). For this reason, we used the bootstrap percentile method to calculate these 95% confidence intervals. We created two thousand data sets by randomly sampling observations from the data set with replacement. We tried to estimate the model for each simulated data set, but this was not always possible (sampling with replacement may result in data sets with only zero-goal matches for a team that is weak in attack or good in defence). The estimation algorithm converged to the Maximum Likelihood estimates for 1986 and 1259 simulated data sets for the models of goals by the home team and the visiting team, respectively. The widths of these confidence intervals are roughly ten percent smaller than those obtained by assuming that the estimates are normally distributed. In most cases, the smaller width is due to higher lower bounds of the bootstrapped confidence intervals.
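The bootstrap percentile method can be sketched as follows. To keep the example short, the sample mean of some hypothetical goal counts stands in for the model's Maximum Likelihood estimates. The function name, the data and the random seed are assumptions made for illustration, not the procedure used for the figures.

```python
import random
from statistics import mean

def bootstrap_percentile_ci(data, estimator, n_resamples=2000, level=0.95, seed=0):
    """Percentile confidence interval from resampling the data with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))  # sampling with replacement
        try:
            estimates.append(estimator(resample))
        except (ValueError, ZeroDivisionError):
            continue  # skip resamples for which no estimate can be computed
    estimates.sort()
    lower_index = int((1 - level) / 2 * len(estimates))
    upper_index = int((1 + level) / 2 * len(estimates)) - 1
    return estimates[lower_index], estimates[upper_index]

# Hypothetical goals per match, for illustration only
goals = [0, 1, 1, 2, 0, 3, 1, 2, 4, 1, 0, 2, 5, 1, 2, 0, 3, 1, 2, 3]

lower, upper = bootstrap_percentile_ci(goals, mean)
print(f"estimate = {mean(goals):.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

With the full model, `estimator` would fit the model to each resampled data set and return one parameter, and resamples for which the fit fails would be skipped, as in the `except` branch.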

Figure 5-1 shows that the relationship between the positions of teams at the end of the season and the estimated strengths in attack is not monotonic. The intervals, or horizontal lines, in Figure 5-1 do not show regular steps that go up from the bottom left to the top right. Leeds United appears stronger in attack when playing at home compared to Derby County.

Figure 5-2 below shows the relative differences between teams in defence when playing their match away from home. The dots in Figure 5-2's intervals correspond to the "beta" values in the second column of Table 5-1.

The teams that ended at positions 4 to 17 do not seem to have differentiated themselves from each other in terms of defence in their opponent's stadium. However, Figure 5-3 below shows differences between them in the estimated defence weaknesses for matches in their own stadium. The dots in Figure 5-3's intervals correspond to the "gamma" values in the third column of Table 5-1.

The variation between teams in estimated qualities is largest for the estimates of the teams' strengths in attack in away matches. Figure 5-4 below shows the relative differences between teams in attack when playing their match away from home. The dots in Figure 5-4's intervals correspond to the "delta" values in the last column of Table 5-1.

Overall, Figures 5-1 to 5-4 provide an overview of the relative qualities of teams in attack and defence throughout a season. After investigating the predictive properties of the model, Maher found that a simplified version of this model performs better according to the Akaike Information Criterion. The Akaike Information Criterion measures a model's goodness-of-fit to the data while penalising the number of parameters. A model with a lower value for this criterion (for count models such as these, a value closer to zero) would be expected to have better out-of-sample prediction performance. The criterion can therefore be used to identify a model for predicting the number of goals in future matches.
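The Akaike Information Criterion equals twice the number of parameters minus twice the maximised log-likelihood. The sketch below compares two simple Poisson models on hypothetical goal counts: one with a common mean for home and away goals, and one with separate means. The data and the two models are illustrative assumptions and are not Maher's models.

```python
import math

def poisson_loglik(counts, mean):
    """Log-likelihood of independent Poisson counts with a common mean."""
    return sum(k * math.log(mean) - mean - math.lgamma(k + 1) for k in counts)

def aic(loglik, n_params):
    """Akaike Information Criterion: 2 * parameters - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik

# Hypothetical goals by home and visiting teams in ten matches
home = [2, 1, 3, 0, 2, 1, 4, 2, 1, 2]
away = [1, 0, 1, 2, 0, 1, 1, 0, 2, 1]

# Model A: one mean for all goals (the Maximum Likelihood estimate is the sample mean)
common_mean = sum(home + away) / len(home + away)
ll_common = poisson_loglik(home + away, common_mean)

# Model B: separate means for home and away goals
ll_separate = (poisson_loglik(home, sum(home) / len(home))
               + poisson_loglik(away, sum(away) / len(away)))

aic_common = aic(ll_common, 1)
aic_separate = aic(ll_separate, 2)
print(f"AIC common mean = {aic_common:.2f}, AIC separate means = {aic_separate:.2f}")
```

The model with more parameters always fits at least as well, so the criterion prefers it only when the gain in log-likelihood exceeds the penalty for the extra parameter.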

Maher's simplified version assumes that teams have the same qualities in home matches and away matches, except for a possible home advantage common to all teams. While it has better predictive properties, this version cannot detect how the variation between teams differs between Figure 5-1 and Figure 5-4, or between Figure 5-2 and Figure 5-3.

5-5 Extensions

We introduced Maher's model by drawing attention to its potential to evaluate the relative differences in qualities between teams. Maher's model has been extended in different directions, usually with the aim of improving its predictive properties. We mention a few of them below.

The results in the previous section rely on the assumption that the number of goals that home teams score is statistically independent of the number of goals which visiting teams score. This may work reasonably well for the purpose of evaluating differences between teams but less so for predicting the results in (future) matches.

Alternatives exist if you want to evaluate the strength of this assumption in your work. Maher himself suggested using a bivariate Poisson distribution to improve the model's predictive ability. Karlis and Ntzoufras (2003) show how to do this. Instead, McHale and Scarf (2011) propose a copula approach to allow for statistical dependence between the number of goals by home teams and the number of goals by visiting teams. See: Karlis, D. and I. Ntzoufras, 2003, Modelling the dependence of goals scored by opposing teams in international soccer matches, The Statistician, vol. 52, no. 3, p381-393; McHale, I. and P. Scarf, 2011, Statistical Modelling, vol. 11, no. 3, p219-236; Dixon, M. J. and S. G. Coles, 1997, Modelling Association Football Scores and Inefficiencies in the Football Betting Market, Applied Statistics, vol. 46, no. 2, p265-280; Dixon, M. and M. Robinson, 1998, A birth process model for association football matches, Journal of the Royal Statistical Society: Series D, vol. 47, no. 3, p523-538.

Dixon and Coles (1997) show a model which allows for statistical dependence only in matches where teams score at most one goal. Dixon and Robinson (1998) build on this model using information on when teams score goals during the match.

An exception to using Maher's model for prediction is Lee (1997). He shows how you can use it for simulating the ranking of teams at the end of the season (Lee, A. J., 1997, Modeling scores in the Premier League: is Manchester United really the best?, Chance, vol. 10, no. 1, p15-19).