As usual with competition/pot sims, I used ClubElo's expected goals formula to run 10000 simulations.
First the average points in the group stage:
France 6.880
Switzerland 4.193
Romania 3.557
Albania 2.086
-------------------------
England 5.958
Russia 3.963
Slovakia 3.645
Wales 3.014
-------------------------
Germany 6.841
Ukraine 4.201
Poland 3.752
Northern Ireland 1.942
-------------------------
Spain 5.877
Croatia 3.780
Turkey 3.642
Czech Republic 3.208
-------------------------
Belgium 5.081
Italy 4.589
Republic of Ireland 3.514
Sweden 3.273
-------------------------
Portugal 5.746
Austria 4.420
Hungary 3.297
Iceland 3.067
Eliminated in the group stage:
72.37% - Northern Ireland
69.7% - Albania
50.66% - Wales
49.48% - Iceland
47.1% - Czech Republic
46.04% - Hungary
45.41% - Sweden
41.24% - Republic of Ireland
38.88% - Turkey
38.53% - Slovakia
37.78% - Romania
36.72% - Croatia
34.1% - Poland
33.65% - Russia
26.86% - Switzerland
25.84% - Austria
25.78% - Ukraine
24.43% - Italy
18.17% - Belgium
11.04% - Portugal
10.35% - Spain
9.27% - England
3.41% - France
3.19% - Germany
Eliminated in the Round of 16:
40.4% - Austria
39.73% - Switzerland
38.7% - Ukraine
37.54% - Poland
37.44% - Romania
37.11% - Slovakia
37% - Russia
36.92% - Portugal
36.57% - Italy
36.46% - Republic of Ireland
36.35% - Belgium
35.86% - Sweden
35.76% - Croatia
35.04% - Iceland
34.8% - Hungary
34.78% - Turkey
32.93% - Wales
32.68% - Czech Republic
30.44% - England
27.24% - Spain
23.14% - Albania
21.91% - Northern Ireland
21.13% - France
20.07% - Germany
Eliminated in the quarterfinals:
25.37% - England
25.16% - Portugal
23.23% - Germany
23.16% - Spain
21.74% - Belgium
21.34% - Italy
20.88% - Ukraine
20.79% - Austria
19.13% - Russia
18.85% - France
18.5% - Switzerland
17.59% - Poland
16.35% - Slovakia
15.66% - Croatia
15.25% - Romania
15.09% - Turkey
13.11% - Hungary
12.72% - Republic of Ireland
12.03% - Czech Republic
11.72% - Sweden
11.29% - Wales
11.17% - Iceland
5.57% - Albania
4.3% - Northern Ireland
Eliminated in the semifinals:
21.72% - France
19.25% - Germany
16.12% - England
14.95% - Spain
13.31% - Portugal
11.36% - Belgium
10.01% - Italy
8.37% - Ukraine
8.16% - Switzerland
8.13% - Austria
7.49% - Poland
6.77% - Turkey
6.67% - Croatia
6.49% - Russia
6.4% - Republic of Ireland
5.75% - Slovakia
5.63% - Romania
5.16% - Czech Republic
4.43% - Hungary
4.41% - Sweden
4.13% - Wales
3.08% - Iceland
1.11% - Albania
1.1% - Northern Ireland
Losing finalist:
12.89% - France
11.78% - Germany
11.1% - Spain
9.61% - England
7.57% - Portugal
6.26% - Belgium
4.66% - Switzerland
4.47% - Italy
4.2% - Ukraine
3.09% - Croatia
3.02% - Austria
2.98% - Romania
2.55% - Turkey
2.47% - Poland
2.33% - Russia
2.06% - Republic of Ireland
2.06% - Czech Republic
1.74% - Sweden
1.55% - Slovakia
1.19% - Hungary
1.03% - Iceland
0.76% - Wales
0.42% - Albania
0.21% - Northern Ireland
Winner:
22.48% - Germany
22% - France
13.2% - Spain
9.19% - England
6.12% - Belgium
6% - Portugal
3.18% - Italy
2.1% - Croatia
2.09% - Switzerland
2.07% - Ukraine
1.93% - Turkey
1.82% - Austria
1.4% - Russia
1.12% - Republic of Ireland
0.97% - Czech Republic
0.92% - Romania
0.86% - Sweden
0.81% - Poland
0.71% - Slovakia
0.43% - Hungary
0.23% - Wales
0.2% - Iceland
0.11% - Northern Ireland
0.06% - Albania
Very interesting! Thanks for sharing.
I'm not sure I agree with England being 4th favorites, as the case can be made that Italy, Belgium, and Portugal are all better teams. But of course the luck of the draw & knockout round bracket structure play a huge role in this.
I have a question that is not connected to this post, but I don't know where else to ask.
So I was wondering if Ed, Edgar and you other guys could help me. There is a club X that for 6-7 years has had unusually bad performance against club Y in a national league. I'm talking about 0W 1D 15L type of performance. Club Y is of much better quality than club X, but X is still in the upper middle class of that league, so such a bad record is not something that would normally be expected. Clubs of lower quality than X had much better performance against Y. So I would like to measure how much this record deviates from what would've normally been expected. How can I calculate and/or simulate what percentage of points would've been normal (expected) for club X to obtain against club Y over the last 6-7 years?
Hey nogomet, do you wanna try to mathematically prove an existing match-fixing scheme in 1. HNL :D
But seriously. What you need is the expected chance that team X wins/draws/loses against team Y, given the strength of both teams at the moment the match is played. The simplest indicator for this chance is the win-expectancy calculated with the elo-ratings of teams. For NT-football you have the well known elo rating. For domestic league club football you have an equivalent ClubElo rating.
In both systems the win-expectancy for the home team is calculated using a relatively simple formula which takes into account the rating difference between both teams at the time the match is played and a home field advantage factor. It results in a number between 0 and 1 which indicates the probability that the home team wins. For NT-football I've established boundaries for this win-expectancy (We) based on an extensive sample of matches in my database:
if We < 0.391 then the home team loses;
if We >= 0.391 and We <= 0.609 then it's a draw;
if We > 0.609 then the home team wins.
I use these boundaries to predict the results of scheduled NT-matches and subsequently predict future FIFA rankings.
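The three boundaries above can be written as a tiny function. This is only a sketch of the rule as stated in the comment; the function name `predict_result` and the result labels are made up for illustration.

```python
def predict_result(we: float) -> str:
    """Map a home-team win expectancy (between 0 and 1) to a predicted result.

    Uses the boundaries quoted above: below 0.391 a home loss,
    from 0.391 up to and including 0.609 a draw, above 0.609 a home win.
    """
    if we < 0.391:
        return "home loss"
    if we <= 0.609:
        return "draw"
    return "home win"
```

Note that this gives one deterministic prediction per match; it does not say anything about how likely each of the three results is.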
I'm not really familiar with the clubelo system. I see that it calculates the same home team win-expectancy, although the home field advantage factor seems a bit more complicated to calculate than in NT-elo. So 'all you need' is the clubelo ratings of the involved teams at the time each match you are interested in was played. Then you can determine the expected result of each match and compare that with the realised outcome of the match. With a sample of league matches over a substantial period of time, you should be able to draw some sound conclusions about a club structurally over- or underperforming against one other club.
It is a challenging calculation exercise, but I would be very interested in your conclusions.... Of course only if you also give the names of the teams you're investigating :)
Thanks Ed. I would like to analyse all matchups in a certain league over the last 7 years and see whether some matchups stand out and substantially deviate from what would've normally been expected given the relative strengths of the clubs concerned. ClubElo publishes these probabilities for each match going back many years, so these data are not a problem to gather. What is a problem for me, since I'm not that good at calculus and probabilities, is combining all these individual probabilities for each individual match into an estimated expected number of points for each league matchup over the analyzed period. I have a strong suspicion about something, and I would like to prove it mathematically and write a paper about it. But I need help with the calculus. I can give you all the details over email if you're interested and maybe we can write a paper together.
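One possible way to combine the per-match probabilities, sketched here under two assumptions that are not from the thread: 3 points for a win and 1 for a draw, and matches treated as independent. The expected points are then a plain sum, and the full distribution of total points can be built up match by match, which gives the chance of a record at least as bad as the one observed. All names and the example probabilities are hypothetical.

```python
from collections import defaultdict

def expected_points(matches):
    """Expected total points for club X; each match is (p_win, p_draw, p_loss)."""
    return sum(3 * p_win + p_draw for p_win, p_draw, _ in matches)

def points_distribution(matches):
    """Exact distribution of total points, assuming independent matches."""
    dist = {0: 1.0}
    for p_win, p_draw, p_loss in matches:
        nxt = defaultdict(float)
        for pts, prob in dist.items():
            nxt[pts + 3] += prob * p_win   # win adds 3 points
            nxt[pts + 1] += prob * p_draw  # draw adds 1 point
            nxt[pts] += prob * p_loss      # loss adds nothing
        dist = dict(nxt)
    return dist

def prob_at_most(matches, observed_points):
    """Chance of collecting observed_points or fewer over all matches."""
    dist = points_distribution(matches)
    return sum(p for pts, p in dist.items() if pts <= observed_points)

# Hypothetical input: 16 matches with identical made-up probabilities for club X.
matches = [(0.20, 0.25, 0.55)] * 16
print(expected_points(matches))   # expected points over the 16 matches
print(prob_at_most(matches, 1))   # chance of a 0W 1D 15L record or worse
```

With real data each tuple would hold the published probabilities for that particular match, so the tuples would differ from match to match.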
As it happens I'm sort of an expert in handling and statistically analyzing big data-sets. I would like to help you in this particular case.
If you like you can send an empty e-mail to Edgar (see his contact page for his e-mail). He'll forward it to me and then I will contact you. Sorry for the workaround, but I'd rather not give my e-mail address in public.
Sounds good.
Hi,
Are you just simulating a result or actual scorelines? I've been modelling the Euros using Poisson to determine probabilities of each result within a game and then a random number to determine the result. 10k sims. The offensive and defensive exG 'power rankings' for each game I have implied based on market odds [goal seeked so that result (not scoreline) for each game is equal to the vig free market odds]. However, the results I'm getting are quite far off what I would expect from an overall perspective (not enough wins for big favourites). Your link for the exG above doesn't work so I'm interested to know how you are turning elo into results. For my purposes I require actual scorelines.
Thanks
Sorry - when I say overall results, I mean tournament wins. I'm happy with the win %'s for group games (agreed to market) and the knockout rounds seem reasonable.
Actual scorelines. I've replaced the link with a working one. The clubelo site has been updated.
Thanks Ed - how have you amended those formulae for matches played on a neutral field? And have you given France the full benefits of a home field advantage?
Yes, and yes, and I'm not Ed, although he is a good old chap!
Haha oh OK, sorry! How have you amended the formulae for neutral field?
To complicate things, I will answer you :)
The actual scorelines are simulated, based on a probability distribution for goals scored for each team in a match (see Edgar's link for an explanation). This probability distribution is dependent on the elo win expectancy for each team in the match. In this elo win expectancy a home field advantage of 100 points is incorporated (see here for an explanation of the elo ratings).
When a match is played on a neutral field the home field advantage is simply not added for the home team. And yes, in the simulations for the coming EUROs France enjoys the full home field advantage factor.
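As a rough sketch of that rule: the win expectancy below uses the standard logistic Elo formula, which the comment itself does not spell out, so treat the exact form as an assumption. The 100-point home field advantage is the figure mentioned above; the function and parameter names are made up.

```python
def win_expectancy(elo_home, elo_away, hfa=100.0, neutral=False):
    """Elo win expectancy of the home (first-listed) team.

    Assumes the standard Elo curve 1 / (1 + 10 ** (-diff / 400)).
    On a neutral field the home field advantage is left out.
    """
    diff = elo_home - elo_away + (0.0 if neutral else hfa)
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))
```

On a neutral field the two teams' expectancies always add up to 1, whichever team is listed first.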
Thanks a lot Ed & Edgar! I'm sure I'll be back with more questions once I've had a chance to play around with this. I'm very interested to see how it compares to my model, which, when market lines are applied, should give an indication of where the market deviates from elo. I'm also interested to see which is a better historical predictor of results but one step at a time!
The link states that the expected number of goals for each team is:
Goals for the Home team:
if Proba < 0.5: Home Goals = 0.2 + 1.1*sqrt(Proba/0.5)
else: Home Goals = 1.69 / (1.12*sqrt(2 -Proba/0.5)+0.18)
Goals for the Away team:
if Proba < 0.8: Away goals = -0.96 + 1/(0.1+0.44*sqrt((Proba+0.1)/0.9))
else: Away goals = 0.72*sqrt((1 - Proba)/0.3)+0.3
"Proba" is the Probability (Winning Expectancy) from the Elo Formula, ranging from 0 to 1, "math.sqrt" is the square root.
I think I need the neutral field version of this formula for Poisson, unless I'm missing something!
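For anyone who wants to try the formulas quoted above outside a spreadsheet, here is a direct transcription. The coefficients are copied from the comment; the function names `home_goals` and `away_goals` are made up.

```python
import math

def home_goals(proba: float) -> float:
    """Expected goals of the home team, given its Elo win expectancy."""
    if proba < 0.5:
        return 0.2 + 1.1 * math.sqrt(proba / 0.5)
    return 1.69 / (1.12 * math.sqrt(2 - proba / 0.5) + 0.18)

def away_goals(proba: float) -> float:
    """Expected goals of the away team, given the HOME team's win expectancy."""
    if proba < 0.8:
        return -0.96 + 1 / (0.1 + 0.44 * math.sqrt((proba + 0.1) / 0.9))
    return 0.72 * math.sqrt((1 - proba) / 0.3) + 0.3
```

Both functions take the same input, the win expectancy of the home team, so the away curve is not simply the home curve evaluated at 1 - Proba.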
The formulas clubelo gives calculate the expected mean of goals scored by the home team (meanH) and the expected mean of goals scored by the away team in a match (meanA).
The number of goals the home team scores is Poisson distributed with Lambda equal to meanH, so the chance the home team scores 0 goals = meanH^0*EXP^(-meanH)/FAC(0), scores 1 goal = meanH^1*EXP^(-meanH)/FAC(1), scores 2 goals = meanH^2*EXP^(-meanH)/FAC(2) etc.
The same for the away team: the chance the away team scores 0 goals = meanA^0*EXP^(-meanA)/FAC(0), scores 1 goal = meanA^1*EXP^(-meanA)/FAC(1), scores 2 goals = meanA^2*EXP^(-meanA)/FAC(2) etc.
btw: ^: power, EXP: Euler's number (2,71828...) and FAC: factorial.
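In code the same probability looks like this. The second function is an addition that is not described in the comment: one common way to turn a single random number into a simulated goal count, by walking up the cumulative probabilities. The names and the 50-goal safety cap are made up.

```python
import math
import random

def poisson_pmf(k: int, mean: float) -> float:
    """Chance of exactly k goals: mean^k * e^(-mean) / k!"""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def sample_goals(mean: float, rng=random) -> int:
    """Draw one goal count by comparing a uniform random number
    against the running sum of Poisson probabilities."""
    u = rng.random()
    k = 0
    cumulative = poisson_pmf(0, mean)
    while u > cumulative and k < 50:  # cap is an arbitrary safety limit
        k += 1
        cumulative += poisson_pmf(k, mean)
    return k
```

Calling `sample_goals` once for each team with its own mean gives one simulated scoreline.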
The expected mean is only dependent on Proba, the elo win expectancy for the home team. Now when a match is played on a neutral field there is no home team, so the elo win expectancy of the 'home' team (or the first mentioned team if you like) is then calculated without the home field advantage factor of 100 points.
Example: team1 and team2 play a match on the field of team1; Elo rating team1 = 1368; Elo rating team2 = 1537
Elo win expectancy is 0,402 (home field advantage for team1 included)
meanH = 1,184; meanA = 1,377
Probability that team1 scores
0 goals = 0,306
1 goal = 0,362
2 goals = 0,215
3 goals = 0,085
4 goals = 0,025
etc.
Probability that team2 scores
0 goals = 0,252
1 goal = 0,347
2 goals = 0,239
3 goals = 0,110
4 goals = 0,038
etc.
If you sum all probabilities that team1 scores more goals than team2 then you will find that team1 has 32,2% chance to win. Sum all probabilities that team2 scores more goals than team1 and you will see that team2 has 41,3% chance to win. There's a 26,5% chance that it will end in a draw.
So if the same match is played on a neutral field, the elo win expectancy for team1 is only 0,274 (home field advantage for team1 no longer included). meanH = 1,008; meanA = 1,657. After the same set of calculations you will find that team1 now has 23,0% chance to win, team2 has 52,6% chance to win and there's a 24,4% chance the teams tie.
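The summing step described above can be sketched as follows. Scorelines are cut off at 15 goals per team, which is an arbitrary choice; the probability mass beyond that is negligible for means of this size. The function names are made up.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Chance of exactly k goals for a Poisson mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def outcome_probabilities(mean1: float, mean2: float, max_goals: int = 15):
    """Return (team1 wins, draw, team2 wins) by summing over all scorelines."""
    p1 = [poisson_pmf(k, mean1) for k in range(max_goals + 1)]
    p2 = [poisson_pmf(k, mean2) for k in range(max_goals + 1)]
    win1 = draw = win2 = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            prob = p1[i] * p2[j]  # both goal counts treated as independent
            if i > j:
                win1 += prob
            elif i == j:
                draw += prob
            else:
                win2 += prob
    return win1, draw, win2

# The two sets of means from the example above:
print(outcome_probabilities(1.184, 1.377))  # match on the field of team1
print(outcome_probabilities(1.008, 1.657))  # same match on a neutral field
```

This treats the two teams' goal counts as independent, which is what summing the products of the individual probabilities implies.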
Thanks Ed, I think that I understand all of that but are we not still using different formulae to calculate the meanH and meanA even though the match is played on a neutral field?
To use your example, let's start with a scenario where team A has HFA:
Team A has an elo of 1368+100=1468
Team B has an elo of 1537
Therefore Team A win expectancy is 40.2%, as you mention.
In order to then calculate meanH and meanA as you have above, we use two different formulae:
For the Home team: =IF(T4<0.5,0.2+(1.1*(SQRT(T4/0.5))),1.69/(1.12*(SQRT(2-(T4/0.5)))+0.18))
For the Away team: =IF(T4<0.8,-0.96+(1/(0.1+(0.44*SQRT((T4+0.1)/0.9)))),0.72*SQRT((1-T4)/0.3)+0.3)
T4 = 40.2%.
I then create a Poisson distribution as you have, and I get the same results.
Now, if we want to calculate meanH and meanA for the same match on a neutral field:
Team A has an elo of 1368
Team B has an elo of 1537
Therefore Team A win expectancy is 27.4%, as you mention.
In order to then calculate meanH and meanA as you have above, we use two different formulae:
For the Home team: =IF(T4<0.5,0.2+(1.1*(SQRT(T4/0.5))),1.69/(1.12*(SQRT(2-(T4/0.5)))+0.18))
For the Away team: =IF(T4<0.8,-0.96+(1/(0.1+(0.44*SQRT((T4+0.1)/0.9)))),0.72*SQRT((1-T4)/0.3)+0.3)
T4 = 27.4%.
I then create a Poisson distribution as you have, and I get the same results.
However, this methodology will yield different results on a neutral field depending on which team we classify as being the 'home' team as we are still applying a slight home field advantage to team A by using a different formula. For example, if we swap the two teams around:
Team A has an elo of 1537
Team B has an elo of 1368
Therefore Team A win expectancy is 72.6% (fine so far)
In order to then calculate meanH and meanA as you have above, we use two different formulae:
For the Home team: =IF(T4<0.5,0.2+(1.1*(SQRT(T4/0.5))),1.69/(1.12*(SQRT(2-(T4/0.5)))+0.18))
For the Away team: =IF(T4<0.8,-0.96+(1/(0.1+(0.44*SQRT((T4+0.1)/0.9)))),0.72*SQRT((1-T4)/0.3)+0.3)
T4 = 72.6%.
This yields the following %'s:
Team A win: 54.2%
Draw: 24.3%
Team B win: 21.5%
Compared to the figures calculated with team B as the favourite on a neutral field (as you did), you can see that the win % of the favourite has further increased, from 52.6% to 54.2%.
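The order dependence can be checked with a short script. It chains the standard Elo curve without home bonus (an assumption, the thread does not write the formula out), the two clubelo goal formulas quoted earlier, and independent Poisson scorelines cut off at 15 goals. All function names are made up, and small differences from the percentages in this thread can come from rounding the win expectancy.

```python
import math

def neutral_win_expectancy(elo_first, elo_second):
    """Standard Elo expectancy of the first-listed team, no home bonus."""
    return 1.0 / (1.0 + 10.0 ** ((elo_second - elo_first) / 400.0))

def home_goals(p):
    if p < 0.5:
        return 0.2 + 1.1 * math.sqrt(p / 0.5)
    return 1.69 / (1.12 * math.sqrt(2 - p / 0.5) + 0.18)

def away_goals(p):
    if p < 0.8:
        return -0.96 + 1 / (0.1 + 0.44 * math.sqrt((p + 0.1) / 0.9))
    return 0.72 * math.sqrt((1 - p) / 0.3) + 0.3

def pmf(k, mean):
    return mean ** k * math.exp(-mean) / math.factorial(k)

def match_probs(elo_first, elo_second, max_goals=15):
    """(first team wins, draw, second team wins), first team treated as 'home'."""
    p = neutral_win_expectancy(elo_first, elo_second)
    m1, m2 = home_goals(p), away_goals(p)
    goals = range(max_goals + 1)
    first = sum(pmf(i, m1) * pmf(j, m2) for i in goals for j in goals if i > j)
    draw = sum(pmf(i, m1) * pmf(i, m2) for i in goals)
    return first, draw, 1.0 - first - draw

# Same two teams, only the listing order differs.
fav_listed_first = match_probs(1537, 1368)[0]
fav_listed_second = match_probs(1368, 1537)[2]
print(fav_listed_first, fav_listed_second)
```

If the formulas were symmetric the two printed numbers would be identical.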
Apologies if I have misunderstood anything.
Absolutely no need to apologize. You've got a valid point here.
As one would expect, the elo win expectancies are exact complements when the order of team1 and team2 is changed (1 - 27,4% = 72,6%). Still, the clubelo formula slightly favours the 'home'/first-mentioned team with regard to mean goals scored and thus win percentages. Well researched and discovered. Thanks Anonymous!
So a warning with regard to using the clubelo formulas is appropriate: for matches on neutral ground, the win percentages of each team depend on which team is mentioned first. And that's counter-intuitive. Effectively the clubelo mean goals formula can't be used for matches on neutral ground, only as an indication. Luckily almost all NT-matches for official competitions are played on a true home/away basis.
Time to contact the boy(s) and/or girl(s) at clubelo. Edgar, what is your point of view on this and do you have a good contact at clubelo ?
For now, I'm just using the mean of the two as follows:
Team A: Elo 1575 Win % 23.2
Team B: Elo 1783 Win % 76.8
meanA_1:
=IF(T4<0.5,0.2+(1.1*(SQRT(T4/0.5))),1.69/(1.12*(SQRT(2-(T4/0.5)))+0.18)) = 0.949
meanA_2:
=IF(T5<0.8,-0.96+(1/(0.1+(0.44*SQRT((T5+0.1)/0.9)))),0.72*SQRT((1-T5)/0.3)+0.3) = 0.919
=average(meanA_1, meanA_2)
= 0.934
meanB_1:
=IF(T5<0.5,0.2+(1.1*(SQRT(T5/0.5))),1.69/(1.12*(SQRT(2-(T5/0.5)))+0.18)) = 1.792
meanB_2:
=IF(T4<0.8,-0.96+(1/(0.1+(0.44*SQRT((T4+0.1)/0.9)))),0.72*SQRT((1-T4)/0.3)+0.3) = 1.763
=average(meanB_1, meanB_2)
= 1.778
I'm sure this isn't accurate but it should provide a decent fix for now. In order to get an accurate formula I guess we would have to:
A. Use only results from matches played on a neutral field which severely limits sample size and probably isn't a good idea:
B. Estimate the effect of HFA (I think the clubelo guys have already done this) and effectively remove this effect from the curves
Anyway, time to run some sims!
T4 = 23.2%
T5 = 76.8%
When I contacted Lars Schiefler (owner of clubelo.com) in April 2013 about the formula, I also asked about neutral venue games. This was his answer:
good question. At the moment I take team 1 as home team and team 2 as
away team and set the home field advantage for that match to 0.
This is not optimal as the curves for home and away goals are not symmetrical.
However, there are so few neutral ground games in club football that
it does not matter too much for my purpose.
Nevertheless, I will come up with something more sound in the future
including neutral ground matches. For the moment I suggest you just
mirror the 2 curves one on another and take the average.
And that's what I've been using for neutral venue games.
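The mirror-and-average workaround can be sketched like this: each team's mean is the average of the home curve at its own win expectancy and the away curve at the opponent's. The goal formulas are the clubelo ones quoted earlier in the thread; the function names are made up.

```python
import math

def home_goals(p):
    if p < 0.5:
        return 0.2 + 1.1 * math.sqrt(p / 0.5)
    return 1.69 / (1.12 * math.sqrt(2 - p / 0.5) + 0.18)

def away_goals(p):
    if p < 0.8:
        return -0.96 + 1 / (0.1 + 0.44 * math.sqrt((p + 0.1) / 0.9))
    return 0.72 * math.sqrt((1 - p) / 0.3) + 0.3

def neutral_goal_means(proba_a):
    """Averaged expected goals (team A, team B) on a neutral field.

    proba_a is team A's win expectancy without any home field advantage.
    """
    proba_b = 1.0 - proba_a
    mean_a = (home_goals(proba_a) + away_goals(proba_b)) / 2
    mean_b = (home_goals(proba_b) + away_goals(proba_a)) / 2
    return mean_a, mean_b

print(neutral_goal_means(0.232))  # the Team A / Team B example above
```

Because each team gets the same treatment, swapping the two teams just swaps the two means, so the listing order no longer matters.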
Good to know they've suggested taking the same approach as me. Thanks for the update
Finally found the time to identify a formula based on national team matches using polynomial regression (least squares method). The coefficient of determination was higher than that of the clubelo formula, so I'll be using it from now on for simulations.
hi edgar,
would it be possible for you to filter out only the runs that match the current results, and to produce a new estimation for the forthcoming matches based exclusively on these runs?
No, Marko, sorry. I don't keep the "path" to certain simulation outcomes.