Innovative Applications of O.R.
What is a good result in the first leg of a two-legged football match?

https://doi.org/10.1016/j.ejor.2015.05.076

Highlights

  • We use data on 6975 two-legged ties in pan-European club football tournaments.

  • We estimate a probit model to assess probabilities of reaching the next round.

  • The model accounts for team strengths.

  • Estimates are for use once the result of the first leg is known.

  • What is a “good” first-leg result for a club has changed substantially over time.

Abstract

The most important pan-European football tournaments include ties where two clubs play each other over two matches and the aggregate score determines which is admitted to the next stage of the competition. A number of stakeholders may be interested in assessing the chances of progression for either of the clubs once the score of the first match (leg) is known. The paper asks what would be a “good” result for a team in the first leg. Employing data from 6,975 contests, modelling reveals that what constitutes a good result has changed substantially over time. Generally, clubs which play at home in the first leg have become more likely to convert any given first-leg result to eventual success. Taking this trend into account, and controlling for team and country strength, a probit model is presented for use in generating probability estimates for which team will progress conditional on the first-leg scoreline. Illustrative results relate to ties where two average teams play each other and to ties where a relatively weak club plays home-first against a relatively strong club. Given that away goals serve as a tie-breaker should aggregate scores be equal after the two matches, the results also quantify how great the damage is when a home-first club concedes an away goal.

Introduction

The two pan-European competitions organised by UEFA (the Champions League and the Europa League) are the most lucrative club football tournaments in the world. For the organisers, they generated commercial income, principally from the sale of television rights, of more than €1.6 billion in 2013–14 (www.uefa.com). Their structure has varied over time, no doubt in order to increase this commercial income further, but currently features both ‘group’ and ‘knock-out’ stages.

The group stage comprises mini leagues in which each club plays each other club at home and away to determine which two in the group will proceed to the next phase of the competition. The knock-out stages, including the rounds leading to the Final, are organised on a straight elimination basis such that pairs of clubs play each other twice (once at each club's home stadium) and the aggregate score over the two matches (‘legs’) determines which will survive to the next round. If the aggregate scores are equal, the first tie-breaker is the number of away goals scored by each club. If this still does not settle the issue, the second-leg is extended, first by 30 minutes of extra time and then, if necessary, by a penalty shoot-out.
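The tie-breaking sequence described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `resolve_tie`, its parameter names and the labels "A" and "B" are all assumptions made for the example. Club A is taken to be the club that plays at home in the first leg, and the second-leg score is the score after normal time.

```python
def resolve_tie(a_home_goals, b_away_goals, b_home_goals, a_away_goals):
    """Decide a two-legged tie from the scores after normal time.

    Hypothetical helper for illustration only.
    Club A hosts the first leg; club B hosts the second leg.
      a_home_goals: goals by A in the first leg (at home)
      b_away_goals: goals by B in the first leg (away)
      b_home_goals: goals by B in the second leg (at home)
      a_away_goals: goals by A in the second leg (away)
    Returns "A", "B", or "extra_time" when neither the aggregate
    score nor the away goals rule separates the clubs.
    """
    aggregate_a = a_home_goals + a_away_goals
    aggregate_b = b_away_goals + b_home_goals

    # Step 1: the aggregate score over the two legs.
    if aggregate_a != aggregate_b:
        return "A" if aggregate_a > aggregate_b else "B"

    # Step 2: first tie-breaker, the number of away goals scored.
    if a_away_goals != b_away_goals:
        return "A" if a_away_goals > b_away_goals else "B"

    # Step 3: still level, so the second leg goes to extra time
    # (and then penalties); that stage is not modelled here.
    return "extra_time"


# Example: 1-1 in the first leg, 0-0 in the second leg.
print(resolve_tie(1, 1, 0, 0))
```

In the example call the aggregate is level at 1-1, so the away goal conceded by the home-first club in the first leg decides the tie, which is the kind of damage the abstract refers to.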

Operational researchers have investigated a range of issues in sport (Wright, 2009, 2014) including proposing models for the evaluation of performance after a contest (Fried, Lambrinos & Tyner, 2004) and for forecasting the final outcome as an event unfolds (Klaassen & Magnus, 2003). This paper touches on both these themes. It seeks means of offering guidance to interested parties once the first-leg score in a European tie has been determined, in terms of how satisfactory the result was for a club and what the prospects are of the club advancing to the next round.

A probabilistic assessment of which club will progress is likely to be useful in decision taking by a variety of stakeholders. For example, the organisers of the competition will likely wish to assign the best qualified referees to second-leg ties where the outcome of the tie is still most in the balance. Broadcasters in many markets have to choose which of several matches on the same night to show on television or assign to their principal channel; the degree of suspense remaining over which team will survive to the next round will be relevant. Supporters of the club which played at home in the first-leg must decide whether it is worthwhile to make an expensive international journey to attend the second-leg; they may wish to conserve their funds if they would be travelling for an almost lost cause. Coaches and players have to decide how much effort to place into the second-leg; here again it matters whether the second-leg is almost a ‘dead rubber’ or whether everything remains to play for (for example, a coach may rest key players to conserve their energy for upcoming domestic matches where the probability of advancement in Europe is either very high or very low). Finally, bettors and bookmakers may wish to engage in wagering on the ultimate outcome of the tie and therefore need to assess the probabilities at the mid-way point.

To all these actors, the model we propose has potential value in decision making in the time between the first- and second-legs. For coaches, the model may also have utility prior to and during the first-leg. In formulating strategy, a coach will need to think about what would be an appropriate target result. Possible strategies, ranging from very defensive to very offense-orientated, carry different degrees of risk. A coach may, for example, eschew further risk (and switch to emphasising defence) once his team has established a goal superiority which, if maintained until the end of the first match, would give a satisfactory probability of progression to the next stage of the competition. In offering a means to estimate probabilities of progression, the paper therefore serves also as a contribution to sport analytics, the application of statistics and operational research to strategy formation in sport.

We employ a data set we assembled which includes all 6975 two-legged ties played in the Champions League and the Europa League, and their predecessor competitions, from the introduction of the away goals rule (for settling ties where aggregate scores are level) in the 1960s to the end of season 2012–13. Controlling for team strength, we investigate the significance of different first-leg scores for the chances of either team progressing to the next phase. What constitutes a favourable outcome for a club will prove to have changed over time and we will therefore present illustrative probabilities based on estimation of a model using only the final sixteen years of the data period.

Section snippets

Data

We first collected data on all past results in the histories (1955–2013) of the UEFA Champions League and the Europa League and their predecessor competitions.

The source data archive was that provided by the Rec. Sport Soccer Statistics Foundation (www.rsssf.com), which has been compiled and curated by volunteers over a long period.

This data archive offers an invaluable resource; but extracting information to be used in data analysis and for generation of predictor variables involving clubs’

Measuring team strength

Our focus was on probabilistic prediction of the outcome of a tie conditional on the score in the first-leg. But, clearly, the probabilities for a given score will vary with the strength of the two clubs.

Page and Page (2007) presented an unconditional model for forecasting the outcome of a tie (i.e. to be used prior to the first-leg score becoming known), employing a large data set which had several years in common with ours but which combined contests with and without the away goals rule in

Model

Several authors (e.g. Flores, Forrest & Tena, 2012; Goddard & Asimakopoulos, 2004) have modelled the results of football matches using an ordered probit estimator, where the three possible outcomes are home win, draw and away win. In this paper, we are interested in a binary outcome and therefore use a probit model where the outcome for which probability is predicted is that the home-first club progresses to the next stage of the tournament.
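In generic textbook notation, a binary probit of this kind can be written as below. The symbols are assumptions introduced here for illustration and are not taken from the paper: y_i equals 1 if the home-first club in tie i progresses, x_i collects the covariates (for instance the first-leg scoreline and the strength measures mentioned in the abstract), β is the coefficient vector, and Φ is the standard normal cumulative distribution function.

```latex
\[
  \Pr(y_i = 1 \mid \mathbf{x}_i) \;=\; \Phi\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right),
  \qquad
  \Pr(y_i = 0 \mid \mathbf{x}_i) \;=\; 1 - \Phi\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right)
\]
```

The second expression is then the probability that the club playing at home in the second leg progresses.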

The model assesses the probability conditional on the

Results

The first three columns of Table 2 display estimates for the probit model applied to each of the three periods. Estimation used the Stata software package with the command ‘probit’.

Interpretation of the coefficient estimates is not as straightforward in a probit model as it is in linear regression. Probit is a non-linear model. Consequently, the impact on the predicted probability of a one unit change in the value of a covariate will vary according to the current values of that and
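The non-linearity noted above can be made concrete with a short sketch. In a probit, the marginal effect of a continuous covariate is the standard normal density evaluated at the linear index, multiplied by that covariate's coefficient, so it shrinks as the index moves away from zero. The coefficient value and the index values below are purely hypothetical and are not estimates from the paper; the function names are likewise chosen only for this example.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_effect(index, coef):
    """Marginal effect of a covariate in a probit: phi(index) * coef.

    `index` is the value of the linear predictor x'beta at which the
    effect is evaluated; `coef` is the covariate's coefficient.
    """
    return std_normal_pdf(index) * coef


# Hypothetical coefficient, chosen only to illustrate the shape.
HYPOTHETICAL_COEF = 0.5

for index in (0.0, 1.0, 2.0):
    probability = std_normal_cdf(index)
    effect = marginal_effect(index, HYPOTHETICAL_COEF)
    print(f"index={index:.1f}  P={probability:.3f}  marginal effect={effect:.3f}")
```

The same coefficient therefore translates into a larger change in probability for a tie that is evenly balanced (index near zero) than for one that is already close to decided.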

Conclusions

We set out to provide a model which could be used by stakeholders to assess the chances of success for either club in a two-legged European football match when the result of the first-leg is already known. We expected the results also to be of inherent interest to followers of European football. Our data set encompassed all such matches in UEFA club competitions since the away goals rule was introduced in the 1960s.

It became evident that the answer to the question of what is a good first-leg
