Atlético’s next match in the UEFA Champions League
Atlético Madrid vs. FC Liverpool
Liverpool’s previous results:
Liverpool 2:2 Manchester City
Porto 1:5 Liverpool
Brentford 3:3 Liverpool
Norwich City 0:3 Liverpool
Liverpool 3:0 Crystal Palace
\(\rightsquigarrow\) How were goals scored? Build-up play? Counter attacks?
things to consider when analysing future opponents (among others):
how are goal-scoring chances created? \(\color{gray}{\text{(counter attacks?)}}\)
style of play? \(\color{gray}{\text{(offensive?)}}\)
how does a team defend? \(\color{gray}{\text{(pressing?)}}\)
\(\rightsquigarrow\) such questions are usually investigated via (time-consuming) video analysis
Here, we…
… make use of high-frequency tracking data
… aim at automatically detecting a team’s playing style
Data
tracking data from a single match
data provided by Metrica Sports
(https://github.com/metrica-sports/sample-data)
\((x,y)\) coordinates of all players + the ball
sampling frequency: 25 Hz
22 players + the ball, 90 minutes of play, 25 obs. per second
\(\rightsquigarrow\) 3M individual data points
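A quick check of the 3M figure. It reads one "data point" as one \((x,y)\) position of one object in one frame, which is our interpretation and is not stated in the source:

```python
# Back-of-the-envelope size of the raw tracking data.
# Assumption: one "data point" = one (x, y) position of one object in one frame.
N_OBJECTS = 22 + 1          # 22 players + the ball
SAMPLING_HZ = 25            # frames per second
MATCH_SECONDS = 90 * 60     # nominal 90 minutes of play

positions = N_OBJECTS * SAMPLING_HZ * MATCH_SECONDS
coordinates = 2 * positions  # each position holds an x and a y value

print(f"{positions:,} positions")      # 3,105,000 positions
print(f"{coordinates:,} coordinates")  # 6,210,000 coordinates
```

If x and y were counted as separate values, the figure would be roughly 6M instead.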
convex hull of each team’s players (excluding goalkeepers)
referred to as ‘effective playing space’ (EPS)
(see Frencken et al., 2011; Goes et al., 2019)
sampling rate of the EPS: 25 Hz
145,006 observations (\(\approx\) 97 minutes of play)
we consider only those observations where the ball was in play
\(\rightsquigarrow\) 88,251 observations in total
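Here is a minimal sketch of the EPS computation for a single frame. It assumes positions are already in metres and goalkeepers have been removed, and the function names are hypothetical. It uses Andrew's monotone-chain convex hull plus the shoelace formula in plain Python. `scipy.spatial.ConvexHull(points).volume` would give the same area for 2-D input, where SciPy's `area` attribute is the perimeter.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:                      # build the lower hull left to right
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):            # build the upper hull right to left
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # drop duplicated end points

def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for a simple polygon whose vertices are given in order."""
    s = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def effective_playing_space(outfield_positions: List[Point]) -> float:
    """EPS of one frame: area of the convex hull of one team's outfield players."""
    return polygon_area(convex_hull(outfield_positions))

# Toy frame (not real data): four outfield players plus one inside the hull.
frame = [(0.0, 0.0), (30.0, 0.0), (30.0, 20.0), (0.0, 20.0), (15.0, 10.0)]
print(effective_playing_space(frame))
```

Applying such a function frame by frame to each team gives one EPS series per team.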
Modelling framework
two-state hidden Markov model (HMM), fitted separately to each team’s EPS series:
(figure: fitted two-state models, panels “Team A” and “Team B”)
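The univariate model is not spelled out here. As an assumption, the sketch below uses Gamma state-dependent distributions parametrised by mean \(\mu\) and standard deviation \(\sigma\) (shape \(\mu^2/\sigma^2\), scale \(\sigma^2/\mu\)). It evaluates the log-likelihood of such an HMM with the scaled forward algorithm. `gamma_logpdf` and `hmm_loglik` are hypothetical names, and the toy values are made up:

```python
import math
import numpy as np

def gamma_logpdf(x, mean, sd):
    """Log-density of a Gamma distribution parametrised by mean and standard deviation."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return ((shape - 1.0) * np.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def hmm_loglik(x, delta, Gamma, means, sds):
    """Log-likelihood of a Gamma HMM via the forward algorithm with rescaling.

    x: (T,) positive observations (e.g. one team's EPS series)
    delta: (N,) initial state distribution; Gamma: (N, N) transition probability matrix
    means, sds: (N,) state-dependent Gamma means and standard deviations
    """
    x = np.asarray(x, dtype=float)
    Gamma = np.asarray(Gamma, dtype=float)
    log_p = np.column_stack([gamma_logpdf(x, m, s) for m, s in zip(means, sds)])
    row_max = log_p.max(axis=1)                  # shift each row to avoid underflow
    p = np.exp(log_p - row_max[:, None])         # shifted state-dependent densities
    alpha = np.asarray(delta, dtype=float) * p[0]
    loglik = 0.0
    for t in range(len(x)):
        if t > 0:
            alpha = (alpha @ Gamma) * p[t]       # forward recursion
        s = alpha.sum()
        loglik += math.log(s) + row_max[t]       # undo the shift and the rescaling
        alpha = alpha / s                        # rescale forward probabilities
    return loglik

eps = np.array([640.0, 700.0, 1010.0, 1050.0, 980.0])   # toy EPS values, not real data
Gamma = np.array([[0.95, 0.05], [0.05, 0.95]])
print(hmm_loglik(eps, [0.5, 0.5], Gamma, means=[650.0, 1030.0], sds=[130.0, 175.0]))
```

Fitting would then amount to numerically maximising this log-likelihood over the parameters, for example with a general-purpose optimiser on unconstrained transformations of them. That is one common approach, not necessarily the one used here.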
univariate modelling approach might neglect dependence between the two teams \(\rightsquigarrow\) bivariate model
bivariate obs. of the EPS: \(\mathbf{y}_t = (y_{t,\text{teamA}}, y_{t,\text{teamB}})\)
within-state dependence between the two components of \(\mathbf{y}_t\) is modelled via a copula \(C\)
state-dependent distribution: bivariate Gamma distribution, i.e. Gamma marginals for both teams joined by \(C\)
we select the Frank copula with dependence parameter \(\theta \in \mathbb{R} \setminus \{0\}\)
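By Sklar's theorem, a state-dependent density in state \(j\) with Gamma marginals \(f_{j,A}, f_{j,B}\) (CDFs \(F_{j,A}, F_{j,B}\)) joined by a copula with density \(c\) would read \(f_j(\mathbf{y}_t) = c\big(F_{j,A}(y_{t,\text{teamA}}), F_{j,B}(y_{t,\text{teamB}}); \theta_j\big)\, f_{j,A}(y_{t,\text{teamA}})\, f_{j,B}(y_{t,\text{teamB}})\). The standard Frank copula density is \(c(u,v;\theta) = \frac{\theta(1-e^{-\theta})\,e^{-\theta(u+v)}}{\left[(1-e^{-\theta}) - (1-e^{-\theta u})(1-e^{-\theta v})\right]^2}\). Here \(\theta > 0\) gives positive dependence, and \(\theta \to 0\) approaches independence. The sketch below implements both formulas. The function names are hypothetical, and marginal density and CDF values are passed in; with SciPy they could come from `scipy.stats.gamma(a=shape, scale=scale).pdf` and `.cdf`.

```python
import numpy as np

def frank_copula_density(u, v, theta):
    """Density c(u, v; theta) of the Frank copula on (0, 1)^2, theta != 0."""
    if theta == 0:
        raise ValueError("theta must be non-zero (theta -> 0 is independence)")
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    one_minus_e = -np.expm1(-theta)                  # 1 - exp(-theta)
    num = theta * one_minus_e * np.exp(-theta * (u + v))
    # (1 - e^{-theta u})(1 - e^{-theta v}) equals expm1(-theta u) * expm1(-theta v)
    den = one_minus_e - np.expm1(-theta * u) * np.expm1(-theta * v)
    return num / den ** 2

def copula_state_density(f_a, F_a, f_b, F_b, theta):
    """Sklar: f(y_A, y_B) = c(F_A(y_A), F_B(y_B); theta) * f_A(y_A) * f_B(y_B).

    f_a, f_b: marginal densities evaluated at y_A, y_B (e.g. Gamma pdfs)
    F_a, F_b: marginal CDFs evaluated at y_A, y_B
    """
    return frank_copula_density(F_a, F_b, theta) * f_a * f_b
```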
Model selection
how many states (i.e. different tactics) are there?
we consider models with 2-5 states
computational cost:
| model | # parameters | AIC | BIC |
|---|---|---|---|
| 2 states | 13 | 2,308,880 | 2,309,034 |
| 3 states | 23 | 2,250,650 | 2,250,923 |
| 4 states | 35 | 2,207,860 | 2,208,275 |
| 5 states | 49 | 2,174,409 | 2,174,991 |
\(\rightsquigarrow\) AIC and BIC still decrease from four to five states, so neither criterion gives a clear cut-off; four states seem reasonable
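One way of counting that reproduces the "# parameters" column is shown below; this is our reconstruction, not stated in the source. Each state contributes a mean and a standard deviation for each team's Gamma marginal plus one Frank \(\theta\). The transition probability matrix adds \(N(N-1)\) free entries and the initial distribution adds \(N-1\). AIC and BIC are the usual criteria (lower is better), with \(T\) the number of observations:

```python
import math

def n_parameters(n_states: int) -> int:
    """Assumed parameter count of an N-state HMM with Gamma marginals and a Frank copula."""
    per_state = 2 * 2 + 1                # (mean, sd) for each team's marginal + theta
    tpm = n_states * (n_states - 1)      # free entries of the transition probability matrix
    initial = n_states - 1               # free entries of the initial distribution
    return per_state * n_states + tpm + initial

def aic(loglik: float, n_par: int) -> float:
    return -2.0 * loglik + 2.0 * n_par

def bic(loglik: float, n_par: int, n_obs: int) -> float:
    return -2.0 * loglik + n_par * math.log(n_obs)

for n in range(2, 6):
    print(n, n_parameters(n))   # prints: 2 13 / 3 23 / 4 35 / 5 49
```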
Results for the final model
| | state 1 | state 2 | state 3 | state 4 |
|---|---|---|---|---|
| Team A | defence | pressing | attacking | attacking |
| Team B | attacking | attacking | pressing | defence |
| EPS Team A | \(\hat{\mu} = 651, \hat{\sigma} = 130\) | \(\hat{\mu} = 851, \hat{\sigma} = 283\) | \(\hat{\mu} = 1031, \hat{\sigma} = 174\) | \(\hat{\mu} = 1277, \hat{\sigma} = 320\) |
| EPS Team B | \(\hat{\mu} = 1234, \hat{\sigma} = 211\) | \(\hat{\mu} = 1096, \hat{\sigma} = 341\) | \(\hat{\mu} = 853, \hat{\sigma} = 154\) | \(\hat{\mu} = 729, \hat{\sigma} = 250\) |
| dependence (Frank copula) | \(\hat{\theta} = 5.326\) | \(\hat{\theta} = 12.65\) | \(\hat{\theta} = 5.648\) | \(\hat{\theta} = 11.41\) |
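As a reading aid that is not part of the source analysis, \(\theta\) can be translated into Kendall's rank correlation via the standard Frank-copula relation \(\tau = 1 - \frac{4}{\theta}\left[1 - D_1(\theta)\right]\). Here \(D_1(\theta) = \frac{1}{\theta}\int_0^\theta \frac{t}{e^t - 1}\,dt\) is the first Debye function. Since all four \(\hat{\theta}\) are positive, every state shows positive within-state dependence between the two teams' EPS. The function name below is hypothetical:

```python
import numpy as np

def frank_kendall_tau(theta: float, n: int = 100_000) -> float:
    """Kendall's tau implied by a Frank copula: tau = 1 - 4/theta * (1 - D_1(theta)).

    D_1 (first Debye function) is evaluated with the midpoint rule,
    which also works for negative theta.
    """
    if theta == 0:
        raise ValueError("theta must be non-zero")
    h = theta / n
    t = (np.arange(n) + 0.5) * h                 # midpoints, never exactly 0
    debye1 = np.sum(t / np.expm1(t)) * h / theta
    return float(1.0 - 4.0 / theta * (1.0 - debye1))

for label, theta in [("state 1", 5.326), ("state 2", 12.65),
                     ("state 3", 5.648), ("state 4", 11.41)]:
    print(label, round(frank_kendall_tau(theta), 2))
```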
initial aim: provide a classification of a team’s underlying tactics
investigate the decoded states
\(\rightsquigarrow\) for each time point, this yields an estimate of the most likely state
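The source does not say which decoding was used. A "most likely state" per time point could come from local decoding (state probabilities) or from global decoding. As an assumption, the sketch below shows global decoding with the Viterbi algorithm in log space. The inputs are the log initial distribution, log transition matrix and log state-dependent densities (for instance the copula density above), and `viterbi` is a hypothetical name:

```python
import numpy as np

def viterbi(log_delta, log_Gamma, log_p):
    """Most likely state sequence of an HMM (global decoding), computed in log space.

    log_delta: (N,) log initial distribution
    log_Gamma: (N, N) log transition probability matrix
    log_p:     (T, N) log state-dependent densities, log f_j(y_t)
    """
    log_delta = np.asarray(log_delta, dtype=float)
    log_Gamma = np.asarray(log_Gamma, dtype=float)
    log_p = np.asarray(log_p, dtype=float)
    T, N = log_p.shape
    xi = np.empty((T, N))                    # best log-score of paths ending in state j at t
    back = np.zeros((T, N), dtype=int)       # best predecessor, used for backtracking
    xi[0] = log_delta + log_p[0]
    for t in range(1, T):
        scores = xi[t - 1][:, None] + log_Gamma   # scores[i, j]: move from state i to j
        back[t] = scores.argmax(axis=0)
        xi[t] = scores.max(axis=0) + log_p[t]
    states = np.empty(T, dtype=int)
    states[-1] = xi[-1].argmax()
    for t in range(T - 2, -1, -1):           # backtrack from the final state
        states[t] = back[t + 1, states[t + 1]]
    return states
```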
time spent in each state (in minutes):
| | state 1 | state 2 | state 3 | state 4 |
|---|---|---|---|---|
| Team A | defence | pressing | attacking | attacking |
| Team B | attacking | attacking | pressing | defence |
| time (min) | 13.10 | 13.11 | 10.86 | 19.09 |
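Turning a decoded state sequence into minutes per state is a matter of counting frames. The sketch assumes one decoded state per 25 Hz observation and states coded \(0,\dots,N-1\). The function name and the decoded sequence are hypothetical:

```python
import numpy as np

def minutes_per_state(states, n_states, hz=25):
    """Time spent in each decoded state, in minutes, for observations sampled at `hz` Hz."""
    counts = np.bincount(np.asarray(states, dtype=int), minlength=n_states)
    return counts / hz / 60.0

decoded = np.array([0] * 1500 + [1] * 3000 + [3] * 750)   # hypothetical decoded sequence
print(minutes_per_state(decoded, n_states=4))
```

Because only in-play observations enter the model, these counts measure time with the ball in play rather than elapsed match time.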