New England Symposium on Statistics in Sports


October 15, 2021

Atlético’s next match in the UEFA Champions League

Atlético Madrid vs. FC Liverpool

Liverpool’s previous results:

Liverpool 2:2 Manchester City

Porto 1:5 Liverpool

Brentford 3:3 Liverpool

Norwich City 0:3 Liverpool

Liverpool 3:0 Crystal Palace

\(\rightsquigarrow\) How were goals scored? Build-up play? Counter attacks?

Analysing future opponents in soccer

things to consider when analysing future opponents (among others):

  • how are goal-scoring chances created? \(\color{gray}{\text{(counter attacks?)}}\)

  • style of play? \(\color{gray}{\text{(offensive?)}}\)

  • how does a team defend? \(\color{gray}{\text{(pressing?)}}\)


\(\rightsquigarrow\) such questions are usually investigated via (time-consuming) video analysis

Here, we…

  • … make use of high-frequency tracking data

  • … aim at automatically detecting a team’s playing style

Tracking data in soccer

  • for the analysis of soccer tracking data, studies focus on…
    • … pass/shot performance (Kempe et al., 2018; Fairchild et al., 2018)
    • … team formation (Memmert et al., 2019)
    • … space creation (Fernandez and Bornn, 2018)
  • review paper on soccer tracking data by Goes et al. (2020)








Data

  • tracking data from a single match

  • data provided by Metrica Sports
    (https://github.com/metrica-sports/sample-data)

  • \((x,y)\) coordinates of all players + the ball

  • sampling frequency: 25 Hz

  • 22 players + the ball, 90 minutes of play, 25 obs. per second
    \(\rightsquigarrow\) 3M individual data points
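
The order of magnitude above can be checked with a few lines of arithmetic. This is only a sketch using the figures stated on the slide; the variable names are illustrative.

```python
# Rough size of one match of tracking data, using the figures from the slide.
n_entities = 22 + 1        # 22 players plus the ball
minutes_of_play = 90
sampling_rate_hz = 25      # observations per second

n_frames = minutes_of_play * 60 * sampling_rate_hz   # time points in the match
n_positions = n_entities * n_frames                  # one (x, y) position each

print(n_frames)      # 135000
print(n_positions)   # 3105000, i.e. roughly 3M positions
```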

Data — effective playing space (EPS)

  • convex hull of all players (excluding goalkeepers)

  • referred to as ‘effective playing space’ (EPS)
    (see Frencken et al., 2011; Goes et al., 2019)
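
A minimal sketch of how the EPS could be computed for a single frame: build the convex hull of the outfield players' positions and take its area. The function names, the example positions, and the use of metres are assumptions for illustration, not taken from the talk.

```python
def convex_hull(points):
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def eps_area(outfield_positions):
    """EPS of one team in one frame: area of the convex hull of its outfield players."""
    return polygon_area(convex_hull(outfield_positions))


# Hypothetical positions (in metres) of ten outfield players in one frame.
players = [(10, 10), (50, 10), (50, 40), (10, 40), (20, 20),
           (30, 25), (40, 30), (25, 35), (35, 15), (30, 12)]
print(eps_area(players))  # 1200.0 (hull is the 40 x 30 rectangle)
```

Applying `eps_area` to every frame of one team yields that team's EPS time series.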

Data — effective playing space (EPS)

  • sampling rate of the EPS: 25 Hz

  • 145,006 observations (\(\approx\) 97 minutes of play)

  • we consider only those observations where the ball was in play

\(\rightsquigarrow\) 88,251 observations in total








Modelling framework

Hidden Markov models (HMMs)

  • for the EPS \(y_t\), a Gamma distribution is chosen
    (continuous-valued response with values \(>\) 0)
  • the Gamma distribution is parameterised by its mean \(\mu\) and standard deviation \(\sigma\)
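
The mean/standard-deviation parameterisation maps to the usual shape/scale form via shape \(= \mu^2/\sigma^2\) and scale \(= \sigma^2/\mu\). A minimal sketch of the conversion and the resulting log-density follows; the function names are illustrative, and the example values are the Team A, state 1 estimates from the univariate model.

```python
import math


def gamma_shape_scale(mu, sigma):
    """Convert mean and standard deviation to Gamma shape and scale."""
    shape = mu ** 2 / sigma ** 2
    scale = sigma ** 2 / mu
    return shape, scale


def gamma_logpdf(y, mu, sigma):
    """Log-density of a Gamma distribution with mean mu and sd sigma, for y > 0."""
    shape, scale = gamma_shape_scale(mu, sigma)
    return ((shape - 1) * math.log(y) - y / scale
            - math.lgamma(shape) - shape * math.log(scale))


shape, scale = gamma_shape_scale(1251, 224)
# Sanity check: mean = shape * scale, sd = sqrt(shape) * scale
print(shape * scale, math.sqrt(shape) * scale)
```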

HMM — univariate model

model with two states fitted separately to the data from both teams:

Team A:

  • state 1: \(\hat{\mu}_1 = 1251, \hat{\sigma}_1 = 224\)
  • state 2: \(\hat{\mu}_2 = 704, \hat{\sigma}_2 = 176\)

Team B:

  • state 1: \(\hat{\mu}_1 = 1234, \hat{\sigma}_1 = 200\)
  • state 2: \(\hat{\mu}_2 = 714, \hat{\sigma}_2 = 182\)

HMM — bivariate model

  • univariate modelling approach might neglect dependence between the two teams \(\rightsquigarrow\) bivariate model

  • bivariate obs. of the EPS: \(\mathbf{y}_t = (y_{t,\text{teamA}}, y_{t,\text{teamB}})\)

  • within-state correlation in \(\mathbf{y}_t\) is allowed by using a copula \(C\)

HMM — bivariate model

  • bivariate Gamma distribution as state-dependent distribution:

    • \(F(\mathbf{y}_t\,|\,s_t) = C\Big(F_1(y_{t,\text{teamA}}\,|\,s_t), F_2(y_{t,\text{teamB}}\,|\,s_t)\Big)\)
    • \(F_1, F_2\): c.d.f. of the Gamma distribution (with \(\mu\) and \(\sigma\))
    • \(C\): copula
  • we select the Frank copula with dependence parameter \(\theta \in \mathbb{R} \setminus \{ 0\}\)

    • \(\theta \approx 0\): independence
    • \(\theta < 0\): neg. dependence
    • \(\theta > 0\): pos. dependence
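
For reference, the Frank copula has the closed form \(C(u,v) = -\frac{1}{\theta}\log\Big(1 + \frac{(e^{-\theta u}-1)(e^{-\theta v}-1)}{e^{-\theta}-1}\Big)\). The sketch below evaluates it for given marginal c.d.f. values \(u = F_1(\cdot)\) and \(v = F_2(\cdot)\); the function name is illustrative, and evaluating the Gamma c.d.f.s themselves is left out.

```python
import math


def frank_copula_cdf(u, v, theta):
    """Frank copula C(u, v) for u, v in [0, 1] and theta != 0."""
    if theta == 0:
        raise ValueError("theta must be non-zero (theta -> 0 is the independence limit)")
    # expm1/log1p keep the computation accurate for small |theta|
    numerator = math.expm1(-theta * u) * math.expm1(-theta * v)
    denominator = math.expm1(-theta)
    return -math.log1p(numerator / denominator) / theta


u = v = 0.5
print(frank_copula_cdf(u, v, 1e-6))    # close to u * v = 0.25 (independence)
print(frank_copula_cdf(u, v, 12.65))   # above 0.25: positive dependence
print(frank_copula_cdf(u, v, -5.0))    # below 0.25: negative dependence
```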








Model selection

  • how many states (i.e. different tactics) are there?

  • we consider models with 2-5 states

  • computational cost:

    • two-state model: \(\approx\) 7 minutes
    • five-state model: \(\approx\) 4 hours

| | # parameters | AIC | BIC |
|---|---|---|---|
| 2 states | 13 | 2,308,880 | 2,309,034 |
| 3 states | 23 | 2,250,650 | 2,250,923 |
| 4 states | 35 | 2,207,860 | 2,208,275 |
| 5 states | 49 | 2,174,409 | 2,174,991 |
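
The parameter counts in the table are consistent with the following decomposition, which is an assumption rather than something stated in the talk: per state, two Gamma margins with \((\mu, \sigma)\) each plus one \(\theta\); \(N(N-1)\) free transition probabilities; and \(N-1\) free initial probabilities. The AIC and BIC helpers use the standard definitions.

```python
import math


def n_parameters(n_states):
    """Free parameters of the bivariate HMM under the assumed decomposition."""
    state_dependent = n_states * (2 * 2 + 1)   # (mu, sigma) for two teams + theta
    transitions = n_states * (n_states - 1)    # each row of the t.p.m. sums to one
    initial = n_states - 1                     # initial distribution sums to one
    return state_dependent + transitions + initial


def aic(log_lik, n_par):
    """Akaike information criterion."""
    return -2 * log_lik + 2 * n_par


def bic(log_lik, n_par, n_obs):
    """Bayesian information criterion."""
    return -2 * log_lik + n_par * math.log(n_obs)


print([n_parameters(n) for n in range(2, 6)])  # [13, 23, 35, 49]
```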

\(\rightsquigarrow\) four states seem reasonable








Results for the final model

State-dependent distributions

| | state 1 | state 2 | state 3 | state 4 |
|---|---|---|---|---|
| Team A | defence | pressing | attacking | attacking |
| Team B | attacking | attacking | pressing | defence |
| EPS Team A | \(\hat{\mu} = 651, \hat{\sigma} = 130\) | \(\hat{\mu} = 851, \hat{\sigma} = 283\) | \(\hat{\mu} = 1031, \hat{\sigma} = 174\) | \(\hat{\mu} = 1277, \hat{\sigma} = 320\) |
| EPS Team B | \(\hat{\mu} = 1234, \hat{\sigma} = 211\) | \(\hat{\mu} = 1096, \hat{\sigma} = 341\) | \(\hat{\mu} = 853, \hat{\sigma} = 154\) | \(\hat{\mu} = 729, \hat{\sigma} = 250\) |
| dependence | \(\hat{\theta} = 5.326\) | \(\hat{\theta} = 12.65\) | \(\hat{\theta} = 5.648\) | \(\hat{\theta} = 11.41\) |

Decoded states

  • initial aim: provide classification of a team’s underlying tactics

  • investigate decoded states
    \(\rightsquigarrow\) for each time point, we thus obtain an estimate of the most likely state
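
The slides do not say which decoding algorithm was used. A common choice for HMMs is the Viterbi algorithm, which returns the most likely state sequence as a whole; the sketch below assumes it. All inputs are on the log scale, and the toy numbers are invented for illustration.

```python
import math


def viterbi(log_init, log_trans, log_dens):
    """Most likely state sequence of an HMM.

    log_init[j]     : log initial probability of state j
    log_trans[i][j] : log transition probability from state i to state j
    log_dens[t][j]  : log state-dependent density of observation t under state j
    """
    n_states = len(log_init)
    score = [log_init[j] + log_dens[0][j] for j in range(n_states)]
    backpointers = []
    for t in range(1, len(log_dens)):
        new_score, pointers = [], []
        for j in range(n_states):
            # best predecessor state for arriving in state j at time t
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j] + log_dens[t][j])
        backpointers.append(pointers)
        score = new_score
    # backtrack from the best final state
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy two-state example with persistent states (hypothetical numbers).
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
dens = [[0.8, 0.2]] * 3 + [[0.2, 0.8]] * 3
log_dens = [[math.log(d) for d in row] for row in dens]
print(viterbi(log_init, log_trans, log_dens))  # [0, 0, 0, 1, 1, 1]
```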

time spent in each state (in minutes):

| | state 1 | state 2 | state 3 | state 4 |
|---|---|---|---|---|
| Team A | defence | pressing | attacking | attacking |
| Team B | attacking | attacking | pressing | defence |
| time (minutes) | 13.10 | 13.11 | 10.86 | 19.09 |
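
Given a decoded state sequence sampled at 25 Hz, time per state follows from counting frames. This is a sketch of one plausible way to obtain such a table; the decoded sequence below is hypothetical and much shorter than a real match.

```python
from collections import Counter

SAMPLING_RATE_HZ = 25  # EPS sampling rate stated in the talk


def minutes_per_state(decoded_states, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Minutes spent in each state, given one decoded state per frame."""
    counts = Counter(decoded_states)
    return {state: n / sampling_rate_hz / 60 for state, n in sorted(counts.items())}


# Hypothetical decoded sequence: 6000 frames, i.e. 4 minutes of play.
decoded = [1] * 3000 + [4] * 1500 + [1] * 1500
result = minutes_per_state(decoded)
print(result)  # {1: 3.0, 4: 1.0}
```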