Lab: Canonical Correlation Analysis

Published

March 27, 2025

M1 MIDS/MFA/LOGOS

Université Paris Cité

Year 2024


Canonical Correlation Analysis

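For centered random vectors \(X\) and \(Y\), the cross-covariance matrix is

\[C(X,Y) = \mathbb{E}\left[X Y^\top\right]\]

Writing \(C_{xx} = C(X,X)\), \(C_{xy} = C(X,Y)\) and \(C_{yy} = C(Y,Y)\), the covariance matrix of the stacked vector \((X, Y)\) has the block form

\[\begin{bmatrix} C_{xx} & C_{xy} \\ C_{xy}^{\top} & C_{yy}\end{bmatrix}\]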
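As a minimal sketch, the empirical counterparts of these blocks can be computed in R with cov(). The split of LifeCycleSavings into (pop15, pop75) and (sr, dpi, ddpi) mirrors the one used later in this lab; the names X, Y, C, C_xx, C_xy and C_yy are illustrative choices, not part of any package.

```r
# Two blocks of variables taken from the base R dataset LifeCycleSavings.
X <- as.matrix(LifeCycleSavings[, c("pop15", "pop75")])
Y <- as.matrix(LifeCycleSavings[, c("sr", "dpi", "ddpi")])

# Joint sample covariance of the stacked variables; cov() centers the columns.
C <- cov(cbind(X, Y))

p <- ncol(X)
q <- ncol(Y)
ix <- seq_len(p)      # indices of the X block
iy <- p + seq_len(q)  # indices of the Y block

# Blocks of the joint covariance matrix.
C_xx <- C[ix, ix]
C_xy <- C[ix, iy]
C_yy <- C[iy, iy]

# The off-diagonal block is the cross-covariance of X and Y.
all.equal(C_xy, cov(X, Y))
```

The lower-left block of C is the transpose of C_xy, which matches the block form above.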


The first canonical components solve the following optimization problem.

Optimization problem

\[\begin{array}{lll}\text{Maximize} & & u^\top C_{xy} v \\\text{subject to} & & u^\top C_{xx} u = 1 = v^\top C_{yy} v \end{array}\]

Proposition

Let

\[U \times D \times V^\top\]

be an SVD of

\[C_{xx}^{-1/2} \times C_{xy} \times C_{yy}^{-1/2}\]

The solution to the optimization problem above is

\[a = C_{xx}^{-1/2} u_1 \qquad \text{and} \qquad b= C_{yy}^{-1/2} v_1\]

where \(u_1\) and \(v_1\) are the leading left and right singular vectors of \(C_{xx}^{-1/2} \times C_{xy} \times C_{yy}^{-1/2}\), that is, the first columns of \(U\) and \(V\).
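The proposition can be checked numerically. The sketch below uses sample covariances of two blocks of LifeCycleSavings and assumes both covariance matrices are invertible. The helper inv_sqrt() is a hypothetical name introduced here for illustration; it computes an inverse square root through an eigendecomposition.

```r
X <- as.matrix(LifeCycleSavings[, c("pop15", "pop75")])
Y <- as.matrix(LifeCycleSavings[, c("sr", "dpi", "ddpi")])

C_xx <- cov(X)
C_yy <- cov(Y)
C_xy <- cov(X, Y)

# Hypothetical helper: inverse square root of a symmetric positive
# definite matrix, computed from its eigendecomposition.
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# SVD of the whitened cross-covariance matrix.
M <- inv_sqrt(C_xx) %*% C_xy %*% inv_sqrt(C_yy)
s <- svd(M)

# Candidate solution given by the proposition.
a <- inv_sqrt(C_xx) %*% s$u[, 1]
b <- inv_sqrt(C_yy) %*% s$v[, 1]

# Both constraints should evaluate to 1 (up to rounding).
drop(t(a) %*% C_xx %*% a)
drop(t(b) %*% C_yy %*% b)

# The objective value should equal the leading singular value ...
drop(t(a) %*% C_xy %*% b)
s$d[1]

# ... which should also agree with the first canonical correlation from cancor().
cancor(X, Y)$cor[1]
```

The factor 1/(n-1) in the sample covariances cancels in M, so the singular values do not depend on that normalization.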

Proof:
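
A sketch of the argument, assuming \(C_{xx}\) and \(C_{yy}\) are invertible. Set \(\tilde u = C_{xx}^{1/2} u\) and \(\tilde v = C_{yy}^{1/2} v\), and write \(M = C_{xx}^{-1/2} C_{xy} C_{yy}^{-1/2}\). Then

\[u^\top C_{xy} v = \tilde u^\top M \tilde v, \qquad u^\top C_{xx} u = \|\tilde u\|^2, \qquad v^\top C_{yy} v = \|\tilde v\|^2\]

so the problem amounts to maximizing \(\tilde u^\top M \tilde v\) over unit vectors \(\tilde u\) and \(\tilde v\). With \(M = U D V^\top\) and \(d_1\) the largest singular value,

\[\tilde u^\top M \tilde v \le \|\tilde u\| \, \|M \tilde v\| \le d_1 \|\tilde u\| \, \|\tilde v\| = d_1\]

with equality for \(\tilde u = u_1\) and \(\tilde v = v_1\), since \(M v_1 = d_1 u_1\). Mapping back through the change of variables gives \(a = C_{xx}^{-1/2} u_1\) and \(b = C_{yy}^{-1/2} v_1\).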

Proposition

A sequence of canonical components of \(C_{xy}\) can be obtained from the sequence of (extended) left and right singular vectors of \(C_{xy}\) with respect to \(C_{xx}\) and \(C_{yy}\).

Proof:

Canonical Correlation Analysis (CCA) in R

cancor() from base R (package stats)


Code
?LifeCycleSavings
Code
library(dplyr)  # provides slice_sample() and select()
LifeCycleSavings |>
  slice_sample(n = 5)  # five random rows
Code
pairs(LifeCycleSavings, 
      panel = panel.smooth,
      main = "LifeCycleSavings data")
Code
fm1 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
 
summary(fm1)
Code
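# First block: population structure (pop15, pop75)
pop <- LifeCycleSavings |> 
  select(starts_with('pop'))
# Second block: the remaining variables (sr, dpi, ddpi)
oec <- LifeCycleSavings |> 
  select(-starts_with('pop'))

res.cca <- cancor(pop, oec)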
Question

Check that each component of the output of cancor() satisfies the properties it should.
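
One possible way to approach this question is sketched below. It assumes that cancor() returns the components cor, xcoef, ycoef, xcenter and ycenter, and that the coefficients are scaled so that the centered canonical variates have orthonormal columns; the checks themselves test that assumption. The names U, V, expected and checks are illustrative.

```r
X <- as.matrix(LifeCycleSavings[, c("pop15", "pop75")])
Y <- as.matrix(LifeCycleSavings[, c("sr", "dpi", "ddpi")])
res <- cancor(X, Y)

# Center each block with the centers reported by cancor().
Xc <- sweep(X, 2, res$xcenter)
Yc <- sweep(Y, 2, res$ycenter)

# Canonical variates.
U <- Xc %*% res$xcoef
V <- Yc %*% res$ycoef
k <- length(res$cor)

# Expected cross-products: canonical correlations on the diagonal, zeros elsewhere.
expected <- matrix(0, ncol(U), ncol(V))
expected[cbind(seq_len(k), seq_len(k))] <- res$cor

checks <- c(
  # Centers are the column means.
  xcenter = isTRUE(all.equal(res$xcenter, colMeans(X))),
  ycenter = isTRUE(all.equal(res$ycenter, colMeans(Y))),
  # Correlations lie in [0, 1] and are sorted in decreasing order.
  cor_range  = all(res$cor >= 0 & res$cor <= 1 + 1e-12),
  cor_sorted = !is.unsorted(rev(res$cor)),
  # Variates within each block are orthonormal.
  u_orthonormal = isTRUE(all.equal(crossprod(U), diag(ncol(U)),
                                   check.attributes = FALSE)),
  v_orthonormal = isTRUE(all.equal(crossprod(V), diag(ncol(V)),
                                   check.attributes = FALSE)),
  # Variates from different blocks are paired through the canonical correlations.
  cross = isTRUE(all.equal(crossprod(U, V), expected,
                           check.attributes = FALSE))
)
checks
```

Any entry of checks that is FALSE points to a property worth investigating, either in the output or in the assumed normalization.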

Question

Design a suite of tests (using testthat) that any alternative to the implementation provided by package stats should pass.
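
A possible starting point for such a suite is sketched below. The helper expect_valid_cca() is a hypothetical name, not part of testthat or stats. It assumes the function under test takes two numeric matrices and returns a list shaped like the output of stats::cancor(). Correlations are used instead of cross-products so that the checks do not depend on how a contender scales its coefficients.

```r
library(testthat)

# Hypothetical helper: checks that `cca_fun` satisfies the defining
# properties of CCA on the data (x, y).
expect_valid_cca <- function(cca_fun, x, y, tol = 1e-8) {
  res <- cca_fun(x, y)
  expect_named(res, c("cor", "xcoef", "ycoef", "xcenter", "ycenter"),
               ignore.order = TRUE)

  # Centers are the column means.
  expect_equal(unname(res$xcenter), unname(colMeans(x)), tolerance = tol)
  expect_equal(unname(res$ycenter), unname(colMeans(y)), tolerance = tol)

  # Canonical correlations lie in [0, 1] and are sorted in decreasing order.
  expect_true(all(res$cor >= -tol & res$cor <= 1 + tol))
  expect_false(is.unsorted(rev(res$cor)))

  # Canonical variates.
  u <- unname(sweep(x, 2, res$xcenter) %*% res$xcoef)
  v <- unname(sweep(y, 2, res$ycenter) %*% res$ycoef)
  k <- length(res$cor)

  # Variates within a block are uncorrelated.
  expect_equal(cor(u), diag(ncol(u)), tolerance = tol)
  expect_equal(cor(v), diag(ncol(v)), tolerance = tol)

  # Variates across blocks are paired through the canonical correlations.
  expected <- matrix(0, ncol(u), ncol(v))
  expected[cbind(seq_len(k), seq_len(k))] <- res$cor
  expect_equal(cor(u, v), expected, tolerance = tol)
}

# Simulated data, in the spirit of the examples in the cancor() help page.
set.seed(1)
x <- matrix(rnorm(150), 50, 3)
y <- matrix(rnorm(250), 50, 5)

test_that("stats::cancor() satisfies the defining properties", {
  expect_valid_cca(stats::cancor, x, y)
})

test_that("canonical correlations are invariant under invertible linear maps", {
  # Upper triangular with nonzero diagonal, hence invertible.
  A <- matrix(c(2, 0, 0, 1, 1, 0, 0, 3, 1), 3, 3)
  expect_equal(stats::cancor(x %*% A, y)$cor, stats::cancor(x, y)$cor,
               tolerance = 1e-8)
})
```

To test a contender, pass its function to expect_valid_cca() in place of stats::cancor. Rank-deficient inputs and blocks with more columns than rows are not covered by this sketch.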

Package CCA

Applications
