GCCA

Generalised Canonical Correlation Analysis (GCCA) is a multiview extension of CCA that finds a single shared embedding for all views. Each view is projected linearly so that its projection agrees as closely as possible with that shared embedding (the MAXVAR criterion), and ridge regularisation on each view's covariance guards against overfitting.

class polyview.embed.gcca.GCCA(*args: Any, **kwargs: Any)

Bases: BaseMultiViewTransformer

Generalised Canonical Correlation Analysis (GCCA).

Finds a shared low-dimensional embedding G that maximises linear agreement across all views simultaneously (MAXVAR criterion). Works with M >= 2 views. When M = 2 this recovers classical CCA.
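For orientation, the MAXVAR criterion is commonly written as the least-squares problem below. This is the textbook form, not a statement taken from the polyview source: X_v denotes the (centred) n × d_v data matrix of view v, W^{(v)} its projection matrix, and r_v a per-view ridge weight that is assumed here to play the role of the regularisation parameter.

```latex
\min_{G,\,W^{(1)},\dots,W^{(M)}}
  \sum_{v=1}^{M} \Bigl( \lVert G - X_v W^{(v)} \rVert_F^2
                        + r_v \lVert W^{(v)} \rVert_F^2 \Bigr)
  \quad \text{subject to} \quad G^\top G = I_k
```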

Parameters:
  • n_components (int, default=2) – Number of shared dimensions k.

  • regularisation (float or list of float, default=1e-4) – Ridge regularisation added to each view’s covariance before inversion. A single float applies the same value to all views; a list gives per-view values. Larger values give stronger regularisation, which is useful when a view has more features than samples (d_v > n) or its features are collinear.

  • output (str {"concat", "mean", "list"}, default="concat") – How to combine per-view projections in transform():
      - "concat" : [Z1 | Z2 | … | ZM], shape (n, M*k)
      - "mean" : (Z1 + Z2 + … + ZM) / M, shape (n, k)
      - "list" : [Z1, Z2, …, ZM], list of (n, k) arrays

  • centre (bool, default=True) – Subtract column means from each view before fitting.

G_

Shared embedding of the training data.

Type:

ndarray of shape (n_train, n_components)

weights_

Per-view projection matrices W(v).

Type:

list of ndarray, shape (n_features_v, n_components)

means_

Per-view column means (used to centre test data).

Type:

list of ndarray, shape (n_features_v,)

eigenvalues_

Top-k eigenvalues of the aggregated smoother matrix.

Type:

ndarray of shape (n_components,)
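The attributes above can be related to one another with a small NumPy sketch of the standard MAXVAR solution. It is an illustration under stated assumptions, not the polyview implementation: the function name gcca_maxvar_sketch is hypothetical, the ridge term is added to the unscaled X_v^T X_v, and a single float is used for all views.

```python
import numpy as np

def gcca_maxvar_sketch(views, n_components=2, regularisation=1e-4, centre=True):
    """Illustrative MAXVAR GCCA (hypothetical helper, not the library code)."""
    n = np.asarray(views[0]).shape[0]
    means, centred = [], []
    for X in views:
        X = np.asarray(X, dtype=float)
        mu = X.mean(axis=0) if centre else np.zeros(X.shape[1])
        means.append(mu)
        centred.append(X - mu)

    # Sum the ridge-regularised smoothers P_v = X_v (X_v^T X_v + r I)^-1 X_v^T.
    smoother_sum = np.zeros((n, n))
    back_maps = []
    for X in centred:
        cov = X.T @ X + regularisation * np.eye(X.shape[1])
        A = np.linalg.solve(cov, X.T)  # (d_v, n): maps an embedding to view weights
        back_maps.append(A)
        smoother_sum += X @ A
    smoother_sum = (smoother_sum + smoother_sum.T) / 2  # remove round-off asymmetry

    # The top-k eigenvectors of the aggregated smoother form the shared embedding.
    eigvals, eigvecs = np.linalg.eigh(smoother_sum)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    G = eigvecs[:, order]                    # (n, k), orthonormal columns
    weights = [A @ G for A in back_maps]     # one (d_v, k) matrix per view
    return G, weights, means, eigvals[order]
```

In this sketch the four return values correspond to G_, weights_, means_ and eigenvalues_. Each smoother has eigenvalues between 0 and 1, so the aggregated eigenvalues lie between 0 and M, and values near M indicate a direction that every view can reproduce almost exactly.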

Examples

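>>> import numpy as np
>>> from polyview.embed.gcca import GCCA
>>> rng = np.random.default_rng(0)
>>> X1, X2, X3 = (rng.standard_normal((100, d)) for d in (20, 30, 40))  # training views
>>> T1, T2, T3 = (rng.standard_normal((25, d)) for d in (20, 30, 40))   # test views
>>> gcca = GCCA(n_components=10, output="concat")
>>> Z_train = gcca.fit_transform([X1, X2, X3])
>>> Z_test  = gcca.transform([T1, T2, T3])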

References

  • Guo, C., & Wu, D. (2021). Canonical correlation analysis (CCA) based multi-view learning: An overview. arXiv preprint arXiv:1907.01693.

canonical_correlations() → numpy.ndarray

Pairwise canonical correlations between all view pairs.

Returns:

corrs[v1, v2, :] = per-component correlation between projections of view v1 and view v2.

Return type:

ndarray of shape (n_views, n_views, n_components)
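A minimal sketch of how an array with this layout could be computed from per-view projections is shown below. The helper name pairwise_component_correlations is hypothetical, and the use of Pearson correlation per component is an assumption about what "per-component correlation" means here.

```python
import numpy as np

def pairwise_component_correlations(projections):
    """projections: list of M arrays of shape (n, k). Returns an (M, M, k) array."""
    Z = np.stack([np.asarray(P, dtype=float) for P in projections])  # (M, n, k)
    Z = Z - Z.mean(axis=1, keepdims=True)              # centre each component
    norms = np.linalg.norm(Z, axis=1, keepdims=True)   # (M, 1, k)
    Z = Z / np.where(norms == 0, 1.0, norms)           # unit-norm columns, guard zeros
    # corrs[a, b, c] = correlation of component c between view a and view b
    return np.einsum("anc,bnc->abc", Z, Z)
```

The result is symmetric in its first two axes, and the entries with v1 == v2 equal 1 for any non-constant component.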

fit(views: List[numpy.ndarray], y=None) → GCCA

Fit the GCCA model to the training data.

Parameters:
  • views (list of (n, d_v) arrays) – Training data from each view.

  • y (ignored)

Returns:

self – The fitted GCCA model.

Return type:

GCCA

transform(views: List) → numpy.ndarray | List[numpy.ndarray]

Project views into the shared embedding space.

Parameters:

views (list of array-like of shape (n_samples, n_features_v)) – One array per view, in the same order as the views passed to fit().

Return type:

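Depends on the output parameter: ndarray of shape (n_samples, M*k) for "concat", ndarray of shape (n_samples, k) for "mean", or a list of M ndarrays of shape (n_samples, k) for "list". See the class docstring.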
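The projection step can be pictured with the following sketch, which centres each view with its stored mean, applies the per-view weights, and combines the results according to output. The function project_and_combine is a hypothetical stand-in written for illustration; it assumes weights and means have the shapes documented for weights_ and means_.

```python
import numpy as np

def project_and_combine(views, weights, means, output="concat"):
    """Project each view with its own weights, then combine the projections."""
    projections = [
        (np.asarray(X, dtype=float) - mu) @ W   # (n, d_v) @ (d_v, k) -> (n, k)
        for X, W, mu in zip(views, weights, means)
    ]
    if output == "concat":
        return np.hstack(projections)           # (n, M*k)
    if output == "mean":
        return np.mean(projections, axis=0)     # (n, k)
    if output == "list":
        return projections                      # M arrays of shape (n, k)
    raise ValueError(f"unknown output mode: {output!r}")
```

"concat" keeps view-specific information at the cost of a wider matrix, while "mean" yields a single k-dimensional representation per sample.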