GCCA

Generalised Canonical Correlation Analysis (GCCA) is a multiview extension of CCA that finds a single embedding shared by all views. It chooses the embedding that agrees as closely as possible with a linear projection of every view (the MAXVAR criterion), and applies ridge regularisation to each view's covariance to keep the solution stable and reduce overfitting.

class polyview.embed.gcca.GCCA(*args: Any, **kwargs: Any)

Bases: BaseMultiViewTransformer

Generalised Canonical Correlation Analysis (GCCA).

Finds a shared low-dimensional embedding G that maximises linear agreement across all views simultaneously (MAXVAR criterion). Works with M >= 2 views. When M = 2 this recovers classical CCA.

Parameters:

n_components (int, default=2) – Number of shared dimensions k.
regularisation (float or list of float, default=1e-4) – Ridge regularisation added to each view’s covariance before inversion. A single float applies the same value to all views; a list gives per-view values. Larger values give stronger regularisation, which is useful when a view has more features than samples (d_v > n) or when its features are collinear.
output (str {"concat", "mean", "list"}, default="concat") – How to combine per-view projections in transform():
- "concat" : [Z1 | Z2 | … | ZM], shape (n, M*k)
- "mean" : (Z1 + Z2 + … + ZM) / M, shape (n, k)
- "list" : [Z1, Z2, …, ZM], list of (n, k) arrays
centre (bool, default=True) – Subtract column means from each view before fitting.
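
To illustrate why the regularisation parameter matters, the NumPy sketch below builds a view with more features than samples and compares the rank of its covariance with and without a ridge term. The helper names `expand_regularisation` and `ridge_covariance` are hypothetical and not part of polyview, and the unscaled form X^T X + r I is an assumption about how the ridge is applied.

```python
import numpy as np

def expand_regularisation(reg, n_views):
    """Hypothetical helper: turn a float or a list into one value per view."""
    if np.isscalar(reg):
        return [float(reg)] * n_views
    reg = [float(r) for r in reg]
    if len(reg) != n_views:
        raise ValueError("need one regularisation value per view")
    return reg

def ridge_covariance(X, reg):
    """Assumed form of the regularised covariance: X^T X + reg * I."""
    return X.T @ X + reg * np.eye(X.shape[1])

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))   # n = 5 samples, d_v = 10 features
X = X - X.mean(axis=0)             # centre the columns

regs = expand_regularisation(1e-4, n_views=3)          # same value for all views
rank_plain = np.linalg.matrix_rank(X.T @ X)            # rank-deficient since d_v > n
rank_ridge = np.linalg.matrix_rank(ridge_covariance(X, regs[0]))  # full rank
```

Without the ridge term the covariance is singular and cannot be inverted; with it, every eigenvalue is at least `reg`, so the inverse exists.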

G_

Shared embedding of the training data.

Type: ndarray of shape (n_train, n_components)

weights_

Per-view projection matrices W(v).

Type: list of ndarray, each of shape (n_features_v, n_components)

means_

Per-view column means (used to centre test data).

Type: list of ndarray, each of shape (n_features_v,)

eigenvalues_

Top-k eigenvalues of the aggregated smoother matrix.

Type: ndarray of shape (n_components,)
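
The attributes above can be related to one another with a minimal NumPy sketch of a MAXVAR-style fit. This is an illustration under stated assumptions, not the library's implementation: the function name `maxvar_gcca_fit` is hypothetical, the per-view smoother is assumed to be P_v = X_v (X_v^T X_v + r I)^-1 X_v^T, and the shared embedding is taken as the top-k eigenvectors of the sum of those smoothers.

```python
import numpy as np

def maxvar_gcca_fit(views, n_components=2, reg=1e-4):
    """Hypothetical MAXVAR-style fit; returns (G, weights, means, eigenvalues)."""
    n = views[0].shape[0]
    means = [X.mean(axis=0) for X in views]
    centred = [X - m for X, m in zip(views, means)]

    smoother_sum = np.zeros((n, n))
    solves = []
    for X in centred:
        C = X.T @ X + reg * np.eye(X.shape[1])   # ridge-regularised covariance
        A = np.linalg.solve(C, X.T)              # C^-1 X^T, shape (d_v, n)
        solves.append(A)
        smoother_sum += X @ A                    # add this view's smoother P_v

    smoother_sum = (smoother_sum + smoother_sum.T) / 2   # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(smoother_sum)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]       # indices of the top-k
    eigenvalues = evals[order]
    G = evecs[:, order]                                  # orthonormal columns

    weights = [A @ G for A in solves]                    # W_v = C_v^-1 X_v^T G
    return G, weights, means, eigenvalues

# Three synthetic views driven by a common 2-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.standard_normal((50, 2))
views = [latent @ rng.standard_normal((2, d)) + 0.1 * rng.standard_normal((50, d))
         for d in (4, 5, 6)]
G, weights, means, eigenvalues = maxvar_gcca_fit(views, n_components=2)
```

In this sketch each eigenvalue lies between 0 and the number of views, and values close to the number of views indicate a direction that every view can reproduce almost exactly.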

Examples

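>>> import numpy as np
>>> from polyview.embed.gcca import GCCA
>>> rng = np.random.default_rng(0)  # synthetic views, for illustration only
>>> X1, X2, X3 = (rng.standard_normal((100, d)) for d in (20, 30, 40))
>>> T1, T2, T3 = (rng.standard_normal((25, d)) for d in (20, 30, 40))
>>> gcca = GCCA(n_components=10, output="concat")
>>> Z_train = gcca.fit_transform([X1, X2, X3])
>>> Z_test  = gcca.transform([T1, T2, T3])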

References

Guo, C., & Wu, D. (2021). Canonical correlation analysis (CCA) based multi-view learning: An overview.
arXiv preprint arXiv:1907.01693.

canonical_correlations() → numpy.ndarray

Pairwise canonical correlations between all view pairs.

Returns: corrs, where corrs[v1, v2, :] holds the per-component correlations between the projections of view v1 and view v2.
Return type: ndarray of shape (n_views, n_views, n_components)
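
The layout of the returned array can be illustrated with a short NumPy sketch that computes per-component Pearson correlations between every pair of projected views. The function name `pairwise_component_correlations` is hypothetical, and it is an assumption that the method uses plain Pearson correlation on the training projections.

```python
import numpy as np

def pairwise_component_correlations(projections):
    """projections: list of M arrays of shape (n, k). Returns an (M, M, k) array."""
    # Centre each projection column-wise and stack to shape (M, n, k).
    Z = np.stack([P - P.mean(axis=0) for P in projections])
    norms = np.linalg.norm(Z, axis=1)                 # (M, k) column norms
    # cross[a, b, c] = sum over samples of Z[a, :, c] * Z[b, :, c]
    cross = np.einsum("aic,bic->abc", Z, Z)
    # Note: a constant column has zero norm and would produce a division by zero.
    return cross / (norms[:, None, :] * norms[None, :, :])

rng = np.random.default_rng(1)
shared = rng.standard_normal((200, 3))
projections = [shared + 0.1 * rng.standard_normal((200, 3)) for _ in range(3)]
corrs = pairwise_component_correlations(projections)   # shape (3, 3, 3)
```

The array is symmetric in its first two axes, and `corrs[v, v, :]` is all ones because each view is perfectly correlated with itself.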

fit(views: List[numpy.ndarray], y=None) → GCCA

Fit the GCCA model to the training data.

Parameters:

views (list of (n, d_v) arrays) – Training data from each view.
y (ignored)

Returns:

self – The fitted GCCA model.

Return type:

GCCA

transform(views: List) → numpy.ndarray | List[numpy.ndarray]

Project views into the shared embedding space.

Parameters: views (list of array-like of shape (n_samples, n_features_v)) – Views to project, one array per view.
Return type: Depends on the output parameter; see the class docstring.
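
As a rough sketch of what the projection step involves, the snippet below centres each view with stored training means, multiplies by a per-view weight matrix, and combines the results according to the three documented output modes. The function name `project_views` is hypothetical, the weights and means here are random placeholders rather than fitted values, and the exact handling inside polyview may differ.

```python
import numpy as np

def project_views(views, weights, means, output="concat"):
    """Hypothetical projection step: Z_v = (X_v - mean_v) @ W_v, then combine."""
    Z = [(np.asarray(X) - m) @ W for X, m, W in zip(views, means, weights)]
    if output == "concat":
        return np.hstack(Z)          # shape (n, M*k)
    if output == "mean":
        return np.mean(Z, axis=0)    # shape (n, k)
    if output == "list":
        return Z                     # list of M arrays of shape (n, k)
    raise ValueError(f"unknown output mode: {output!r}")

rng = np.random.default_rng(2)
dims, k, n = (4, 5, 6), 2, 7
views = [rng.standard_normal((n, d)) for d in dims]
weights = [rng.standard_normal((d, k)) for d in dims]   # placeholder weights
means = [X.mean(axis=0) for X in views]                 # placeholder means

Z_concat = project_views(views, weights, means, output="concat")
Z_mean = project_views(views, weights, means, output="mean")
Z_list = project_views(views, weights, means, output="list")
```

The "concat" mode keeps every view's projection and so grows with the number of views, while "mean" always returns an (n, k) array regardless of how many views are supplied.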