GCCA
Generalised Canonical Correlation Analysis (GCCA) is a multiview extension of CCA that finds a single embedding shared by all views. It seeks the embedding that agrees as closely as possible with a linear projection of every view (the MAXVAR criterion), and applies ridge regularisation to each view's covariance to keep the solution stable and limit overfitting.
- class polyview.embed.gcca.GCCA(*args: Any, **kwargs: Any)
Bases: BaseMultiViewTransformer
Generalised Canonical Correlation Analysis (GCCA).
Finds a shared low-dimensional embedding G that maximises linear agreement across all views simultaneously (MAXVAR criterion). Works with M >= 2 views. When M = 2 this recovers classical CCA.
- Parameters:
n_components (int, default=2) – Number of shared dimensions k.
regularisation (float or list of float, default=1e-4) – Ridge regularisation added to each view’s covariance before inversion. A single float applies the same value to all views; a list gives per-view values. Larger values give stronger regularisation, which is useful when d_v > n or when features are collinear.
output (str {"concat", "mean", "list"}, default="concat") – How to combine per-view projections in transform():
- "concat" : [Z1 | Z2 | … | ZM], shape (n, M*k)
- "mean" : (Z1 + Z2 + … + ZM) / M, shape (n, k)
- "list" : [Z1, Z2, …, ZM], list of (n, k) arrays
centre (bool, default=True) – Subtract column means from each view before fitting.
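The three output modes can be illustrated with plain NumPy. This is a minimal sketch of the shapes described above, not the library's own code; the helper name combine_projections is hypothetical and the per-view projections are random placeholders.

```python
import numpy as np

def combine_projections(Z_list, output="concat"):
    """Combine per-view projections, each of shape (n, k).

    Hypothetical helper that mirrors the documented `output` options.
    """
    if output == "concat":
        return np.hstack(Z_list)          # [Z1 | Z2 | ... | ZM], shape (n, M*k)
    if output == "mean":
        return np.mean(Z_list, axis=0)    # (Z1 + ... + ZM) / M, shape (n, k)
    if output == "list":
        return list(Z_list)               # M arrays of shape (n, k)
    raise ValueError(f"unknown output mode: {output!r}")

# Placeholder projections: M = 3 views, n = 5 samples, k = 2 components.
rng = np.random.default_rng(0)
Z_list = [rng.standard_normal((5, 2)) for _ in range(3)]

Z_concat = combine_projections(Z_list, "concat")
Z_mean = combine_projections(Z_list, "mean")
Z_as_list = combine_projections(Z_list, "list")
print(Z_concat.shape)   # (5, 6)
print(Z_mean.shape)     # (5, 2)
print(len(Z_as_list))   # 3
```

"concat" keeps every view's projection and grows with the number of views, while "mean" keeps the output at k columns regardless of M.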
- G_
Shared embedding of the training data.
- Type:
ndarray of shape (n_train, n_components)
- weights_
Per-view projection matrices W(v).
- Type:
list of ndarray, shape (n_features_v, n_components)
- means_
Per-view column means (used to centre test data).
- Type:
list of ndarray, shape (n_features_v,)
- eigenvalues_
Top-k eigenvalues of the aggregated smoother matrix.
- Type:
ndarray of shape (n_components,)
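The attributes above are consistent with a standard MAXVAR computation: build a ridge-regularised smoother matrix per view, sum them, and take the top-k eigenvectors as the shared embedding. The sketch below follows that recipe in NumPy. It is an illustration under assumptions, not the library's implementation: the function name gcca_maxvar_sketch is hypothetical, the covariance is taken as the unnormalised X^T X, and the scaling of G and of the weights may differ from what polyview does.

```python
import numpy as np

def gcca_maxvar_sketch(views, n_components=2, regularisation=1e-4):
    """MAXVAR-style GCCA sketch. Returns (G, weights, means, eigenvalues)."""
    n = views[0].shape[0]
    means = [X.mean(axis=0) for X in views]
    centred = [X - m for X, m in zip(views, means)]

    # Aggregate the smoothers P_v = X_v (X_v^T X_v + r I)^{-1} X_v^T.
    # Assumption: unnormalised covariance and the same r for every view.
    M = np.zeros((n, n))
    solved = []
    for X in centred:
        C = X.T @ X + regularisation * np.eye(X.shape[1])
        A = np.linalg.solve(C, X.T)       # (d_v, n), equals C^{-1} X_v^T
        solved.append(A)
        M += X @ A
    M = (M + M.T) / 2                     # remove round-off asymmetry

    # eigh returns eigenvalues in ascending order; keep the k largest.
    evals, evecs = np.linalg.eigh(M)
    top = np.argsort(evals)[::-1][:n_components]
    G = evecs[:, top]                     # shared embedding, (n, k)
    eigenvalues = evals[top]

    # Per-view projection matrices that map each centred view towards G.
    weights = [A @ G for A in solved]
    return G, weights, means, eigenvalues

# Synthetic data: three views driven by one 2-D latent signal plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
views = [latent @ rng.standard_normal((2, d)) + 0.1 * rng.standard_normal((200, d))
         for d in (4, 5, 6)]

G, weights, means, eigenvalues = gcca_maxvar_sketch(views, n_components=2)
print(G.shape, [W.shape for W in weights])
print(eigenvalues)
```

Each smoother has eigenvalues in [0, 1), so the aggregated eigenvalues lie between 0 and the number of views; values close to M indicate a direction that every view can reproduce almost exactly.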
Examples
>>> from polyview.embed.gcca import GCCA
>>> gcca = GCCA(n_components=10, output="concat")
>>> Z_train = gcca.fit_transform([X1, X2, X3])
>>> Z_test = gcca.transform([T1, T2, T3])
References
- Guo, C., & Wu, D. (2021). Canonical correlation analysis (CCA) based multi-view learning: An overview. arXiv preprint arXiv:1907.01693.
- canonical_correlations() → numpy.ndarray
Pairwise canonical correlations between all view pairs.
- Returns:
corrs[v1, v2, :] = per-component correlation between projections of view v1 and view v2.
- Return type:
ndarray of shape (n_views, n_views, n_components)
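The returned array layout can be reproduced with a few lines of NumPy. The sketch below computes a per-component Pearson correlation for every pair of view projections. The function name pairwise_component_correlations is hypothetical, and whether the library computes the correlations in exactly this way is an assumption.

```python
import numpy as np

def pairwise_component_correlations(Z_list):
    """corrs[v1, v2, j] = Pearson correlation of component j between two views."""
    n_views = len(Z_list)
    k = Z_list[0].shape[1]

    # Centre each column and scale it to unit norm, so that a column-wise
    # dot product between two views equals the Pearson correlation.
    unit = []
    for Z in Z_list:
        Zc = Z - Z.mean(axis=0)
        unit.append(Zc / np.linalg.norm(Zc, axis=0))

    corrs = np.empty((n_views, n_views, k))
    for a in range(n_views):
        for b in range(n_views):
            corrs[a, b] = np.sum(unit[a] * unit[b], axis=0)
    return corrs

# Placeholder projections: a shared signal plus view-specific noise.
rng = np.random.default_rng(0)
shared = rng.standard_normal((100, 2))
Z_list = [shared + 0.3 * rng.standard_normal((100, 2)) for _ in range(3)]

corrs = pairwise_component_correlations(Z_list)
print(corrs.shape)   # (3, 3, 2)
```

The array is symmetric in its first two axes, and corrs[v, v, :] is 1 for every view, so only the entries with v1 < v2 carry information.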
- fit(views: List[numpy.ndarray], y=None) → GCCA
Fit the GCCA model to the training data.
- Parameters:
views (list of (n, d_v) arrays) – Training data from each view.
y (ignored)
- Returns:
self – The fitted GCCA model.
- Return type:
GCCA
- transform(views: List) → numpy.ndarray | List[numpy.ndarray]
Project views into the shared embedding space.
- Parameters:
views (list of array-like of shape (n_samples, n_features_v))
- Return type:
Depends on the output parameter; see the class docstring.
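Projecting new data plausibly amounts to centring each view with the training means stored in means_ and multiplying by the matching matrix in weights_. The sketch below shows that step in NumPy under that assumption. The helper project_views is hypothetical, and the weights here are random placeholders, not fitted values.

```python
import numpy as np

def project_views(views, weights, means):
    """Centre each view with its training mean, then apply its projection matrix."""
    return [(np.asarray(X, dtype=float) - m) @ W
            for X, W, m in zip(views, weights, means)]

rng = np.random.default_rng(0)
dims = (4, 6)

# Training views supply the column means; the weights are NOT fitted here.
train_views = [rng.standard_normal((50, d)) + 3.0 for d in dims]
means = [X.mean(axis=0) for X in train_views]
weights = [rng.standard_normal((d, 2)) for d in dims]

# New samples are centred with the training means, not their own means.
new_views = [rng.standard_normal((10, d)) + 3.0 for d in dims]
Z_list = project_views(new_views, weights, means)
Z_concat = np.hstack(Z_list)   # corresponds to output="concat"
print(Z_concat.shape)          # (10, 4)
```

Reusing the training means keeps the test projections in the same coordinate system as the training embedding.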