.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_cca.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_plot_cca.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_cca.py:

Different CCA methods
=====================

Exempliefies different CCA methods

.. GENERATED FROM PYTHON SOURCE LINES 10-11

Import necessary libraries.

.. GENERATED FROM PYTHON SOURCE LINES 11-22

.. code-block:: default


    import pandas as pd
    import numpy as np
    from numpy.linalg import svd
    from statsmodels.multivariate.cancorr import CanCorr

    from sparsecca import cca_ipls
    from sparsecca import cca_pmd
    from sparsecca import multicca_pmd
    from sparsecca import pmd


.. GENERATED FROM PYTHON SOURCE LINES 23-24

Get toy data example from seaborn

.. GENERATED FROM PYTHON SOURCE LINES 24-35

.. code-block:: default


    path = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"
    df = pd.read_csv(path)
    df = df.dropna()

    X = df[['bill_length_mm', 'bill_depth_mm']]
    Z = df[['flipper_length_mm', 'body_mass_g']]

    X = ((X - np.mean(X)) / np.std(X)).to_numpy()
    Z = ((Z - np.mean(Z)) / np.std(Z)).to_numpy()


.. GENERATED FROM PYTHON SOURCE LINES 36-37

Define function for printing weights

.. GENERATED FROM PYTHON SOURCE LINES 37-41

.. code-block:: default

    def print_weights(name, weights):
        first = weights[:, 0] / np.max(np.abs(weights[:, 0]))
        print(name + ': ' + ', '.join(['{:.3f}'.format(item) for item in first]))


.. GENERATED FROM PYTHON SOURCE LINES 42-43

First, let's try CanCorr function from statsmodels package.

.. GENERATED FROM PYTHON SOURCE LINES 43-50

.. code-block:: default


    stats_cca = CanCorr(Z, X)

    print(stats_cca.corr_test().summary())
    print_weights('X', stats_cca.x_cancoef)
    print_weights('Z', stats_cca.y_cancoef)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

                               Cancorr results
    =====================================================================
      Canonical Correlation Wilks' lambda Num DF  Den DF  F Value  Pr > F
    ---------------------------------------------------------------------
    0                0.7876        0.3768 4.0000 658.0000 103.4837 0.0000
    1                0.0864        0.9925 1.0000 330.0000   2.4812 0.1162
    ---------------------------------------------------------------------
                                                                     
    ---------------------------------------------------------------------
    Multivariate Statistics and F Approximations                         
    -----------------------------------------------------------------------
                             Value    Num DF    Den DF    F Value    Pr > F
    -----------------------------------------------------------------------
    Wilks' lambda            0.3768   4.0000   658.0000   103.4837   0.0000
    Pillai's trace           0.6278   4.0000   660.0000    75.4943   0.0000
    Hotelling-Lawley trace   1.6416   4.0000   393.7623   134.8873   0.0000
    Roy's greatest root      1.6341   2.0000   330.0000   269.6262   0.0000
    =====================================================================

    X: -1.000, 0.826
    Z: -1.000, 0.027


.. GENERATED FROM PYTHON SOURCE LINES 51-52

Next, use CCA algorithm from Witten et al.

.. GENERATED FROM PYTHON SOURCE LINES 52-63

.. code-block:: default


    U, V, D = cca_pmd(X, Z, penaltyx=1.0, penaltyz=1.0, K=2, standardize=False)

    x_weights = U[:, 0]
    z_weights = V[:, 0]
    corrcoef = np.corrcoef(np.dot(x_weights, X.T), np.dot(z_weights, Z.T))[0, 1]
    print("Corrcoef for comp 1: " + str(corrcoef))

    print_weights('X', U)
    print_weights('Z', V)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Corrcoef for comp 1: 0.7629782437780296
    X: -1.000, 0.848
    Z: -1.000, -0.866


.. GENERATED FROM PYTHON SOURCE LINES 64-66

As the CCA algorithm in Witten et al is faster version of 
computing SVD of X.T @ Z, try that.

.. GENERATED FROM PYTHON SOURCE LINES 66-77

.. code-block:: default


    U, D, V = svd(X.T @ Z)

    x_weights = U[:, 0]
    z_weights = V[0, :]
    corrcoef = np.corrcoef(np.dot(x_weights, X.T), np.dot(z_weights, Z.T))[0, 1]
    print("Corrcoef for comp 1: " + str(corrcoef))

    print_weights('X', U)
    print_weights('V', V.T)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Corrcoef for comp 1: 0.7629782437780295
    X: -1.000, 0.848
    V: -1.000, -0.866


.. GENERATED FROM PYTHON SOURCE LINES 78-81

The novelty in Witten et al is developing matrix decomposition similar
to SVD, but which allows to add convex penalties (here lasso).
Using that to X.T @ Z without penalty results to same as above.

.. GENERATED FROM PYTHON SOURCE LINES 81-92

.. code-block:: default


    U, V, D = pmd(X.T @ Z, K=2, penaltyu=1.0, penaltyv=1.0, standardize=False)

    x_weights = U[:, 0]
    z_weights = V[:, 0]
    corrcoef = np.corrcoef(np.dot(x_weights, X.T), np.dot(z_weights, Z.T))[0, 1]
    print("Corrcoef for comp 1: " + str(corrcoef))

    print_weights('X', U)
    print_weights('Z', V)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Corrcoef for comp 1: 0.7629782437780296
    X: -1.000, 0.848
    Z: -1.000, -0.866


.. GENERATED FROM PYTHON SOURCE LINES 93-94

However, when you add penalties, you get a sparse version of CCA.

.. GENERATED FROM PYTHON SOURCE LINES 94-105

.. code-block:: default


    U, V, D = pmd(X.T @ Z, K=2, penaltyu=0.8, penaltyv=0.9, standardize=False)

    x_weights = U[:, 0]
    z_weights = V[:, 0]
    corrcoef = np.corrcoef(np.dot(x_weights, X.T), np.dot(z_weights, Z.T))[0, 1]
    print("Corrcoef for comp 1: " + str(corrcoef))

    print_weights('X', U)
    print_weights('Z', V)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Corrcoef for comp 1: 0.7038297679392981
    X: -1.000, 0.143
    Z: -1.000, -0.347


.. GENERATED FROM PYTHON SOURCE LINES 106-114

PMD is really fantastically simple and powerful idea, and as seen, 
can be used to implement sparse CCA. However, for SVD(X.T @ Z) to be 
equivalent to CCA, cov(X) and cov(Z) should be diagonal,
which can sometimes give problems. Another CCA algorithm allowing convex penalties
that does not require cov(X) and cov(Z) to be diagonal, was presented in 
Mai et al (2019). It is based on iterative least squares formulation, and as it is
solved with GLM, it allows elastic net -like weighting of L1 and L2 -norms for 
both datasets separately.

.. GENERATED FROM PYTHON SOURCE LINES 114-126

.. code-block:: default


    X_weights, Z_weights = cca_ipls(X, Z, alpha_lambda=0.0, beta_lambda=0.0, standardize=False,
                                    n_pairs=2, glm_impl='glmnet_python')

    x_weights = X_weights[:, 0]
    z_weights = Z_weights[:, 0]
    corrcoef = np.corrcoef(np.dot(x_weights, X.T), np.dot(z_weights, Z.T))[0, 1]
    print("Corrcoef for comp 1: " + str(corrcoef))

    print_weights("X", X_weights)
    print_weights("Z", Z_weights)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Corrcoef for comp 1: 0.7876314577102576
    X: -1.000, 0.825
    Z: -1.000, 0.026


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.882 seconds)


.. _sphx_glr_download_auto_examples_plot_cca.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_cca.py <plot_cca.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_cca.ipynb <plot_cca.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_