Merge pull request #10 from MatthewSZhang/docs
DOC add examples
MatthewSZhang authored Oct 14, 2024
2 parents 10a32a9 + 36a9c99 commit bf573bc
Showing 22 changed files with 2,640 additions and 936 deletions.
2 changes: 1 addition & 1 deletion .readthedocs.yml
Original file line number Diff line number Diff line change
@@ -11,4 +11,4 @@ python:
install:
- method: pip
path: .
extra_requirements: [doc]
extra_requirements: [docs]
37 changes: 23 additions & 14 deletions README.rst
@@ -1,5 +1,5 @@
FastCan: A Fast Canonical-Correlation-Based Feature Selection Method
====================================================================
FastCan: A Fast Canonical-Correlation-Based Feature Selection Algorithm
=======================================================================
|conda| |Codecov| |CI| |Doc| |PythonVersion| |PyPi| |Black| |ruff| |pixi|

.. |conda| image:: https://img.shields.io/conda/vn/conda-forge/fastcan.svg
@@ -29,6 +29,18 @@ FastCan: A Fast Canonical-Correlation-Based Feature Selection Method
.. |pixi| image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/prefix-dev/pixi/main/assets/badge/v0.json&style=flat-square
:target: https://pixi.sh

FastCan is a feature selection method with the following advantages:

#. Extremely **fast**. See :ref:`sphx_glr_auto_examples_plot_speed.py`.

#. Supports unsupervised feature selection. See :ref:`Unsupervised feature selection <unsupervised>`.

#. Supports multioutput feature selection. See :ref:`Multioutput feature selection <multioutput>`.

#. Skips redundant features. See :ref:`Feature redundancy <redundancy>`.

#. Evaluates the relative usefulness of features. See :ref:`sphx_glr_auto_examples_plot_intuitive.py`.


Installation
------------
@@ -41,25 +53,22 @@ Or via conda-forge:

* Run ``conda install -c conda-forge fastcan``

Examples
--------
Getting Started
---------------
>>> from fastcan import FastCan
>>> X = [[ 0.87, -1.34, 0.31 ],
... [-2.79, -0.02, -0.85 ],
... [-1.34, -0.48, -2.55 ],
... [ 1.92, 1.48, 0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = FastCan(n_features_to_select=2, verbose=0).fit(X, y)
>>> selector.get_support()
array([ True, True, False])
>>> X = [[1, 0], [0, 1]]
>>> y = [1, 0]
>>> FastCan(verbose=0).fit(X, y).get_support()
array([ True, False])

Check the :ref:`User Guide <user_guide>` and :ref:`Examples <examples>` for more information.

Citation
--------

FastCan is a Python implementation of the algorithms described in the following papers.

If you use the `h-correlation` algorithm in your work please cite the following reference:
If you use the `h-correlation` method in your work please cite the following reference:

.. code:: bibtex
@@ -76,7 +85,7 @@ If you use the `h-correlation` algorithm in your work please cite the following
keywords = {Feature selection, Orthogonal least squares, Canonical correlation analysis, Linear discriminant analysis, Multi-label, Multivariate time series, Feature interaction},
}
If you use the `eta-cosine` algorithm in your work please cite the following reference:
If you use the `eta-cosine` method in your work please cite the following reference:

.. code:: bibtex
17 changes: 6 additions & 11 deletions doc/conf.py
@@ -40,6 +40,8 @@
"sphinx.ext.napoleon",
# Link to other project's documentation (see mapping below)
"sphinx.ext.intersphinx",
"sphinx_gallery.gen_gallery",
"sphinx_design",
]

# List of patterns, relative to source directory, that match files and
@@ -67,14 +69,7 @@
"sklearn": ("https://scikit-learn.org/stable", None),
}

# add substitutions that should be available in every file
rst_prolog = """
.. |numpy_dtype| replace:: numpy data type
.. _numpy_dtype: https://numpy.org/doc/stable/user/basics.types.html
.. |sklearn_cython_dtype| replace:: sklearn cython data type
.. _sklearn_cython_dtype: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_typedefs.pxd
.. |sphinx_link| replace:: rst Markup Spec
.. _sphinx_link: https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html
"""
sphinx_gallery_conf = {
"examples_dirs": ["../examples"],
"gallery_dirs": ["auto_examples"],
}
13 changes: 11 additions & 2 deletions doc/index.rst
@@ -13,15 +13,24 @@


API Reference
~~~~~~~~~~~~~
-------------
.. autosummary::
    :toctree: generated/

    FastCan
    ssc
    ols

Useful Links
------------
.. toctree::
    :maxdepth: 2

    User Guide <user_guide>
    Examples <auto_examples/index>

API Compatibility
-----------------

The API of this package is aligned with scikit-learn.

57 changes: 57 additions & 0 deletions doc/multioutput.rst
@@ -0,0 +1,57 @@
.. currentmodule:: fastcan

.. _multioutput:

==============================
Multioutput feature selection
==============================

We can use :class:`FastCan` to handle multioutput feature selection, which means
the target ``y`` can be a matrix. For regression, :class:`FastCan` can be used on
MIMO (Multi-Input Multi-Output) data. For classification, it can be used on
multilabel data. Multioutput feature selection is also useful for multiclass
classification, which has one output with multiple categories: the multiclass
problem can be converted to a multilabel one by one-hot encoding the target ``y``.
The canonical correlation coefficient between the features ``X`` and the one-hot
encoded target ``y`` has an equivalent relationship with Fisher's criterion in
LDA (Linear Discriminant Analysis) [1]_. Applying :class:`FastCan` to the converted
multioutput data may give better accuracy in the downstream classification task
than applying it directly to the original single-label data. See Figure 5 in [2]_.

Relationship on multiclass data
-------------------------------
Assume the feature matrix is :math:`X \in \mathbb{R}^{N\times n}`, the multiclass
target vector is :math:`y \in \mathbb{R}^{N\times 1}`, and the one-hot encoded target
matrix is :math:`Y \in \mathbb{R}^{N\times m}`. Denote the Fisher's criterion for
:math:`X` and :math:`y` by :math:`J`, and the canonical correlation coefficient
between :math:`X` and :math:`Y` by :math:`R`. The relationship between :math:`J`
and :math:`R` is given by

.. math::
    J = \frac{R^2}{1-R^2}

or

.. math::
    R^2 = \frac{J}{1+J}

Note that there is more than one Fisher's criterion value and more than one
canonical correlation coefficient: the number of non-zero canonical correlation
coefficients is at most :math:`\min (n, m)`, and each canonical correlation
coefficient corresponds one-to-one to a Fisher's criterion value.

.. rubric:: References

.. [1] `"Orthogonal least squares based fast feature selection for
linear classification" <https://doi.org/10.1016/j.patcog.2021.108419>`_
Zhang, S., & Lang, Z. Q. Pattern Recognition, 123, 108419 (2022).
.. [2] `"Canonical-correlation-based fast feature selection for structural
health monitoring" <https://doi.org/10.1016/j.ymssp.2024.111895>`_
Zhang, S., Wang, T., Worden, K., Sun L., & Cross, E. J.
Mechanical Systems and Signal Processing, 223, 111895 (2025).
.. rubric:: Examples

* See :ref:`sphx_glr_auto_examples_plot_fisher.py` for an example of
the equivalent relationship between CCA and LDA on multiclass data.
58 changes: 58 additions & 0 deletions doc/ols_and_omp.rst
@@ -0,0 +1,58 @@
.. currentmodule:: fastcan

.. _ols_omp:

===========================
Comparison with OLS and OMP
===========================

:class:`FastCan` has a close relationship with Orthogonal Least Squares (OLS) [1]_
and Orthogonal Matching Pursuit (OMP) [2]_.
The detailed difference between OLS and OMP can be found in [3]_.
Here, let's briefly compare the three methods.


Assume we have a feature matrix :math:`X_s \in \mathbb{R}^{N\times t}`, which contains
:math:`t` selected features, and a target vector :math:`y \in \mathbb{R}^{N\times 1}`.
Then the residual :math:`r \in \mathbb{R}^{N\times 1}` of the least-squares can be
found by

.. math::
    r = y - X_s \beta \;\; \text{where} \;\; \beta = (X_s^\top X_s)^{-1}X_s^\top y

When evaluating a new candidate feature :math:`x_i \in \mathbb{R}^{N\times 1}`:

* for OMP, the feature which maximizes :math:`r^\top x_i` will be selected,
* for OLS, the feature which maximizes :math:`r^\top w_i` will be selected, where
  :math:`w_i \in \mathbb{R}^{N\times 1}` is the projection of :math:`x_i` on the
  orthogonal subspace so that it is orthogonal to :math:`X_s`, i.e.,
  :math:`X_s^\top w_i = \mathbf{0} \in \mathbb{R}^{t\times 1}`,
* for :class:`FastCan` (the h-correlation algorithm), the criterion is almost the
  same as for OLS; the difference is that :class:`FastCan` centers :math:`X_s`,
  :math:`y`, and :math:`x_i` (i.e., makes each column zero mean) before the
  selection.

This small difference makes the feature ranking criterion of :class:`FastCan`
equivalent to the sum of squared canonical correlation coefficients, which gives
it the following advantages over OLS and OMP:

* Affine invariance: if features are corrupted by an affine transformation, i.e.,
  scaled and/or shifted by constants, the selection result given by
  :class:`FastCan` will be unchanged. See :ref:`sphx_glr_auto_examples_plot_affinity.py`.
* Multioutput support: as :class:`FastCan` uses canonical correlation for feature
  ranking, it naturally supports feature selection on datasets with multiple
  outputs.
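A single selection step of the two classical criteria above can be sketched in NumPy. This is an illustrative sketch, not the implementation of any of the three methods; the helper names are hypothetical, and the OLS score here additionally normalizes the orthogonalized candidate so the criterion is scale-free (an assumption of this sketch):

```python
import numpy as np

def next_feature_omp(X_s, y, candidates):
    """OMP step: pick the candidate most correlated with the residual."""
    beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
    r = y - X_s @ beta                       # least-squares residual
    scores = np.abs(candidates.T @ r)        # correlate raw candidates with r
    return int(np.argmax(scores))

def next_feature_ols(X_s, y, candidates):
    """OLS step: orthogonalize candidates against X_s before scoring."""
    beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
    r = y - X_s @ beta
    Q, _ = np.linalg.qr(X_s)                 # orthonormal basis of col(X_s)
    W = candidates - Q @ (Q.T @ candidates)  # w_i: parts orthogonal to X_s
    W = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit length (assumed)
    scores = np.abs(W.T @ r)
    return int(np.argmax(scores))
```

Both helpers return the column index of the winning candidate; centering all inputs first would turn the OLS step into the h-correlation step described above.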


.. rubric:: References

.. [1] `"Orthogonal least squares methods and their application to non-linear
system identification" <https://doi.org/10.1080/00207178908953472>`_ Chen, S.,
Billings, S. A., & Luo, W. International Journal of control, 50(5),
1873-1896 (1989).
.. [2] `"Matching pursuits with time-frequency dictionaries"
<https://doi.org/10.1109/78.258082>`_ Mallat, S. G., & Zhang, Z.
IEEE Transactions on signal processing, 41(12), 3397-3415 (1993).
.. [3] `"On the difference between Orthogonal Matching Pursuit and Orthogonal Least
Squares" <https://eprints.soton.ac.uk/142469/1/BDOMPvsOLS07.pdf>`_ Blumensath, T.,
& Davies, M. E. Technical report, University of Edinburgh, (2007).
35 changes: 35 additions & 0 deletions doc/redundancy.rst
@@ -0,0 +1,35 @@
.. currentmodule:: fastcan

.. _redundancy:

==================
Feature redundancy
==================

:class:`FastCan` can effectively skip linearly redundant features.
Here, a feature :math:`x_r\in \mathbb{R}^{N\times 1}` being linearly
redundant to a set of features :math:`X\in \mathbb{R}^{N\times n}` means that
:math:`x_r` can be obtained from an affine transformation of :math:`X`, given by

.. math::
    x_r = Xa + b

where :math:`a\in \mathbb{R}^{n\times 1}` and :math:`b\in \mathbb{R}^{N\times 1}`.
In other words, the feature can be acquired by a linear transformation of :math:`X`,
i.e. :math:`Xa`, plus a translation, i.e. :math:`+b`.

This capability of :class:`FastCan` benefits from the
`Modified Gram-Schmidt <https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process>`_
process, which produces large rounding errors when linearly redundant features
appear.
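This mechanism can be illustrated with a small NumPy sketch (an illustration of the idea, not :class:`FastCan`'s internal code): after centering, a Modified Gram-Schmidt pass leaves a near-zero residual for any column that is an affine combination of the columns processed before it.

```python
import numpy as np

def mgs_redundancy_flags(X, tol=1e-8):
    """Flag columns that are linearly redundant to the preceding columns."""
    Xc = X - X.mean(axis=0)       # centering removes the translation b
    flags, basis = [], []
    for j in range(Xc.shape[1]):
        v = Xc[:, j].astype(float).copy()
        for q in basis:           # MGS: subtract projections one at a time
            v -= (q @ v) * q
        norm = np.linalg.norm(v)
        if norm <= tol * max(1.0, np.linalg.norm(Xc[:, j])):
            flags.append(True)    # residual collapsed: redundant column
        else:
            flags.append(False)
            basis.append(v / norm)
    return flags

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
x_r = X @ np.array([1.0, -2.0, 0.5]) + 3.0   # affine combination of X
flags = mgs_redundancy_flags(np.column_stack([X, x_r]))
print(flags)  # [False, False, False, True]
```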

.. rubric:: References

* `"Canonical-correlation-based fast feature selection for structural
health monitoring" <https://doi.org/10.1016/j.ymssp.2024.111895>`_
Zhang, S., Wang, T., Worden, K., Sun L., & Cross, E. J.
Mechanical Systems and Signal Processing, 223, 111895 (2025).

.. rubric:: Examples

* See :ref:`sphx_glr_auto_examples_plot_redundancy.py` for an example of
feature selection on datasets with redundant features.
38 changes: 38 additions & 0 deletions doc/unsupervised.rst
@@ -0,0 +1,38 @@
.. currentmodule:: fastcan

.. _unsupervised:

==============================
Unsupervised feature selection
==============================

We can use :class:`FastCan` for unsupervised feature selection.
The unsupervised application of :class:`FastCan` selects the features that
maximize the sum of squared canonical correlation coefficients (SSC) with the
principal components (PCs) acquired from PCA (principal component analysis) of
the feature matrix :math:`X`. See the example below.

>>> from sklearn.decomposition import PCA
>>> from sklearn import datasets
>>> from fastcan import FastCan
>>> iris = datasets.load_iris()
>>> X = iris["data"]
>>> pca = PCA(n_components=2)
>>> X_pcs = pca.fit_transform(X)
>>> selector = FastCan(n_features_to_select=2, verbose=0).fit(X, X_pcs[:, :2])
>>> selector.indices_
array([2, 1], dtype=int32)

.. note::
    There is no guarantee that this unsupervised :class:`FastCan` will select
    the optimal subset of features, i.e., the one with the highest SSC with
    the PCs, because :class:`FastCan` selects features in a greedy manner,
    which may lead to suboptimal results.

.. rubric:: References

* `"Automatic Selection of Optimal Structures for Population-Based
Structural Health Monitoring" <https://doi.org/10.1007/978-3-031-34946-1_10>`_
Wang, T., Worden, K., Wagg, D.J., Cross, E.J., Maguire, A.E., Lin, W.
In: Conference Proceedings of the Society for Experimental Mechanics Series.
Springer, Cham. (2023).
14 changes: 14 additions & 0 deletions doc/user_guide.rst
@@ -0,0 +1,14 @@
.. _user_guide:

==========
User Guide
==========

.. toctree::
    :numbered:
    :maxdepth: 1

    unsupervised.rst
    multioutput.rst
    redundancy.rst
    ols_and_omp.rst
6 changes: 6 additions & 0 deletions examples/README.rst
@@ -0,0 +1,6 @@
.. _examples:

Examples
========

Below is a gallery of examples.