Merge pull request #10 from MatthewSZhang/docs
DOC add examples
MatthewSZhang authored Oct 14, 2024
2 parents 10a32a9 + 36a9c99 commit bf573bc
Showing 22 changed files with 2,640 additions and 936 deletions.
2 changes: 1 addition & 1 deletion .readthedocs.yml
Original file line number Diff line number Diff line change
@@ -11,4 +11,4 @@ python:
install:
- method: pip
path: .
extra_requirements: [doc]
extra_requirements: [docs]
37 changes: 23 additions & 14 deletions README.rst
@@ -1,5 +1,5 @@
FastCan: A Fast Canonical-Correlation-Based Feature Selection Method
====================================================================
FastCan: A Fast Canonical-Correlation-Based Feature Selection Algorithm
=======================================================================
|conda| |Codecov| |CI| |Doc| |PythonVersion| |PyPi| |Black| |ruff| |pixi|

.. |conda| image:: https://img.shields.io/conda/vn/conda-forge/fastcan.svg
@@ -29,6 +29,18 @@ FastCan: A Fast Canonical-Correlation-Based Feature Selection Method
.. |pixi| image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/prefix-dev/pixi/main/assets/badge/v0.json&style=flat-square
:target: https://pixi.sh

FastCan is a feature selection method with the following advantages:

#. Extremely **fast**. See :ref:`sphx_glr_auto_examples_plot_speed.py`.

#. Supports unsupervised feature selection. See :ref:`Unsupervised feature selection <unsupervised>`.

#. Supports multioutput feature selection. See :ref:`Multioutput feature selection <multioutput>`.

#. Skips redundant features. See :ref:`Feature redundancy <redundancy>`.

#. Evaluates the relative usefulness of features. See :ref:`sphx_glr_auto_examples_plot_intuitive.py`.


Installation
------------
@@ -41,25 +53,22 @@ Or via conda-forge:

* Run ``conda install -c conda-forge fastcan``

Examples
--------
Getting Started
---------------
>>> from fastcan import FastCan
>>> X = [[ 0.87, -1.34, 0.31 ],
... [-2.79, -0.02, -0.85 ],
... [-1.34, -0.48, -2.55 ],
... [ 1.92, 1.48, 0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = FastCan(n_features_to_select=2, verbose=0).fit(X, y)
>>> selector.get_support()
array([ True, True, False])
>>> X = [[1, 0], [0, 1]]
>>> y = [1, 0]
>>> FastCan(verbose=0).fit(X, y).get_support()
array([ True, False])

Check the :ref:`User Guide <user_guide>` and :ref:`Examples <examples>` for more information.

Citation
--------

FastCan is a Python implementation of the algorithms described in the following papers.

If you use the `h-correlation` algorithm in your work please cite the following reference:
If you use the `h-correlation` method in your work please cite the following reference:

.. code:: bibtex
@@ -76,7 +85,7 @@ If you use the `h-correlation` algorithm in your work please cite the following
keywords = {Feature selection, Orthogonal least squares, Canonical correlation analysis, Linear discriminant analysis, Multi-label, Multivariate time series, Feature interaction},
}
If you use the `eta-cosine` algorithm in your work please cite the following reference:
If you use the `eta-cosine` method in your work please cite the following reference:

.. code:: bibtex
17 changes: 6 additions & 11 deletions doc/conf.py
@@ -40,6 +40,8 @@
"sphinx.ext.napoleon",
# Link to other project's documentation (see mapping below)
"sphinx.ext.intersphinx",
"sphinx_gallery.gen_gallery",
"sphinx_design",
]

# List of patterns, relative to source directory, that match files and
@@ -67,14 +69,7 @@
"sklearn": ("https://scikit-learn.org/stable", None),
}

# add substitutions that should be available in every file
rst_prolog = """
.. |numpy_dtype| replace:: numpy data type
.. _numpy_dtype: https://numpy.org/doc/stable/user/basics.types.html
.. |sklearn_cython_dtype| replace:: sklearn cython data type
.. _sklearn_cython_dtype: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_typedefs.pxd
.. |sphinx_link| replace:: rst Markup Spec
.. _sphinx_link: https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html
"""
sphinx_gallery_conf = {
"examples_dirs": ["../examples"],
"gallery_dirs": ["auto_examples"],
}
13 changes: 11 additions & 2 deletions doc/index.rst
@@ -13,15 +13,24 @@


API Reference
~~~~~~~~~~~~~
-------------
.. autosummary::
    :toctree: generated/

    FastCan
    ssc
    ols

Useful Links
------------
.. toctree::
    :maxdepth: 2

    User Guide <user_guide>
    Examples <auto_examples/index>

API Compatibility
-----------------

The API of this package is aligned with scikit-learn.

57 changes: 57 additions & 0 deletions doc/multioutput.rst
@@ -0,0 +1,57 @@
.. currentmodule:: fastcan

.. _multioutput:

==============================
Multioutput feature selection
==============================

We can use :class:`FastCan` to handle multioutput feature selection, which means
the target ``y`` can be a matrix. For regression, :class:`FastCan` can be used on
MIMO (Multi-Input Multi-Output) data. For classification, it can be used on
multilabel data. Multioutput feature selection is also useful for multiclass
classification, which has one output with multiple categories: the multiclass
problem can be converted to a multilabel one by one-hot encoding the target ``y``.
The canonical correlation coefficient between the features ``X`` and the one-hot
encoded target ``y`` has an equivalent relationship with Fisher's criterion in
LDA (Linear Discriminant Analysis) [1]_. Applying :class:`FastCan` to the converted
multioutput data may give better accuracy in the downstream classification task
than applying it directly to the original single-label data. See Figure 5 in [2]_.

Relationship on multiclass data
-------------------------------
Assume the feature matrix is :math:`X \in \mathbb{R}^{N\times n}`, the multiclass
target vector is :math:`y \in \mathbb{R}^{N\times 1}`, and the one-hot encoded target
matrix is :math:`Y \in \mathbb{R}^{N\times m}`. Denote the Fisher's criterion for
:math:`X` and :math:`y` by :math:`J`, and the canonical correlation coefficient
between :math:`X` and :math:`Y` by :math:`R`. The relationship between :math:`J`
and :math:`R` is given by

.. math::
    J = \frac{R^2}{1-R^2}

or

.. math::
    R^2 = \frac{J}{1+J}

Note that there is more than one Fisher's criterion value and more than one
canonical correlation coefficient: the number of non-zero canonical correlation
coefficients is at most :math:`\min (n, m)`, and each canonical correlation
coefficient corresponds one-to-one to a Fisher's criterion value.

.. rubric:: References

.. [1] `"Orthogonal least squares based fast feature selection for
linear classification" <https://doi.org/10.1016/j.patcog.2021.108419>`_
Zhang, S., & Lang, Z. Q. Pattern Recognition, 123, 108419 (2022).
.. [2] `"Canonical-correlation-based fast feature selection for structural
health monitoring" <https://doi.org/10.1016/j.ymssp.2024.111895>`_
Zhang, S., Wang, T., Worden, K., Sun L., & Cross, E. J.
Mechanical Systems and Signal Processing, 223, 111895 (2025).
.. rubric:: Examples

* See :ref:`sphx_glr_auto_examples_plot_fisher.py` for an example of
the equivalent relationship between CCA and LDA on multiclass data.
58 changes: 58 additions & 0 deletions doc/ols_and_omp.rst
@@ -0,0 +1,58 @@
.. currentmodule:: fastcan

.. _ols_omp:

===========================
Comparison with OLS and OMP
===========================

:class:`FastCan` has a close relationship with Orthogonal Least Squares (OLS) [1]_
and Orthogonal Matching Pursuit (OMP) [2]_.
The detailed difference between OLS and OMP can be found in [3]_.
Here, let's briefly compare the three methods.


Assume we have a feature matrix :math:`X_s \in \mathbb{R}^{N\times t}`, which contains
:math:`t` selected features, and a target vector :math:`y \in \mathbb{R}^{N\times 1}`.
Then the residual :math:`r \in \mathbb{R}^{N\times 1}` of the least-squares can be
found by

.. math::
    r = y - X_s \beta \;\; \text{where} \;\; \beta = (X_s^\top X_s)^{-1}X_s^\top y

When evaluating a new candidate feature :math:`x_i \in \mathbb{R}^{N\times 1}`:

* for OMP, the feature which maximizes :math:`r^\top x_i` will be selected,
* for OLS, the feature which maximizes :math:`r^\top w_i` will be selected, where
  :math:`w_i \in \mathbb{R}^{N\times 1}` is the projection of :math:`x_i` on the
  orthogonal subspace so that it is orthogonal to :math:`X_s`, i.e.,
  :math:`X_s^\top w_i = \mathbf{0} \in \mathbb{R}^{t\times 1}`,
* for :class:`FastCan` (the h-correlation algorithm), the criterion is almost the
  same as for OLS; the difference is that :class:`FastCan` centers :math:`X_s`,
  :math:`y`, and :math:`x_i` (i.e., makes each column zero mean) before the
  selection.

This small difference makes the feature ranking criterion of :class:`FastCan`
equivalent to the sum of squared canonical correlation coefficients, which gives
it the following advantages over OLS and OMP:

* Affine invariance: if features are corrupted by an affine transformation, i.e.,
  scaled and/or shifted by constants, the selection result given by
  :class:`FastCan` will be unchanged. See :ref:`sphx_glr_auto_examples_plot_affinity.py`.
* Multioutput support: as :class:`FastCan` uses canonical correlation for feature
  ranking, it naturally supports feature selection on datasets with multiple
  outputs.
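A single selection step of the two classical criteria above can be sketched in NumPy. This is an illustrative sketch, not the implementation of any of the three methods; the helper names are hypothetical, and the OLS score here additionally normalizes the orthogonalized candidate so the criterion is scale-free (an assumption of this sketch):

```python
import numpy as np

def next_feature_omp(X_s, y, candidates):
    """OMP step: pick the candidate most correlated with the residual."""
    beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
    r = y - X_s @ beta                       # least-squares residual
    scores = np.abs(candidates.T @ r)        # correlate raw candidates with r
    return int(np.argmax(scores))

def next_feature_ols(X_s, y, candidates):
    """OLS step: orthogonalize candidates against X_s before scoring."""
    beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
    r = y - X_s @ beta
    Q, _ = np.linalg.qr(X_s)                 # orthonormal basis of col(X_s)
    W = candidates - Q @ (Q.T @ candidates)  # w_i: parts orthogonal to X_s
    W = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit length (assumed)
    scores = np.abs(W.T @ r)
    return int(np.argmax(scores))
```

Both helpers return the column index of the winning candidate; centering all inputs first would turn the OLS step into the h-correlation step described above.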


.. rubric:: References

.. [1] `"Orthogonal least squares methods and their application to non-linear
system identification" <https://doi.org/10.1080/00207178908953472>`_ Chen, S.,
Billings, S. A., & Luo, W. International Journal of control, 50(5),
1873-1896 (1989).
.. [2] `"Matching pursuits with time-frequency dictionaries"
<https://doi.org/10.1109/78.258082>`_ Mallat, S. G., & Zhang, Z.
IEEE Transactions on signal processing, 41(12), 3397-3415 (1993).
.. [3] `"On the difference between Orthogonal Matching Pursuit and Orthogonal Least
Squares" <https://eprints.soton.ac.uk/142469/1/BDOMPvsOLS07.pdf>`_ Blumensath, T.,
& Davies, M. E. Technical report, University of Edinburgh, (2007).
35 changes: 35 additions & 0 deletions doc/redundancy.rst
@@ -0,0 +1,35 @@
.. currentmodule:: fastcan

.. _redundancy:

==================
Feature redundancy
==================

:class:`FastCan` can effectively skip linearly redundant features.
Here, a feature :math:`x_r\in \mathbb{R}^{N\times 1}` being linearly
redundant to a set of features :math:`X\in \mathbb{R}^{N\times n}` means that
:math:`x_r` can be obtained from an affine transformation of :math:`X`, given by

.. math::
    x_r = Xa + b

where :math:`a\in \mathbb{R}^{n\times 1}` and :math:`b\in \mathbb{R}^{N\times 1}`.
In other words, the feature can be acquired by a linear transformation of :math:`X`,
i.e. :math:`Xa`, plus a translation, i.e. :math:`+b`.

This capability of :class:`FastCan` benefits from the
`Modified Gram-Schmidt <https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process>`_
process, which produces large rounding errors when linearly redundant features
appear.
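This mechanism can be illustrated with a small NumPy sketch (an illustration of the idea, not :class:`FastCan`'s internal code): after centering, a Modified Gram-Schmidt pass leaves a near-zero residual for any column that is an affine combination of the columns processed before it.

```python
import numpy as np

def mgs_redundancy_flags(X, tol=1e-8):
    """Flag columns that are linearly redundant to the preceding columns."""
    Xc = X - X.mean(axis=0)       # centering removes the translation b
    flags, basis = [], []
    for j in range(Xc.shape[1]):
        v = Xc[:, j].astype(float).copy()
        for q in basis:           # MGS: subtract projections one at a time
            v -= (q @ v) * q
        norm = np.linalg.norm(v)
        if norm <= tol * max(1.0, np.linalg.norm(Xc[:, j])):
            flags.append(True)    # residual collapsed: redundant column
        else:
            flags.append(False)
            basis.append(v / norm)
    return flags

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
x_r = X @ np.array([1.0, -2.0, 0.5]) + 3.0   # affine combination of X
flags = mgs_redundancy_flags(np.column_stack([X, x_r]))
print(flags)  # [False, False, False, True]
```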

.. rubric:: References

* `"Canonical-correlation-based fast feature selection for structural
health monitoring" <https://doi.org/10.1016/j.ymssp.2024.111895>`_
Zhang, S., Wang, T., Worden, K., Sun L., & Cross, E. J.
Mechanical Systems and Signal Processing, 223, 111895 (2025).

.. rubric:: Examples

* See :ref:`sphx_glr_auto_examples_plot_redundancy.py` for an example of
feature selection on datasets with redundant features.
38 changes: 38 additions & 0 deletions doc/unsupervised.rst
@@ -0,0 +1,38 @@
.. currentmodule:: fastcan

.. _unsupervised:

==============================
Unsupervised feature selection
==============================

We can use :class:`FastCan` for unsupervised feature selection.
The unsupervised application of :class:`FastCan` selects the features that
maximize the sum of squared canonical correlation coefficients (SSC) with the
principal components (PCs) acquired from PCA (principal component analysis) of
the feature matrix :math:`X`. See the example below.

>>> from sklearn.decomposition import PCA
>>> from sklearn import datasets
>>> from fastcan import FastCan
>>> iris = datasets.load_iris()
>>> X = iris["data"]
>>> pca = PCA(n_components=2)
>>> X_pcs = pca.fit_transform(X)
>>> selector = FastCan(n_features_to_select=2, verbose=0).fit(X, X_pcs[:, :2])
>>> selector.indices_
array([2, 1], dtype=int32)

.. note::
    There is no guarantee that this unsupervised :class:`FastCan` will select
    the optimal subset of features, i.e., the one with the highest SSC with
    the PCs, because :class:`FastCan` selects features in a greedy manner,
    which may lead to suboptimal results.

.. rubric:: References

* `"Automatic Selection of Optimal Structures for Population-Based
Structural Health Monitoring" <https://doi.org/10.1007/978-3-031-34946-1_10>`_
Wang, T., Worden, K., Wagg, D.J., Cross, E.J., Maguire, A.E., Lin, W.
In: Conference Proceedings of the Society for Experimental Mechanics Series.
Springer, Cham. (2023).
14 changes: 14 additions & 0 deletions doc/user_guide.rst
@@ -0,0 +1,14 @@
.. _user_guide:

==========
User Guide
==========

.. toctree::
    :numbered:
    :maxdepth: 1

    unsupervised.rst
    multioutput.rst
    redundancy.rst
    ols_and_omp.rst
6 changes: 6 additions & 0 deletions examples/README.rst
@@ -0,0 +1,6 @@
.. _examples:

Examples
========

Below is a gallery of examples.