-
Notifications
You must be signed in to change notification settings - Fork 1
Home
Welcome to the EnsemblePursuit wiki!
Thank you so much for checking out this wiki where I talk about the Ensemble Pursuit algorithm and its implementation in code.
Ensemble Pursuit is an algorithm for dimensionality reduction of neural data developed primarily for deconvolved calcium imaging time series data. It is based on matrix factorization. It finds sparse groupings of correlated cells (termed Ensembles) from high-dimensional neural datasets (we tested the algorithm on publicly available data with over 10,000 neurons).
Matrix factorization algorithms are used to decompose a matrix into a product of two matrices, U and V with dimensions for matrix multiplication matching in components. The number of components is much smaller than the dimensionality of the data and because of that matrix factorization reduces the dimensionality of the problem. Here we call components ensembles as they represent groups of co-activating cells. In Ensemble Pursuit, each column of U (U is neurons x components for data of shape neurons x timepoints) represents which neurons belong to the ensemble and with what strength they contribute to the time evolution of an ensemble and each row demarcates which ensembles a neuron belongs to and its intensity. V stores the time evolution of an ensemble. It's a weighted average time course of the neurons that belong to that ensemble.
To find these two matrices a cost function is optimized. It has two terms. The first terms measures the discrepancy between X and it's approximation by U@V via the L2 norm. The second term is a regularization term that encourages the sparsity of U components via a L0 penalty that puts a penalty
\begin{equation}
\text{Cost} = || X - U\cdot V||^2 + \lambda ||U||_0
\end{equation}
To fit an ensemble for
\begin{equation} \Delta_j = \frac{\max(0, x_{j}v )^2}{||v||^2} - \lambda \label{delta_cost} \end{equation}
After adding the neuron to the ensemble we update v as an average time course of the neurons in the ensemble so far and keep adding new neurons until the regularization term begins to dominate and adding new neurons to the ensemble no longer decreases the cost. At that point we compute the intensities of
\begin{equation} u_j = \frac{\max(0, v_{j}^{T}x)}{||v||^2} \end{equation}
After fitting one ensemble the next ensembles are fit from the residual of subtracting the approximation u@v from X.
\begin{equation} X_{res}=X-u\cdot v \end{equation}
Below we go into the mechanics of the code used to implement the algorithm in Numpy and PyTorch (for GPU's). There are some computational tricks that we use to speed up the algorithm that will be explained below.
Optimizing computing X@X.T
We need to compute
\begin{equation} C_{0}=XX^T \end{equation}
\begin{equation} X_{1}=X_{0}-uv^T \end{equation}
\begin{equation} C_{1}=(X_{0}-uv^T)(X_{0}-uv^T)^T=C_{0}+uv^T(uv^T)^T-uv^TX_{0}-X_{0}(uv^T)^T=C_{0}+u(v^Tv)u^T-uv^TX_{0}^T-X_{0}(uv^T)^T \end{equation}
Computing