Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute and plot feature loadings in ordination methods #596

Closed
thpral opened this issue Jul 2, 2024 · 7 comments
Closed

Compute and plot feature loadings in ordination methods #596

thpral opened this issue Jul 2, 2024 · 7 comments
Assignees

Comments

@thpral
Copy link
Collaborator

thpral commented Jul 2, 2024

With @ElySeraidarian, we have been working on computing and plotting feature loadings for several ordination methods. Currently, we can compute and plot the feature loadings for 3 ordination methods: PCA, LDA and NMF. However, we are uncertain about the optimal way to implement this feature in mia (and possibly miaViz for the visualization part).

First, we are uncertain about the optimal way to store feature loadings after calculation. The calculation of feature loadings is different for all methods but they all result in a matrix (features in lines and latent vectors in columns).
Ideally, we would like to store the feature loadings at the same place for all ordination methods (e.g. attr or metadata) but it might be difficult. The ordination functions we use to compute feature loadings are not storing them at the same place at all.
For instance, when running runPCA, the feature loadings matrix is directly stored in attribute "rotation".

Also, we are wondering if it would be best to create a generic wrapper for plotting. @antagomir suggested that we build specific functions for each method and then a generic wrapper that would recognize the type of arguments and call the relevant method.

I think it is important to mention that to visualize the rowTree as well as the feature loadings, we need to restrict the number of features so that the ordination plot remains readable. For that, we use agglomerateByPrevalence (which also agglomerate the rowTree thanks to #575)

@thpral
Copy link
Collaborator Author

thpral commented Jul 3, 2024

@TuomasBorman Do you have any suggestions or preferences regarding this?

@TuomasBorman
Copy link
Contributor

Can you list all the methods that you are implementing? I believe NMF and LDA are not currently available from TreeSE/SingleCellExperiment ecosystem?

PCA --> scater::runPCA()
RDA --> mia::runRDA()/getRDA()

I think first task is to implement LDA and NMF (if they are not yet available). There should be both get* and add* methods available.

get* should output ordination including feature loadings in attribute, that might be the best solution. add* should add the ordination result to reducedDim slot.

Wrapper for feature loadings was already discussed briefly here microbiome/miaViz#92. Certainly there is a need for wrapper. That could take either TreeSE or feature loadings matrix as input.

Leo's idea of specific functions and generic wrapper seems reasonable. However, you should keep in mind the maintenance issues; you should avoid "copy-pasting" the code. Here it seems that loadings are in same format (matrix), and only thing that differs is from where are they fetched from TreeSE object. If that is the case, the data fetching part is the only thing that requires ordination-specific method. The rest of the functionality can be shared by all methods. (Not sure though since I have not checked this thing in detail).

I am not sure about the use-case of plotting loadings along with rowTree().

In any case, you should first create a draft file that just has the functionality without formal function-specific stuff. For plotting wrapper, you could use OMA example as a template and make it general. This seems a little bit bigger task, so the ground work is very important; especially if you don't have time to finish this.

I might be slow to respond on July because I am working only part-time.

@antagomir
Copy link
Member

It could also help to proceed if visualization methods can be already provided for at least one of the readily available ordination functions. All ordination methods are supposed to provide similar output (features x components) and the visualization method can assume that the output is available in a suitable format.

Tree: at the first stage it could be even sufficient to just assume that the user has subsetted or agglomerated the tree themselves before the analysis, in whatever way they wish. The examples could be done with sufficiently small data sets with not too many features. I am not sure if the plotting method itself is the best place to deal with this. At least not on the first round. In the subsequent steps we can see how to scale up.

I support comments from @TuomasBorman . I also suggest that here in mia issues we discuss details of the ordination method, and in the indicated miaViz issue we discuss the visualization details.

@antagomir
Copy link
Member

I agree about creating the get functions for NMF and LDA (similar to getDMM and others that we have already). Or at least one of these to begin with.

All get functions should then return the ordination loadings in the same way. Yes, this can be an issue when we use external functions such as scater::runPCA. We cannot solve that unless we a) implement our on mia::getPCA wrapper on top of it ; b) suggest PR to scater to change this (no idea how it would be received there); c) create a general conversion function getLoadings() that can pick the loadings from different packages & outputs. I might be in favor of option (a).

@thpral
Copy link
Collaborator Author

thpral commented Jul 4, 2024

Can you list all the methods that you are implementing? I believe NMF and LDA are not currently available from TreeSE/SingleCellExperiment ecosystem?

PCA --> scater::runPCA() RDA --> mia::runRDA()/getRDA()

I think first task is to implement LDA and NMF (if they are not yet available). There should be both get* and add* methods available.

get* should output ordination including feature loadings in attribute, that might be the best solution. add* should add the ordination result to reducedDim slot.

Wrapper for feature loadings was already discussed briefly here microbiome/miaViz#92. Certainly there is a need for wrapper. That could take either TreeSE or feature loadings matrix as input.

Leo's idea of specific functions and generic wrapper seems reasonable. However, you should keep in mind the maintenance issues; you should avoid "copy-pasting" the code. Here it seems that loadings are in same format (matrix), and only thing that differs is from where are they fetched from TreeSE object. If that is the case, the data fetching part is the only thing that requires ordination-specific method. The rest of the functionality can be shared by all methods. (Not sure though since I have not checked this thing in detail).

I am not sure about the use-case of plotting loadings along with rowTree().

In any case, you should first create a draft file that just has the functionality without formal function-specific stuff. For plotting wrapper, you could use OMA example as a template and make it general. This seems a little bit bigger task, so the ground work is very important; especially if you don't have time to finish this.

I might be slow to respond on July because I am working only part-time.

Here are the examples I have for LDA and NMF ordinations, I use topicmodels::LDA and NMF::nmf:

# NMF example
library(NMF)
library(mia)

data(GlobalPatterns)
tse <- GlobalPatterns

tse <- transformAssay(tse, method = "relabundance")

altExp(tse, "GenusPrevalent") <- agglomerateByPrevalence(tse, rank="Genus", assay.type="relabundance",
                                                         detection = 3/100, prevalence=5/100)

altExp(tse, "GenusPrevalent") <- altExp(tse, "GenusPrevalent")[!(rownames(altExp(tse, "GenusPrevalent")) 
                                                                 %in% c("Other")),]

x <- t(assay(altExp(tse, "GenusPrevalent"), "counts"))
nmf2 <- nmf(x, 2)

H <- nmf2@fit@H

feature_loadings <- t(H)
head(feature_loadings)
# LDA example
library(mia)
library(topicmodels)

data(GlobalPatterns)
tse <- GlobalPatterns

tse <- transformAssay(tse, method = "relabundance")

altExp(tse, "GenusPrevalent") <- agglomerateByPrevalence(tse, rank="Genus", assay.type="relabundance",
                                                         detection=3/100, prevalence=5/100)

altExp(tse, "GenusPrevalent") <- altExp(tse, "GenusPrevalent")[!(rownames(altExp(tse,
                                                                                 "GenusPrevalent")) %in% c("Other")),]

lda_model <- LDA(t(assay(altExp(tse, "GenusPrevalent"), "counts")), k = 2)

df <- as.data.frame(t(assay(altExp(tse, "GenusPrevalent"), "counts")))

posteriors <- topicmodels::posterior(lda_model, df)

feature_loadings <- t(as.data.frame(posteriors$terms))
head(feature_loadings)

I can open a PR to implement get* and add* methods for LDA and NMF, and I agree that a wrapper for PCA would be great to store feature loadings exactly the same way for all 3 ordination methods.

@antagomir
Copy link
Member

I also suggest that you implement wrapper only for either LDA or NMF first. Then we can make sure that the recommended practices are being followed, afterwards do the other method following the same procedure.

@antagomir
Copy link
Member

@ElySeraidarian can you close as this becomes solved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants