plotRDA, plot loadings #92

TuomasBorman · 2023-09-07T11:52:44Z

In OMA, there is an example on how to plot loadings of features in dbRDA. We could consider to generalize this to plotting function if that is a common way to analyze which features drive the difference

TuomasBorman · 2023-09-07T11:53:05Z

#87

ElySeraidarian · 2024-07-04T09:10:12Z

I would like to provide some examples on how to plot these feature loadings after talking with @antagomir about this.

There would be 2 ways of plotting :

This requires the latest github version of mia.

With tree, this option means that the tree need to be agglomerated to the same features as the loadings matrix we get after performing the reduction method (this is now possible for LDA & NMF methods as agglomerateByPrevalence has been fixed to be possible to agglomerate tree). Here are quick examples for each of these reduction methods :

PCA

library(mia)
library(ggtree)
library(scater)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- logNormCounts(tse)
# Agglomerate to keep only high prevalence features
tse <- agglomerateByPrevalence(tse, rank="Phylum", prevalence=0.99, update.tree = TRUE)
# Achieve PCA reduction
tse <- runPCA(tse, name = "PCA", ncomponents = 2)
# Get the feature loadings
loadings_matrix <- attr(reducedDim(tse, "PCA"), "rotation")
phylo <- rowTree(tse)
circ <- ggtree(phylo, layout = "circular")
df <- rowData(tse)
color <- randomcoloR::distinctColorPalette(
  length(
    unique(
      df$Phylum
    )
  )
)
df <- data.frame(Class = df$Phylum)
rownames(df) <- phylo$tip.label

df2 <- data.frame(abs(loadings_matrix))
rownames(df2) <- phylo$tip.label

# Plot the tree with first inner circle (Classes)
p <- gheatmap(
  p = circ,
  data = df,
  offset = -.1,
  width = .1,
  colnames_angle = 95,
  colnames_offset_y = .5,
  font.size = 4,
  color = "black") +
  ggplot2::scale_fill_manual(
    values = color,
    name = "Class"
  )
# Plot the feature loadings in different circles
for(i in 1:2){
  if(i == 1){
    p <- p +
      ggnewscale::new_scale_fill()
  }
  df3 <- dplyr::select(
    df2, (all_of(i))
  )

  p <- gheatmap(
    p,
    df3,
    offset = i*.065,
    width = .1,
    colnames_angle = 90,
    font.size = 4,
    high = "darkslateblue",
    low = "gray98",
    color = "black",
    legend_title = expression(beta[k]))
}

p

NMF

library(NMF)
library(mia)
library(ggtree)
data(GlobalPatterns)
tse <- GlobalPatterns
tse <- transformAssay(tse, method = "relabundance")
# Agglomerate to keep only high prevalence features
tse <- agglomerateByPrevalence(tse, rank="Phylum", prevalence=0.99, update.tree = TRUE)
# Perform the NMF reduction method
x <- t(assay(tse, "counts"))
nmf2 <- nmf(x, 2)
H <- nmf2@fit@H
W <- nmf2@fit@W
# Get the feature loadings
feature_loadings <- t(H)
phylo <- rowTree(tse)
circ <- ggtree(phylo, layout = "circular")
df <- rowData(tse)
color <- randomcoloR::distinctColorPalette(
  length(
    unique(
      df$Phylum
    )
  )
)
df <- data.frame(Class = df$Phylum)
rownames(df) <- phylo$tip.label

df2 <- data.frame(feature_loadings)
rownames(df2) <- phylo$tip.label
# Plot the tree and the first inner circle (Classes)
p <- gheatmap(
  p = circ,
  data = df,
  offset = -.1,
  width = .1,
  colnames_angle = 95,
  colnames_offset_y = .5,
  font.size = 4,
  color = "black") +
  ggplot2::scale_fill_manual(
    values = color,
    name = "Class"
  )
# Plot the feature loadings in different circles
for(i in 1:2){
  if(i == 1){
    p <- p +
      ggnewscale::new_scale_fill()
  }
  df3 <- dplyr::select(
    df2, (all_of(i))
  )
  p <- gheatmap(
    p,
    df3,
    offset = i*.065,
    width = .1,
    colnames_angle = 90,
    font.size = 4,
    high = "darkslateblue",
    low = "gray98",
    color = "black",
    legend_title = expression(beta[k]))
}
p

LDA

library(mia)
library(topicmodels)
library(ggtree)

data(GlobalPatterns)
tse <- GlobalPatterns

tse <- transformAssay(tse, method = "relabundance")
# Agglomerate to keep only high prevalence features
tse <- agglomerateByPrevalence(tse, rank = "Class", update.tree = TRUE, prevalence = 0.99)
# Perform LDA reduction method
lda_model <- LDA(t(assay(tse, "counts")), k = 2)

df <- as.data.frame(t(assay(tse, "counts")))

posteriors <- topicmodels::posterior(lda_model, df)
# Get the feature loadings
feature_loadings <- t(as.data.frame(posteriors$terms))


phylo <- rowTree(tse)
circ <- ggtree(phylo, layout = "circular")
df <- rowData(tse)
color <- randomcoloR::distinctColorPalette(
  length(
    unique(
      df$Phylum
    )
  )
)
df <- data.frame(Class = df$Phylum)
rownames(df) <- phylo$tip.label

df2 <- data.frame(feature_loadings)
rownames(df2) <- phylo$tip.label
# Plot the tree and first inner circles (Classes)
p <- gheatmap(
  p = circ,
  data = df,
  offset = -.1,
  width = .1,
  colnames_angle = 95,
  colnames_offset_y = .5,
  font.size = 4,
  color = "black") +
  ggplot2::scale_fill_manual(
    values = color,
    name = "Class"
  )
# Plot the feature loadings in different circles
for(i in 1:2){
  if(i == 1){
    p <- p +
      ggnewscale::new_scale_fill()
  }
  df3 <- dplyr::select(
    df2, (all_of(i))
  )
  p <- gheatmap(
    p,
    df3,
    offset = i*.065,
    width = .1,
    colnames_angle = 90,
    font.size = 4,
    high = "darkslateblue",
    low = "gray98",
    color = "black",
    legend_title = expression(beta[k]))
}
p

This is the exact same process for every reduction method, the only thing that changes is how the loadings matrix is retrieved and stored which is why wrappers discussed in microbiome/mia#596 could be helpful. As also discussed it might be possible to recognize the type of reduction method so that the user do not have to specify method when calling function.
Note that we agglomerate with a high prevalence in order to have only few features so that it can be understandable (and that the legend does not get out of the screen).

Without tree, I have made an example for PCA, there are still things that can be changed (e.g. adding coefficients inside circles?, should we consider only absolute values?)

library(mia)
library(ggtree)
library(scater)
library(ggcorrplot)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- logNormCounts(tse)
# Agglomerate to keep only high prevalence features
tse <- agglomerateByPrevalence(tse, rank="Phylum", prevalence=0.99, update.tree = TRUE)
# Achieve PCA reduction
tse <- runPCA(tse, name = "PCA", ncomponents = 2)
# Get the feature loadings
loadings_matrix <- attr(reducedDim(tse, "PCA"), "rotation")

# Plot feature loadings
ggcorrplot(abs(loadings_matrix),
           type = "lower",
           method="circle",
           title="Feature loadings",
           ggtheme=theme_bw) +
  scale_fill_gradient2(breaks=c(0, 1), limit=c(0, 1))

antagomir · 2024-07-04T15:11:55Z

Thanks @ElySeraidarian

Could we formulate a miaViz-style function for one of these, so that the plotting can be done on a single line of code, given that the user provides a TreeSE object that has a) feature loading matrix (in some position that we can change later, i.e. in rowData or in metadata or in attr) and b) (optionally) a rowTree.

In the first round at least we can assume that the user has themselves done the necessary agglomerations etc. There are so many considerations on those things that we cannot automated all possible scenarios on the data sizes anyway.

TuomasBorman · 2024-07-04T19:31:03Z

Looks very nice and interesting. However, I am wondering if that is the most usual way to plot loadings ( I don't know). At least for eigenvalues the most usual way might be scree plot (bar plot). We could have function to plot both loadings and eigenvalues perhaps since both are needed to describe the ordination results fully.

library(mia)
library(ggplot2)
library(patchwork)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- transformAssay(tse, method = "clr", pseudocount = 1)
# Achieve PCA reduction
tse <- runPCA(tse, name = "PCA", assay.type = "clr")
# Get the feature loadings
df <- attr(reducedDim(tse, "PCA"), "rotation") |> as.data.frame()
# Add taxonomy labels to column
df[["feature"]] <- getTaxonomyLabels(tse)[ match(rownames(df), rownames(tse)) ]
# Get top 50 loadings
df <- df[ order(abs(df[["PC1"]]), decreasing = TRUE)[1:50], ]
# Order loadings and add factor to keep the order
df <- df[ order(df[["PC1"]]), ]
df[["feature"]] <- factor(df[["feature"]], levels = df[["feature"]])

# Loadings plot
p1 <- ggplot(df, aes(x = .data[["PC1"]], .data[["feature"]])) +
    geom_bar(stat="identity")

eig <- attr(reducedDim(tse, "PCA"), "varExplained")
eig <- eig / sum(eig)
df <- data.frame(eig_vals = eig)
# Get percentage of explained
df[["PC"]] <- paste0("PC", seq_len(nrow(df)))
df[["PC"]] <- factor(df[["PC"]], levels = df[["PC"]])
# Scree plot
p2 <- ggplot(df, aes(x = .data[["PC"]], .data[["eig_vals"]])) +
    geom_bar(stat="identity")

p1 + p2

antagomir · 2024-07-04T22:17:17Z

Only some ordination methods provide eigenvalues (e.g. PCA, MDS) and I think only in PCA (out of the commonly used ordinations) the eigenvalues have an actual clear + commonly used interpretation. Thus a general function for this might not be feasible.

It would be interesting to experiment with different ways to visualize loading matrices. The barplot/screeplot is indeed very common. If we have generic function to visualize loadings, then this could give different options, like bar/screeplot, heatmap, etc.

However, heatmap (options 2) seems to me like a good alternative. In there, one should show also the sign of the loading (e.g. with different colors for negative and positive values); also do not use two different indicators (like size and color) to denote the same thing. If you could use native ggplot2 instead of ggcorrplot that could be useful (reducing package dependencies is in general good). I would swap the figure for better readability of species names.

The tree + heatmap is very nice since usually the loadings do not show the tree but this can be useful. The idea has been borrowed from Susan Holmes' phyloseq work.

ElySeraidarian · 2024-07-05T11:29:19Z

Thanks for your returns, indeed it will be possible to create a call for a one line function that will do the plotting.
We can add different options to visualize loadings so that the user will find the best for himself.

I have not been able to plot the exact same plot using ggplot2 but here is an example I have build based on your returns

library(mia)
library(ggtree)
library(scater)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- logNormCounts(tse)
# Agglomerate to keep only high prevalence features
tse <- agglomerateByPrevalence(tse, rank="Phylum", prevalence=0.99, update.tree = TRUE)
# Achieve PCA reduction
tse <- runPCA(tse, name = "PCA", ncomponents = 2)
# Get the feature loadings
loadings_matrix <- attr(reducedDim(tse, "PCA"), "rotation")
df <- attr(reducedDim(tse, "PCA"), "rotation") |> as.data.frame()
df[["Feature"]] <- getTaxonomyLabels(tse)[ match(rownames(df), rownames(tse)) ]
colours = c("red","white", "blue")


dfgroup <- data.frame(PC = c(df$PC1,df$PC2), Feature = c(df$Feature, df$Feature), group = c(rep("PC1",length(df$PC1)), rep("PC2", length(df$PC2))))

# Plot feature loadings
ggplot(dfgroup, aes(x = .data[["PC"]], y = .data[["Feature"]], group = as.factor(group), col = as.factor(group)))  +
  geom_point(size=6) +
  labs(title="Feature Loadings Plot") +
  scale_colour_manual(name = "Principal Components",
   values = c("red","blue")) +
  xlim(-1, 1)

It is a bit different as the loadings value differs from y-axis and there is no colour change for positive/negative values.

So here is also the improved version using ggcorrplot in case we want to use it

library(ggcorrplot)
library(mia)
library(ggtree)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- logNormCounts(tse)
# Agglomerate to keep only high prevalence features
tse <- agglomerateByPrevalence(tse, rank="Phylum", prevalence=0.99, update.tree = TRUE)
# Achieve PCA reduction
tse <- runPCA(tse, name = "PCA", ncomponents = 2)
# Get the feature loadings
loadings_matrix <- attr(reducedDim(tse, "PCA"), "rotation")
# Plot feature loadings
ggcorrplot(loadings_matrix,
           type = "lower",
           method="circle",
           title="Feature loadings",
           colors = colours,
           ggtheme=theme_bw) + coord_flip()

antagomir · 2024-07-05T13:18:01Z

I would not combine the two axes like in the first plot. PC1 and PC2 we wouldn't compare directly in most cases. The original ggcorr plot looked better in fact.

A related idea, how about plotting just one axis at a time (as heatmap), sorting the values in increasing order, and then using colors to illustrate the increasing values (instead of bars)?

Probably the standard barplot / screeplot would be useful to include as option as well.

Note: often the user is interested on just a few top features (e.g. 5-10). This might help to find additional complementary options, too.

To get fwd, can we create one function that has the more standard barplot, and one heatmap version as options?

ElySeraidarian · 2024-07-08T12:25:33Z

I will open a PR tomorrow with the function that would include every plotting method we talked about and should be operational at least for PCA (waiting for other methods to have same way to retrieve loadings).

Here is the example I made based on your returns.

library(mia)
library(ggtree)
library(scater)
library(patchwork)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- logNormCounts(tse)
# Agglomerate to keep only high prevalence features
tse <- agglomerateByPrevalence(tse, rank="Phylum", prevalence=0.99, update.tree = TRUE)
# Achieve PCA reduction
tse <- runPCA(tse, name = "PCA", ncomponents = 2)
# Get the feature loadings
loadings_matrix <- attr(reducedDim(tse, "PCA"), "rotation")
df <- attr(reducedDim(tse, "PCA"), "rotation") |> as.data.frame()
df[["Feature"]] <- getTaxonomyLabels(tse)[ match(rownames(df), rownames(tse)) ]
df <- df[ order(abs(df[["PC1"]]), decreasing = TRUE)[1:10], ]
df <- df[ order(df[["PC1"]]), ]
df[["Feature"]] <- factor(df[["Feature"]], levels = df[["Feature"]])

df1 <- data.frame(PC1 = round(df$PC1,2), Feature = df$Feature)
df2 <- data.frame(PC2 = round(df$PC2,2), Feature = df$Feature)


# Plot feature loadings
p1 <- ggplot(df1, aes(x = "PC1", y = Feature, label = PC1))  +
  geom_point(aes(col = PC1), size=12) +
  scale_color_continuous(type = "gradient", limits = c(-1,1)) +
  geom_text(color="white", size=4) + 
  labs(title="Feature Loadings Plot") 
  
p2 <- ggplot(df2, aes(x = "PC2", y = Feature, label = PC2))  +
    geom_point(aes(col = PC2), size=12) +
    scale_color_continuous(type = "gradient", limits=c(-1,1)) +
    geom_text(color="white", size=4) +
  labs(title="Feature Loadings Plot")

p1 + p2

antagomir · 2024-07-08T20:52:39Z

This could

support loading matrices directly (numeric matrix of features x components); if it also supports TreeSE that is a bonus
color scale should be two-color scale (negative values blue, positive red); to me it could look better if the colors were shown by squares rather than circles but this is a minor detail and if the plotting function works well otherwise, it should be relatively straightfwd to test alternative visualizations.

ElySeraidarian · 2024-07-09T08:04:11Z

This one should be finally good

library(mia)
library(ggtree)
library(scater)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
tse <- logNormCounts(tse)
# Agglomerate to keep only high prevalence features
tse <- agglomerateByPrevalence(tse, rank="Phylum", prevalence=0.99, update.tree = TRUE)
# Achieve PCA reduction
tse <- runPCA(tse, name = "PCA", ncomponents = 2)
# Get the feature loadings
loadings_matrix <- attr(reducedDim(tse, "PCA"), "rotation")
df <- attr(reducedDim(tse, "PCA"), "rotation") |> as.data.frame()
df[["Feature"]] <- getTaxonomyLabels(tse)[ match(rownames(df), rownames(tse)) ]
df <- df[ order(abs(df[["PC1"]]), decreasing = TRUE)[1:10], ]
df <- df[ order(df[["PC1"]]), ]
df[["Feature"]] <- factor(df[["Feature"]], levels = df[["Feature"]])

df1 <- data.frame(PC1 = round(df$PC1,2), Feature = df$Feature)
df2 <- data.frame(PC2 = round(df$PC2,2), Feature = df$Feature)


# Plot feature loadings
p1 <- ggplot(df1, aes(x = "PC1", y = Feature, label = PC1))  +
  geom_point(aes(fill = PC1), size=15, shape = 22) +
  scale_fill_gradient2(limits = c(-1,1)) +
  geom_text(color="black", size=4) + 
  labs(title="Feature Loadings Plot") 
  
p2 <- ggplot(df2, aes(x = "PC2", y = Feature, label = PC2))  +
    geom_point(aes(fill = PC2), size=15, shape = 22) +
    scale_fill_gradient2(limits=c(-1,1)) +
    geom_text(color="black", size=4) +
  labs(title="Feature Loadings Plot")

p1 + p2

antagomir · 2024-07-09T09:00:53Z

Great!

Can you prepare an actual miaViz function that takes care of this?

TuomasBorman · 2024-07-11T13:38:44Z

microbiome/mia#604

antagomir · 2024-08-05T20:49:53Z

@ElySeraidarian can you close this issue as it becomes ready?

TuomasBorman added this to miaverse finalization Oct 5, 2023

TuomasBorman mentioned this issue Jul 3, 2024

Compute and plot feature loadings in ordination methods microbiome/mia#596

Closed

ElySeraidarian self-assigned this Jul 4, 2024

ElySeraidarian mentioned this issue Jul 9, 2024

Plot feature loadings function #135

Merged

ElySeraidarian closed this as completed Aug 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

plotRDA, plot loadings #92

plotRDA, plot loadings #92

TuomasBorman commented Sep 7, 2023

TuomasBorman commented Sep 7, 2023

ElySeraidarian commented Jul 4, 2024 •

edited

Loading

antagomir commented Jul 4, 2024

TuomasBorman commented Jul 4, 2024

antagomir commented Jul 4, 2024 •

edited

Loading

ElySeraidarian commented Jul 5, 2024

antagomir commented Jul 5, 2024

ElySeraidarian commented Jul 8, 2024

antagomir commented Jul 8, 2024

ElySeraidarian commented Jul 9, 2024

antagomir commented Jul 9, 2024

TuomasBorman commented Jul 11, 2024

antagomir commented Aug 5, 2024

plotRDA, plot loadings #92

plotRDA, plot loadings #92

Comments

TuomasBorman commented Sep 7, 2023

TuomasBorman commented Sep 7, 2023

ElySeraidarian commented Jul 4, 2024 • edited Loading

antagomir commented Jul 4, 2024

TuomasBorman commented Jul 4, 2024

antagomir commented Jul 4, 2024 • edited Loading

ElySeraidarian commented Jul 5, 2024

antagomir commented Jul 5, 2024

ElySeraidarian commented Jul 8, 2024

antagomir commented Jul 8, 2024

ElySeraidarian commented Jul 9, 2024

antagomir commented Jul 9, 2024

TuomasBorman commented Jul 11, 2024

antagomir commented Aug 5, 2024

ElySeraidarian commented Jul 4, 2024 •

edited

Loading

antagomir commented Jul 4, 2024 •

edited

Loading