Reintegration of subset causing very large file size #42
Hm. My best guess is that it's something to do with the low-rank reconstruction of the corrected values.

set.seed(123456)
library(Matrix)
library(BiocSingular)
a <- rsparsematrix(50000, 20000, density=0.01)
out <- runPCA(a, rank=10, BSPARAM=IrlbaParam())
# Creating a low-rank reconstruction with the first 10 PCs.
lr <- LowRankMatrix(out$rotation, out$x)
## <20000 x 50000> matrix of class LowRankMatrix and type "double":
## [,1] [,2] [,3] ... [,49999]
## [1,] 0.0057135993 -0.0045652364 0.0039531449 . -1.254121e-03
## [2,] 0.0074395154 0.0046073647 0.0066427142 . -9.973139e-04
## [3,] 0.0043550788 -0.0018743069 0.0006405070 . -3.131222e-03
## [4,] -0.0003137060 0.0017982014 0.0009785337 . -5.224690e-03
## [5,] 0.0009093743 -0.0022579247 -0.0019486521 . -2.947885e-03
## ... . . . . .
## [19996,] -0.0046791718 -0.0073614454 0.0007148935 . -0.0024410180
## [19997,] -0.0048366051 0.0008190038 -0.0072412677 . -0.0004270998
## [19998,] 0.0034414458 -0.0026491549 0.0039459904 . -0.0038380154
## [19999,] 0.0051936588 -0.0004937389 0.0048754591 . -0.0020620816
## [20000,] 0.0007477738 0.0018794305 -0.0011703138 . -0.0019812603
## [,50000]
## [1,] -2.319264e-03
## [2,] 3.829845e-04
## [3,] -5.374638e-04
## [4,] 2.443269e-05
## [5,] -1.632262e-03
## ... .
## [19996,] -0.0019683190
## [19997,] -0.0013774037
## [19998,] 0.0005604048
## [19999,] 0.0019135394
## [20000,] 0.0009124538
DelayedArray::is_sparse(lr)
## [1] FALSE
object.size(lr)
## 5603648 bytes
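The LowRankMatrix above is a delayed representation: it stores only the two small factor matrices, so object.size() stays around 5.6 MB. The blow-up happens if anything forces it into an ordinary dense matrix. A quick back-of-the-envelope sketch (using the dimensions from the example above):

```r
# Realizing a 20000 x 50000 double-precision matrix needs
# nrow * ncol * 8 bytes, regardless of how compact the
# delayed representation was.
nrow <- 20000
ncol <- 50000
dense_bytes <- nrow * ncol * 8
dense_bytes / 1024^3
## ~7.45 GiB for a single realized assay
```

Multiply that by a copy or two during conversion and serialization, and a multi-gigabyte on-disk object is exactly what you'd expect.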
However, if you're working with an analysis environment that doesn't understand this delayed representation, the reconstructed values will be realized into an ordinary dense matrix, and the object balloons accordingly. So, the simplest approach is to just drop the reconstructed assay before it reenters the Seurat world. For various reasons (check out the relevant chapter of the OSCA book), I don't think that batch-corrected expression values are particularly reliable for per-gene inferences; I only use them to obtain a clustering in a common coordinate space, and then I go on to block on the batch in downstream differential analyses. If it's not that, then I don't know. The only "real" output of fastMNN() is the set of corrected low-dimensional coordinates, anyway.
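Dropping the reconstructed assay might look like the following sketch. It assumes a correctExperiments()/fastMNN() result named `sce` with an assay called "reconstructed" and corrected PCs under the reducedDim name "corrected", which are batchelor's default names:

```r
# Sketch: strip the delayed reconstructed assay before handing the
# object to anything that would realize it into a dense matrix.
library(SummarizedExperiment)
assays(sce) <- assays(sce)[assayNames(sce) != "reconstructed"]

# The corrected coordinates survive in the reducedDims and are all
# that downstream clustering/UMAP needs.
reducedDim(sce, "corrected")
```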
Thank you. Sounds likely to be an issue with the SeuratWrappers implementation, so I'll work on figuring out how to run batchelor directly on an SCE object.
When I subset my Seurat object to the cell type of interest (based on the original all-cell FastMNN clusters), I am able to run correctExperiments on a list of SingleCellExperiment objects for two of the three cell types of interest.

But with my third cell type, this error is thrown:

Based on a manual inspection, I can see that all of the subsetted SingleCellExperiment objects in the "ObjectList_Cell_Type1" list have the same number of genes. Changing the workflow so that SplitObject is never run, and instead correctExperiments is run on a merged object in SCE format, throws the same error (e.g., correctExperiments(SCE_object_Cell_Type1, subset.row = VariableFeatures, batch = SCE_object_Cell_Type1$orig.ident)). Including checkDimnames = FALSE in the conversion to SCE objects did not help either. I've tried reviewing the batchelor and SummarizedExperiment help and vignettes but can't quite tell what is going on. Any ideas or advice? Thanks again for your great package. For the cell type I have gotten through, the dimensionality reduction does a great job handling biological variation and heterogeneity in sample composition.
My best guess is that you are asking for features that are not present in the matrix for the third cell type:

library(Matrix)
y <- rsparsematrix(10, 10, 0.1)
rownames(y) <- LETTERS[1:10]
y[letters[1:10],]
## Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) :
##   invalid character indexing

Seems like something to do with Seurat's preprocessing of the matrix.
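One way to check for (and guard against) this is to intersect the requested features with the row names actually shared across the objects before calling correctExperiments(). A sketch, reusing the object and variable names from the post above (which are illustrative, not verified against the poster's session):

```r
# Keep only variable features present in every subsetted SCE.
common <- Reduce(intersect, lapply(ObjectList_Cell_Type1, rownames))
features_present <- intersect(VariableFeatures, common)

# Any names listed here are the ones that would trigger the
# "invalid character indexing" error.
setdiff(VariableFeatures, common)
```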
Thank you. Because I had normalized with SCTransform, there were some limits on how I could generate my variable feature list. So I had simply selected ~8 samples with the largest number of cells in each cell-type subset, identified variable genes there, and taken the overlap. This worked great for the first two cell types, but the above error occurred with my third. This time I redid the normalization and variable-feature selection with all cells/samples of that cell type in a different way, used the result as the variable feature list for correctExperiments, and it worked great. Appreciate all the help.
Hello,
Thank you for this great tool.
I'm working to analyze multiple samples from different conditions and compare them to published data; in total there are 240k cells over ~40 samples. Four broad cell types (epithelial, stromal, vascular, and immune) are present in many but not all samples (i.e., samples are not balanced by cell type). I have normalized in Seurat with SCTv2 and then integrated with FastMNN via the SeuratWrappers function, and FastMNN does a great job. Other methods have led to significant misclassification of cell types.
My issue and request for help is related to subsetting. I generated cell-type-specific variable features using subsets from a few samples with 3-5k cells of that type, and then use that list when rerunning FastMNN on a subsetted object as follows. Note that the DietSeurat function strips the Seurat object of all other assays/reductions, such as the original MNN. The issue is that my output object is now enormous: 140 GB for 55k cells. I tried another cell type with only 5k cells and the object was 30 GB. The starting integrated object is about 14 GB, with all 240k cells. My guess is that this may be related to some samples having fewer than 50 cells after subsetting, but I am not sure. And note that a few samples only had 100-300 cells at initial object creation, so perhaps low cells/sample isn't the issue after all.
mnn_integrated_object <- readRDS(file = "xxx")
mnn_integrated_object <- DietSeurat(mnn_integrated_object, counts = TRUE, data = TRUE, assays = c("RNA", "SCT"))
gc()
subset_mnn <- subset(mnn_integrated_object, cells = Cell_Type_1_List)
DefaultAssay(subset_mnn) <- "SCT"
subset_mnn <- RunFastMNN(object.list = SplitObject(subset_mnn, split.by = "orig.ident"), features = VariableFeatures, assay = "SCT", verbose = TRUE)
subset_mnn <- RunUMAP(subset_mnn, reduction = "mnn", dims = 1:50, min.dist = 0.3)
subset_mnn <- FindNeighbors(subset_mnn, reduction = "mnn", dims = 1:50)
subset_mnn <- FindClusters(subset_mnn, resolution = 1.0)
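An alternative to the SeuratWrappers route above is to convert the subset to an SCE and run batchelor directly, which keeps the output in a form where the reconstructed assay can be inspected or dropped before anything realizes it. A sketch, assuming Seurat's as.SingleCellExperiment() conversion and batchelor's FastMnnParam; all object names are carried over from the code above and are illustrative:

```r
# Sketch: run the subset integration directly in batchelor on an SCE.
library(batchelor)
sce <- Seurat::as.SingleCellExperiment(subset_mnn, assay = "SCT")
corrected <- correctExperiments(sce,
    batch = sce$orig.ident,
    subset.row = VariableFeatures,
    PARAM = FastMnnParam(d = 50))
# Corrected coordinates for clustering/UMAP live in
# reducedDim(corrected, "corrected").
```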
Any idea what is going on? Of course I could just skip rerunning FastMNN on the subset but I suspect there is interesting biology present and would prefer to redo the PCA/Integration steps.
Many thanks!