subsample sce object based on factor in colData #56

baj12 · 2023-02-19T13:18:17Z

I would like to subsample a SingleCellExperiment object based on a factor in colData.

I have a SingleCellExperiment object:

> sce
# A SingleCellExperiment-tibble abstraction: 13,268,769 × 6
# Features=42 | Assays=exprs

with some colData:

> colData(sce)
DataFrame with 13268769 rows and 5 columns
         sample_id condition patient_id    label1 cluster_id
          <factor>  <factor>   <factor> <numeric>   <factor>
1            D929I       Ref      D929I        36        302
2            D929I       Ref      D929I        29        285
3            D929I       Ref      D929I        50        103
4            D929I       Ref      D929I        36        302
5            D929I       Ref      D929I        51        181
...            ...       ...        ...       ...        ...
13268765     D232I       Ref      D232I        51        201
13268766     D232I       Ref      D232I        28        304
13268767     D232I       Ref      D232I        50        5  
13268768     D232I       Ref      D232I        51        184
13268769     D232I       Ref      D232I        18        364

I would like to subsample based on the cluster_id column so that I keep at most X (here 500) events from each cluster.

I can get the selection of cells using the following code:

> sce %>% group_by(cluster_id) %>% slice_sample(n=500) %>% ungroup()
tidySingleCellExperiment says: A data frame is returned for independent data analysis.
# A tibble: 200,000 × 6
   .cell    sample_id condition patient_id label1 cluster_id
   <chr>    <fct>     <fct>     <fct>       <dbl> <fct>     
 1 4002318  D0749I    Ref       D0749I         60 1         
 2 10259368 D590I     Ref       D590I          60 1         
 3 12615676 D232I     Ref       D232I          25 1         
 4 6765422  D694I     Ref       D694I          25 1         
 5 9415336  D0553I    Ref       D0553I         60 1         
 6 7245671  D694I     Ref       D694I          25 1         
 7 7177144  D694I     Ref       D694I          42 1         
 8 7002069  D694I     Ref       D694I          49 1         
 9 8732040  D615I     Ref       D615I          60 1         
10 3989255  D0749I    Ref       D0749I         60 1         
# … with 199,990 more rows
# ℹ Use `print(n = ...)` to see more rows

But this returns a plain tibble, and I don't know how to use it to filter the original SingleCellExperiment object.

Could you please give me a pointer?

Thanks


stemangiola · 2023-03-06T03:11:44Z

Sorry, this slipped through the cracks.

At the moment you can use the following pattern:

nest() |>
mutate(map(...)) |>
unnest()

In the future we might add a group_by that preserves the SingleCellExperiment, but we have no plans for it yet. (Pull requests are always welcome, though!)
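As an alternative to the nest/map/unnest pattern, the selection can also be done on column indices and then applied to the object with ordinary column subsetting. The following is only a minimal sketch in base R, not something provided by tidySingleCellExperiment: the helper name `sample_per_group`, its `n_max` argument, and the toy `cluster_id` factor are all made up for illustration.

```r
# Hypothetical helper (not part of tidySingleCellExperiment): returns the
# positions of at most `n_max` randomly chosen elements per level of `groups`.
sample_per_group <- function(groups, n_max) {
  # Positions of the cells belonging to each group; empty levels are dropped.
  idx_by_group <- split(seq_along(groups), groups, drop = TRUE)
  keep <- lapply(idx_by_group, function(idx) {
    # sample.int() on the group size avoids sample()'s special case for a
    # single number, and min() keeps groups smaller than n_max whole.
    idx[sample.int(length(idx), min(length(idx), n_max))]
  })
  # Return the kept positions in their original order.
  sort(unlist(keep, use.names = FALSE))
}

# Toy stand-in for colData(sce)$cluster_id (assumed data, for illustration).
set.seed(1)
cluster_id <- factor(c(rep("a", 10), rep("b", 3), "c"))
keep <- sample_per_group(cluster_id, n_max = 5)
table(cluster_id[keep])
```

With the real object, the same idea would be `keep <- sample_per_group(colData(sce)$cluster_id, 500)` followed by `sce_sub <- sce[, keep]`, which relies on the usual column (cell) subsetting of a SingleCellExperiment and so returns a SingleCellExperiment rather than a tibble.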
