Simultaneous Batch Correction and DESeq Comparisons between Cell Subsets across Batches #376

Willt1128 · 2025-03-17T22:26:25Z

Hello, I am trying to use PyDESeq2 on scRNA-seq data from 4 separate sequencing batches. I want to compare individual cell types across batches; each cell type has 3 replicates per batch, obtained by pseudobulking data from 3 separate organoids per batch. However, I also want to correct for batch, and I have found that I cannot do both batch correction and statistical comparisons between cell types across batches.

From other forum posts about DESeq2 and PyDESeq2, I understand that batch effects must be accounted for by including batch in the design formula (e.g., design = "~batch + cell_type"). I also understand that batch correction cannot be applied to the data before running PyDESeq2, because PyDESeq2 requires raw counts as input.

One idea was to add a column to adata.obs called comparison_group that combines batch and cell type (e.g., "D250WT_astrocytes" for astrocytes in batch D250WT). I could then run PyDESeq2 with either design = "~comparison_group" or design = "~batch + comparison_group". Unsurprisingly, design = "~batch + comparison_group" raises a singular matrix error, so the DESeq run cannot complete. Design = "~comparison_group" does run, but I am concerned that it will not account for batch as well as simply including batch as a covariate in the design, given that each batch contains many different cell types.
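A minimal sketch of why the second design is singular, on a toy layout with 2 batches, 2 cell types and 3 replicates each (all values and the `treatment_design` helper are hypothetical; PyDESeq2 builds its own design matrix, and this helper only mimics treatment coding with pandas). Because every comparison_group level belongs to exactly one batch, the batch indicator equals the sum of that batch's comparison_group indicators, so the columns are linearly dependent:

```python
import numpy as np
import pandas as pd

# Hypothetical toy metadata: 2 batches x 2 cell types x 3 pseudobulk replicates.
batches = ["D75WT", "D250WT"]
cell_types = ["astroglia", "neuron"]
metadata = pd.DataFrame(
    [
        {"batch": b, "cell_type": c, "replicate": r}
        for b in batches
        for c in cell_types
        for r in range(1, 4)
    ]
)

# Combined factor, e.g. "D250WT_astroglia".
metadata["comparison_group"] = metadata["batch"] + "_" + metadata["cell_type"]


def treatment_design(df, factors):
    """Hypothetical helper: intercept plus drop-first dummies for each factor."""
    parts = [pd.Series(1.0, index=df.index, name="Intercept")]
    for f in factors:
        parts.append(pd.get_dummies(df[f], prefix=f, drop_first=True, dtype=float))
    return pd.concat(parts, axis=1)


X_batch_group = treatment_design(metadata, ["batch", "comparison_group"])
X_group = treatment_design(metadata, ["comparison_group"])

# A design is singular when its rank is below its number of columns.
for name, X in [("~batch + comparison_group", X_batch_group), ("~comparison_group", X_group)]:
    rank = np.linalg.matrix_rank(X.to_numpy())
    print(f"{name}: {X.shape[1]} columns, rank {rank}")
```

The first design has one more column than its rank, which is the singularity PyDESeq2 reports; the second is full rank.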

I also considered subsetting adata before running DESeq so that it only includes pseudobulked samples from one cell type, then running DESeq separately for each cell type. However, I don't think this is a good solution, because how well it corrects for batch would depend on the batch effect being homogeneous across cell types.
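The per-cell-type alternative amounts to splitting the samples before fitting. A minimal sketch with pandas, assuming counts as a samples x genes DataFrame and metadata indexed by the same sample names (the data and the `split_by_cell_type` helper are hypothetical); each subset would then be fit on its own with design "~batch":

```python
import numpy as np
import pandas as pd

# Hypothetical toy inputs: counts are samples x genes, metadata shares the index.
rng = np.random.default_rng(0)
samples = [f"s{i}" for i in range(12)]
metadata = pd.DataFrame(
    {
        "batch": np.repeat(["D75WT", "D250WT"], 6),
        "cell_type": np.tile(np.repeat(["astroglia", "neuron"], 3), 2),
    },
    index=samples,
)
counts = pd.DataFrame(
    rng.poisson(50, size=(12, 5)),
    index=samples,
    columns=[f"gene{j}" for j in range(5)],
)


def split_by_cell_type(counts, metadata, key="cell_type"):
    """Hypothetical helper: {cell_type: (counts_subset, metadata_subset)}, rows aligned."""
    subsets = {}
    for cell_type, meta_sub in metadata.groupby(key):
        subsets[cell_type] = (counts.loc[meta_sub.index], meta_sub)
    return subsets


for cell_type, (c_sub, m_sub) in split_by_cell_type(counts, metadata).items():
    # Each subset would get its own model with design "~batch".
    print(cell_type, c_sub.shape, m_sub["batch"].nunique(), "batches")
```

Each separate fit estimates its own batch coefficient and dispersions from only that cell type's samples, which is why this approach does not give a single, shared batch adjustment.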

Does anyone know how I can do an effective batch correction while also doing a DESeq (including statistical comparisons) between specific cell types (i.e., subsets of each batch) across different batches?

Please let me know if anything needs clarification. In case it is helpful, I have included an example screenshot of my metadata for 2 batches. Thank you very much.

BorisMuzellec · 2025-03-27T10:27:58Z

Hi @Willt1128, sorry for the late reply.

I'm not sure I understand the issue. Why not use design = ~batch + adjusted_cell_type? This would measure differential expression between cell types while also accounting for batch, and from the screenshot you provided, the design matrix does not seem to be singular.
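A quick way to check the non-singularity claim on a toy layout (hypothetical data; `cell_type` stands in for adjusted_cell_type, and the pandas dummy coding only approximates the treatment coding PyDESeq2 applies itself):

```python
import numpy as np
import pandas as pd

# Hypothetical toy metadata: 2 batches x 2 cell types x 3 replicates.
metadata = pd.DataFrame(
    {
        "batch": ["D75WT"] * 6 + ["D250WT"] * 6,
        "cell_type": (["astroglia"] * 3 + ["neuron"] * 3) * 2,
    }
)

# ~batch + cell_type with treatment coding: intercept + one dummy per non-reference level.
X = pd.concat(
    [
        pd.Series(1.0, index=metadata.index, name="Intercept"),
        pd.get_dummies(metadata["batch"], prefix="batch", drop_first=True, dtype=float),
        pd.get_dummies(metadata["cell_type"], prefix="cell_type", drop_first=True, dtype=float),
    ],
    axis=1,
)

# Full rank (rank == number of columns) means the design is not singular.
print(list(X.columns))
print("rank:", np.linalg.matrix_rank(X.to_numpy()), "of", X.shape[1])
```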

Willt1128 · 2025-04-21T19:13:19Z

Hi @BorisMuzellec, thank you very much for the response.

As far as I am aware, design = ~batch + adjusted_cell_type would not allow DESeq comparisons between all comparison groups. With this design, would I be able to obtain fold changes, log2 fold changes, p-values, etc. comparing D75WT astroglia to D250WT astroglia, for example? My apologies if I am missing something.

Please note that, ideally, I would want to run a single DESeq with batch as a covariate, as opposed to multiple iterations on cell subsets, so that batch effects are accounted for homogeneously across all comparisons. Please let me know if anything else needs clarification.

BorisMuzellec · 2025-04-28T10:07:13Z

Hi @Willt1128,

Indeed, design = ~batch + adjusted_cell_type would not allow you to compare D75WT astroglia to D250WT astroglia. To do this, you would need to use design = ~comparison_group, or an equivalent design based on interaction terms (cf. the DESeq2 vignette on that topic: https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions).
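To see why the additive design cannot give a batch comparison within a single cell type, and why an interaction design is equivalent to ~comparison_group, the sketch below compares design rows directly. It uses toy data, hypothetical variable names, and hand-built treatment coding with D250WT and astroglia as reference levels; one row per batch x cell-type group is enough, since replicates only repeat rows:

```python
import numpy as np
import pandas as pd

# One row per (batch, cell_type) group (hypothetical toy layout).
cells = pd.DataFrame(
    {
        "batch": ["D250WT", "D250WT", "D75WT", "D75WT"],
        "cell_type": ["astroglia", "neuron", "astroglia", "neuron"],
    }
)
is_d75 = (cells["batch"] == "D75WT").astype(float)
is_neuron = (cells["cell_type"] == "neuron").astype(float)
ones = np.ones(len(cells))

# ~batch + cell_type
X_add = np.column_stack([ones, is_d75, is_neuron])
# ~batch + cell_type + batch:cell_type
X_int = np.column_stack([ones, is_d75, is_neuron, is_d75 * is_neuron])
# ~comparison_group, written as one indicator per group
X_grp = np.eye(len(cells))


def row(X, batch, cell_type):
    """Design row of the given group."""
    mask = (cells["batch"] == batch) & (cells["cell_type"] == cell_type)
    return X[mask.to_numpy()][0]


# Additive model: the D75WT-vs-D250WT difference is the same vector for both
# cell types, i.e. one shared batch coefficient.
astro_diff = row(X_add, "D75WT", "astroglia") - row(X_add, "D250WT", "astroglia")
neuron_diff = row(X_add, "D75WT", "neuron") - row(X_add, "D250WT", "neuron")
print(astro_diff, neuron_diff)

# Interaction and comparison_group designs span the same column space.
rank_int = np.linalg.matrix_rank(X_int)
rank_joint = np.linalg.matrix_rank(np.hstack([X_int, X_grp]))
print(rank_int, rank_joint)
```

Under the additive design, "D75WT astroglia vs D250WT astroglia" and "D75WT neuron vs D250WT neuron" are the same contrast; the interaction design has one free parameter per batch x cell-type group, just like ~comparison_group.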

I'm not 100% sure, but I think design = ~comparison_group would account for batch effects just fine. It can be a bit tricky if you're not used to numerical contrast vectors, but you could then even test the global effect of batch across all cell types, or the effect of cell types across all batches, by setting the contrast coefficients manually as a NumPy array in the contrast argument of DeseqStats.
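One way to build such numerical contrasts is to write down weights on the group means and multiply them by each group's design row. A minimal sketch, assuming a treatment-coded "~comparison_group" design with the alphabetically first level as reference; the coefficient names and the `contrast_from_group_weights` helper are hypothetical, and the array's entries must follow the column order of the design matrix PyDESeq2 actually fits, so check that order on your fitted object:

```python
import numpy as np
import pandas as pd

# Hypothetical group levels; the first is the reference absorbed into the intercept.
groups = ["D250WT_astroglia", "D250WT_neuron", "D75WT_astroglia", "D75WT_neuron"]
coef_names = ["Intercept"] + [f"comparison_group_{g}" for g in groups[1:]]

# Design row of each group: intercept + indicator for its own non-reference level.
group_rows = pd.DataFrame(0.0, index=groups, columns=coef_names)
group_rows["Intercept"] = 1.0
for g in groups[1:]:
    group_rows.loc[g, f"comparison_group_{g}"] = 1.0


def contrast_from_group_weights(weights):
    """Map weights on (log-scale) group means to a coefficient contrast vector."""
    w = pd.Series(weights, dtype=float).reindex(groups, fill_value=0.0)
    return w.to_numpy() @ group_rows.to_numpy()


# D75WT astroglia vs D250WT astroglia
c_pair = contrast_from_group_weights({"D75WT_astroglia": 1, "D250WT_astroglia": -1})
# astroglia vs neuron, averaged over both batches
c_cell = contrast_from_group_weights(
    {"D250WT_astroglia": 0.5, "D75WT_astroglia": 0.5, "D250WT_neuron": -0.5, "D75WT_neuron": -0.5}
)
# D75WT vs D250WT, averaged over both cell types
c_batch = contrast_from_group_weights(
    {"D75WT_astroglia": 0.5, "D75WT_neuron": 0.5, "D250WT_astroglia": -0.5, "D250WT_neuron": -0.5}
)
print(c_pair, c_cell, c_batch)
```

Each vector could then be passed as the contrast, e.g. DeseqStats(dds, contrast=c_cell), as suggested above; check that your PyDESeq2 version accepts array contrasts. With more than two batches, each pairwise batch comparison needs its own vector, since a single Wald contrast tests one weighted combination.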

Let me know if that helps!
