Simultaneous Batch Correction and DESeq Comparisons between Cell Subsets across Batches · Issue #376 · owkin/PyDESeq2 · GitHub

Simultaneous Batch Correction and DESeq Comparisons between Cell Subsets across Batches #376


Open
Willt1128 opened this issue Mar 17, 2025 · 3 comments

Comments

@Willt1128
Willt1128 commented Mar 17, 2025

Hello, I am attempting to use PyDESeq2 with scRNA-seq data from 4 separate sequencing batches. I want to compare individual cell types across batches (each cell type has 3 replicates per batch, derived by pseudobulking data from 3 separate organoids per batch). However, I also want to do batch correction, and I am finding that I cannot do both batch correction and statistical comparisons between different cell types across batches.

From other forum posts regarding DESeq2 and PyDESeq2, I understand that batch effects must be accounted for by including batch in one's design (e.g., design = "~batch + cell_type"). I also understand that batch correction cannot be done before running PyDESeq2 because it requires the raw counts data as an input.

One thought I had was to make a separate column in my adata.obs instance called comparison_group, which combines both batch and cell type (e.g., for astrocytes in batch D250WT, "D250WT_astrocytes"). Then I could either run PyDESeq2 with design = "~comparison_group" or design = "~batch + comparison_group". Unsurprisingly, using design = "~batch + comparison_group" produces a Singular Matrix error, preventing the DESeq from being completed. Using design = "~comparison_group" allows the DESeq to run, but I am concerned that this will not have a comparable effect to simply modeling 'batch' as a covariate by including it in the design, given that there are many different cell types within each batch.
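The singular-matrix behaviour described above can be reproduced without any counts data. The following is a minimal sketch, assuming two batches, two cell types, and treatment (dummy) coding with the first level as reference; the `neurons` label and the helper `dummies` are hypothetical and only serve the illustration. Because every `comparison_group` level sits inside exactly one batch, the batch column is a sum of group columns, so the combined design loses a rank.

```python
import itertools
import numpy as np

# Hypothetical layout: 2 batches x 2 cell types x 3 pseudobulk replicates.
batches = ["D75WT", "D250WT"]
cell_types = ["astroglia", "neurons"]  # "neurons" is an illustrative label
n_reps = 3

samples = [
    (b, c)
    for b, c in itertools.product(batches, cell_types)
    for _ in range(n_reps)
]
groups = [f"{b}_{c}" for b, c in samples]
group_levels = [f"{b}_{c}" for b, c in itertools.product(batches, cell_types)]


def dummies(values, levels):
    """Treatment-coded indicator columns; the first level is the reference."""
    return np.array([[float(v == lvl) for lvl in levels[1:]] for v in values])


intercept = np.ones((len(samples), 1))

# Design for "~comparison_group": intercept + 3 group indicators.
X_group = np.hstack([intercept, dummies(groups, group_levels)])

# Design for "~batch + comparison_group": one extra batch indicator.
X_both = np.hstack(
    [intercept, dummies([b for b, _ in samples], batches), dummies(groups, group_levels)]
)

print("~comparison_group:        ", X_group.shape[1], "columns, rank", np.linalg.matrix_rank(X_group))
print("~batch + comparison_group:", X_both.shape[1], "columns, rank", np.linalg.matrix_rank(X_both))
```

In this toy layout the first design is full rank, while the second has more columns than its rank, which is the condition that makes the fit fail.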

I also considered subsetting adata prior to DESeq to include only the pseudobulked samples belonging to one cell type, then running DESeq iteratively for each cell type, but I don't believe this is a good solution because the effectiveness of its batch correction depends on the batch effect being homogeneous across cell types.

Does anyone know how I can do an effective batch correction while also doing a DESeq (including statistical comparisons) between specific cell types (i.e., subsets of each batch) across different batches?

Please let me know if anything needs clarification. In case it is helpful, I have included an example screenshot of my metadata for 2 batches. Thank you very much.

[Image: screenshot of example metadata for 2 batches]

@BorisMuzellec
Collaborator

Hi @Willt1128, sorry for the late reply.

I'm not sure I understand the issue. Why couldn't we use design = ~batch + adjusted_cell_type? This would allow measuring differential expression between cell types while also taking batches into account, and from the screenshot you provided it seems like the design matrix wouldn't be singular.

@Willt1128
Author

Hi @BorisMuzellec, thank you very much for the response.

Using design = ~batch + adjusted_cell_type would not allow for DESeq comparisons between all comparison groups, as far as I am aware. If I used this design, would I be able to produce fc, log2fc, p values, etc. that compare D75WT astroglia to D250WT astroglia, for example? My apologies if I am missing something.

Please note that, ideally, I would want to run a single DESeq with batch as a covariate, as opposed to multiple iterations on cell subsets, so that batch effects are accounted for homogeneously across all comparisons. Please let me know if anything else needs clarification.
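The limitation raised here can be seen from the design rows alone. As a rough sketch, assuming a treatment-coded additive design with columns `[Intercept, batch[D250WT], cell_type[neurons]]` (the column labels, the `neurons` level, and the helper `additive_row` are illustrative assumptions), the difference between the two batches is the same vector for every cell type, so the model has no coefficient that is specific to astroglia across batches.

```python
import numpy as np


def additive_row(batch, cell_type):
    """Design row for one sample under an additive batch + cell type model.

    Columns (assumed order): [Intercept, batch[D250WT], cell_type[neurons]].
    """
    return np.array([1.0, float(batch == "D250WT"), float(cell_type == "neurons")])


# D250WT vs D75WT, evaluated separately within each cell type.
astro_diff = additive_row("D250WT", "astroglia") - additive_row("D75WT", "astroglia")
neuron_diff = additive_row("D250WT", "neurons") - additive_row("D75WT", "neurons")

print("astroglia, D250WT - D75WT:", astro_diff)
print("neurons,   D250WT - D75WT:", neuron_diff)
```

Both differences pick out only the batch coefficient, which is why a cell-type-specific comparison across batches needs a per-group or interaction design.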

@BorisMuzellec
Collaborator

Hi @Willt1128,

Indeed, design = ~batch + adjusted_cell_type would not allow you to compare D75WT astroglia to D250WT astroglia. To do this, you would need to use design = ~comparison_group, or something equivalent based on interaction terms (cf. the DESeq2 vignette on that topic: https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions).

I'm not 100% sure, but I think that design = ~comparison_group would allow you to take batch effects into account just fine. Although it can be a bit tricky if you're not used to numerical contrast vectors, you could then even test the global effect of batch across all cell types, or the effect of cell types across all batches, by manually setting your contrast coefficients as a NumPy array in DeseqStats's contrast argument.

Let me know if that helps
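For readers unfamiliar with numerical contrast vectors, here is a minimal sketch of how the contrasts mentioned above could be built for a `~comparison_group` design. It assumes treatment coding with `D75WT_astroglia` as the reference level and columns ordered as in `levels`; the level names, the `neurons` cell type, and the helpers `group_row` and `mean_rows` are hypothetical. The actual column order should be checked against the design matrix of the fitted dataset before using any vector like this.

```python
import numpy as np

# Assumed group levels; the first one is the reference absorbed by the intercept.
levels = ["D75WT_astroglia", "D75WT_neurons", "D250WT_astroglia", "D250WT_neurons"]


def group_row(level):
    """Design row of one group: intercept plus its own indicator (if not reference)."""
    row = np.zeros(len(levels))
    row[0] = 1.0
    idx = levels.index(level)
    if idx > 0:
        row[idx] = 1.0
    return row


def mean_rows(selected):
    """Average design row over a set of groups."""
    return np.mean([group_row(lvl) for lvl in selected], axis=0)


# One specific comparison: D250WT astroglia vs D75WT astroglia.
pairwise = group_row("D250WT_astroglia") - group_row("D75WT_astroglia")

# Batch effect averaged over cell types: mean(D250WT groups) - mean(D75WT groups).
batch_avg = mean_rows([l for l in levels if l.startswith("D250WT_")]) - mean_rows(
    [l for l in levels if l.startswith("D75WT_")]
)

# Cell type effect averaged over batches: mean(neurons groups) - mean(astroglia groups).
celltype_avg = mean_rows([l for l in levels if l.endswith("_neurons")]) - mean_rows(
    [l for l in levels if l.endswith("_astroglia")]
)

print("pairwise:    ", pairwise)
print("batch_avg:   ", batch_avg)
print("celltype_avg:", celltype_avg)

# Per the reply above, such an array would be given as the contrast argument,
# e.g. DeseqStats(dds, contrast=pairwise); not verified here, and it depends
# on the installed PyDESeq2 version.
```

Writing each contrast as a difference of (averaged) design rows avoids working out the signs by hand and carries over unchanged to more batches or cell types.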
