scGALA

scGALA: Graph Link Prediction Based Cell Alignment for Comprehensive Data Integration

For detailed instructions, comprehensive documentation, and helpful tutorials, please visit:

https://scgala.readthedocs.io/en/latest/intro.html

Overview

Installation

Step 1: Create a conda environment for scGALA

conda create -n scGALA python=3.10 -y

conda activate scGALA

Step 2: Install PyTorch as described in its official documentation. Choose the platform and accelerator (GPU/CPU) that match your system to avoid common dependency issues. Currently the DGL package requires PyTorch <= 2.4.0.

A note on DGL (required by PyGCL) and PyG

Currently the DGL team maintains two versions: dgl for CPU support and dgl-cu*** for CUDA support. Since pip treats them as different packages, it is hard for PyGCL to check the dgl version requirement. The PyGCL developers have therefore removed the dgl dependency check from their setup configuration and require users to install a suitable version themselves. The same applies to the Additional Libraries required by PyG; please install these optional dependencies after installing scGALA.

# PyTorch example; choose the CUDA version that matches your system
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# Install scGALA
pip install scGALA
# Example for DGL and PyG additional dependencies. Please read the note and install them based on your actual hardware.
# DGL
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu121/repo.html
# PyG
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu121.html
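
After installing, it can help to confirm that the PyTorch version in the environment respects the `<= 2.4.0` bound mentioned in Step 2. The following is a minimal sketch using only the Python standard library; the helper names (`parse_version`, `torch_is_compatible`, `report`) are hypothetical and not part of scGALA.

```python
import re
from importlib import metadata

# Upper bound taken from the installation note (DGL requires PyTorch <= 2.4.0).
MAX_TORCH = (2, 4, 0)


def parse_version(version: str) -> tuple:
    """Turn a string such as '2.4.0+cu121' into (2, 4, 0).

    The local build tag after '+' is ignored, and only the leading digits
    of each component are used.
    """
    core = version.split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def torch_is_compatible(version: str) -> bool:
    """Return True if the given PyTorch version is within the stated bound."""
    return parse_version(version) <= MAX_TORCH


def report() -> str:
    """Describe the installed PyTorch version, if any."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed in this environment"
    status = "OK" if torch_is_compatible(installed) else "newer than DGL supports"
    return f"torch {installed}: {status}"


print(report())
```

The check only compares version numbers; it does not verify that the CUDA build of DGL or the PyG libraries matches the CUDA build of PyTorch.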

Usage:

For the core function of scGALA, cell alignment, simply run:

from scGALA import get_alignments

# adata1 and adata2 are AnnData objects that you have already loaded.
# Get the edge probability matrix in one line
alignment_matrix = get_alignments(adata1=adata1, adata2=adata2)

# Get the anchor indices for the two datasets
anchor_index1, anchor_index2 = alignment_matrix.nonzero()

# The anchor cells can then be obtained by indexing each dataset
anchor_cell1 = adata1[anchor_index1]
anchor_cell2 = adata2[anchor_index2]
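
To make the `nonzero()` step concrete, here is a small sketch on a toy matrix. It assumes the alignment matrix behaves like a NumPy array (rows index cells of the first dataset, columns index cells of the second); the values below are made up for illustration and are not scGALA output.

```python
import numpy as np

# Toy stand-in for an alignment matrix: 3 cells in dataset 1, 4 cells in dataset 2.
# Zero means "no alignment"; non-zero entries are alignment probabilities.
alignment_matrix = np.array([
    [0.0, 0.95, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.92, 0.0, 0.0, 0.97],
])

# nonzero() returns one index array per axis, in row-major order.
anchor_index1, anchor_index2 = alignment_matrix.nonzero()

# Pair the indices up, and look up the score of each anchor pair.
anchor_pairs = list(zip(anchor_index1.tolist(), anchor_index2.tolist()))
anchor_scores = alignment_matrix[anchor_index1, anchor_index2]

print(anchor_pairs)  # [(0, 1), (2, 0), (2, 3)]
```

Note that a cell can appear in several pairs (cell 2 of the first dataset above) or in none (cell 1), so the two index arrays have equal length but may contain repeats.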

We also provide convenient APIs for enhancing Seurat-based anchors, imputing spatial transcriptomics, generating cross-modality data, and other useful features. Please refer to Tutorials and APIs for detailed walkthroughs.

Example Data

All example data used in the Tutorials can be found in Figshare. The data used in the batch correction tutorial can be found in Figshare.

Tutorials

Integrate into existing methods (Harmonization Pipeline and Universal Booster).

scGALA is designed to easily integrate into existing methods that employ cell alignments. The integration can be done in two modes: Module Replacement or External Reference, depending on the working strategy of the target method.

For methods that are under development or that run end-to-end, Module Replacement is the strategy to choose. Identify the Cell Alignment module (keywords to look for: MNN, Alignment, CCA, Anchor, Correspondence) and replace it with scGALA as shown in the Usage section.

Tutorial with INSCT as example: Module Replacement Tutorial Based On INSCT (Batch Correction). We present a comparison experiment between scGALA-enhanced INSCT Supervised and the original INSCT Supervised. To facilitate the evaluation, we use scIB to compute the core metrics of batch correction.
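
The Module Replacement idea can be sketched as follows. This is a toy illustration, not INSCT or scGALA code: `integrate` and `nearest_value_alignment` are hypothetical stand-ins for an end-to-end method and its original alignment module, and the `scgala_alignment` wrapper assumes the matrix returned by `get_alignments` supports `nonzero()` as in the Usage section.

```python
from typing import Callable, List, Sequence, Tuple

Pairs = List[Tuple[int, int]]
AlignFn = Callable[[Sequence[float], Sequence[float]], Pairs]


def nearest_value_alignment(batch1: Sequence[float], batch2: Sequence[float]) -> Pairs:
    """Hypothetical original module: pair each cell with its closest counterpart."""
    return [
        (i, min(range(len(batch2)), key=lambda j: abs(batch2[j] - x)))
        for i, x in enumerate(batch1)
    ]


def integrate(batch1, batch2, align: AlignFn = nearest_value_alignment):
    """Hypothetical end-to-end method whose alignment step is a parameter."""
    pairs = align(batch1, batch2)
    # Toy downstream step: shift batch2 by the mean offset over the anchor pairs.
    offset = sum(batch1[i] - batch2[j] for i, j in pairs) / len(pairs)
    return [y + offset for y in batch2]


def scgala_alignment(adata1, adata2) -> Pairs:
    """Drop-in replacement built on scGALA (needs scGALA and AnnData inputs)."""
    from scGALA import get_alignments  # imported lazily; not needed for the toy run

    matrix = get_alignments(adata1=adata1, adata2=adata2)
    idx1, idx2 = matrix.nonzero()
    return list(zip(idx1.tolist(), idx2.tolist()))


# Toy run with the original module; a real pipeline would pass align=scgala_alignment.
print(integrate([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]))
```

The point of the design is that only the alignment callable changes; the rest of the method stays untouched.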

For methods with clear procedural steps, we recommend the External Reference strategy, as it requires the least effort. In this mode, we do not replace the alignment module; instead, we enhance the intermediate cell alignment results produced by the original method.

Tutorial with Seurat as example: External Reference Tutorial Based On Seurat (Label Transfer). We present a comparison experiment between scGALA-enhanced Seurat and the original Seurat. We provide APIs to efficiently enhance Seurat-based anchors and compute the anchor scores required by Seurat.
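
The External Reference pattern can be illustrated with a deliberately simplified sketch: the original method produces anchor pairs, an external scorer evaluates them, and the refined pairs are handed back. The function `refine_anchors` and the toy scores are hypothetical; here the scorer only filters anchors by a threshold, whereas scGALA's actual enhancement may also add or reweight pairs.

```python
from typing import Callable, List, Tuple

Pairs = List[Tuple[int, int]]


def refine_anchors(
    anchors: Pairs,
    score: Callable[[int, int], float],
    min_value: float = 0.8,
) -> Pairs:
    """Keep only the anchor pairs whose external score reaches min_value."""
    return [(i, j) for i, j in anchors if score(i, j) >= min_value]


# Intermediate anchors as they might be exported by the original method (toy values).
original_anchors = [(0, 3), (1, 1), (2, 0)]

# Toy external scores; pairs without an entry are treated as unsupported.
toy_scores = {(0, 3): 0.95, (1, 1): 0.40, (2, 0): 0.85}

refined = refine_anchors(
    original_anchors,
    score=lambda i, j: toy_scores.get((i, j), 0.0),
)
print(refined)  # [(0, 3), (2, 0)]
```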

More tutorials are provided to demonstrate Multiomics Integration based on scGALA-enhanced Seurat and Spatial Alignment based on scGALA-enhanced STAligner.

Advanced Multiomics Functionalities

Multiplet-omics Integration

scGALA introduces a multiplet-omics integration strategy that bridges disjoint doublet datasets, such as RNA-ATAC and RNA-ADT, to computationally construct a triplet-omics dataset (RNA-ATAC-ADT), thus bypassing the need for specialized triple-modal sequencing protocols while maintaining coherence across modalities.

Tutorial: Multiplet-omics Integration with scGALA. We demonstrate how scGALA can be used for multiplet-omics integration, specifically integrating RNA+ATAC and RNA+ADT datasets through their shared RNA modality.
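
The bridging idea can be sketched with plain Python data. This is only an illustration under stated assumptions: cells are represented as dictionaries, the anchor pairs are given (in practice they would come from aligning the two RNA modalities), and the RNA profile of the RNA+ATAC cell is kept for the combined cell. The function `build_triplets` is hypothetical and is not the scGALA API.

```python
def build_triplets(rna_atac_cells, rna_adt_cells, anchors):
    """Combine anchored cells from two doublet datasets into triplet profiles.

    anchors is a list of (i, j) pairs: cell i of the RNA+ATAC dataset is
    aligned with cell j of the RNA+ADT dataset through the shared RNA modality.
    """
    triplets = []
    for i, j in anchors:
        atac_side = rna_atac_cells[i]
        adt_side = rna_adt_cells[j]
        triplets.append({
            "rna": atac_side["rna"],    # assumption: keep RNA from the RNA+ATAC cell
            "atac": atac_side["atac"],  # measured in the first dataset
            "adt": adt_side["adt"],     # borrowed from the aligned cell
        })
    return triplets


# Toy doublet datasets (values are made up).
rna_atac = [
    {"rna": [1.0, 0.0], "atac": [1, 1, 0]},
    {"rna": [0.0, 2.0], "atac": [0, 0, 1]},
]
rna_adt = [
    {"rna": [0.0, 2.1], "adt": [5.0]},
    {"rna": [1.1, 0.0], "adt": [7.0]},
]

# Anchors pair cells with similar RNA profiles across the two datasets.
anchors = [(0, 1), (1, 0)]
triplets = build_triplets(rna_atac, rna_adt, anchors)
print(triplets[0])
```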

Cross-modality Imputation and Generation

scGALA enables cross-modality data generation through a specialized Graph Attention Network framework. This allows for predicting RNA expression profiles from chromatin accessibility (ATAC-seq) data, effectively creating multimodal profiles from single-modality measurements.

Tutorial: Cross-modality Imputation with scGALA. We demonstrate how to use scGALA to generate gene expression (RNA) profiles from ATAC-seq data using cell-cell alignments as guiding information.

Spatial Transcriptomics Enhancement

scGALA offers functionality to impute spatial transcriptomics data with the help of a reference scRNA dataset. This addresses a major limitation of spatial technologies, which typically measure only a few hundred genes compared to thousands in scRNA-seq.

Tutorial: Spatial Transcriptomics Imputation with scGALA. We show how to enhance spatially resolved transcriptomics by imputing unmeasured genes using a reference scRNA-seq dataset while preserving spatial context.

APIs

scGALA offers a comprehensive set of functions for various single-cell data integration tasks. Below are the key APIs organized by their purpose.

Core Alignment Functions

`get_alignments(data1_dir=None, data2_dir=None, adata1=None, adata2=None, out_dim=32, ...)`

The main function for cell alignment between two datasets.

Key Parameters:

adata1, adata2: AnnData objects containing the datasets to align
out_dim: Dimension of latent features (default: 32)
k: Number of neighbors for initial MNN search (default: 20)
min_value: Minimum alignment score threshold (default: 0.9)
lamb: Hyperparameter for score-based alignment (default: 0.3)
spatial: Whether to use spatial information in alignment (default: False)

Returns: Matrix of alignment probabilities between cells in the two datasets
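
As a usage sketch, the documented defaults can be collected in one place and overridden per call. The wrapper names (`DOCUMENTED_DEFAULTS`, `alignment_kwargs`, `align`) are hypothetical, and the sketch assumes that the parameters listed above are accepted as keyword arguments by `get_alignments`.

```python
# Defaults as listed in the parameter description above.
DOCUMENTED_DEFAULTS = {
    "out_dim": 32,
    "k": 20,
    "min_value": 0.9,
    "lamb": 0.3,
    "spatial": False,
}


def alignment_kwargs(**overrides):
    """Return the documented defaults with any user overrides applied."""
    return {**DOCUMENTED_DEFAULTS, **overrides}


def align(adata1, adata2, **overrides):
    """Call get_alignments with explicit settings (requires scGALA)."""
    from scGALA import get_alignments  # imported lazily so the helpers above run without it

    return get_alignments(adata1=adata1, adata2=adata2, **alignment_kwargs(**overrides))


# Example: a wider neighbour search and a looser score threshold.
print(alignment_kwargs(k=30, min_value=0.8))
```

Writing the settings out explicitly makes a run easier to reproduce than relying on implicit defaults.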

`find_mutual_nn_new(data1, data2, k1, k2, ...)`

Enhanced mutual nearest neighbors finding with graph learning.

Key Parameters:

data1, data2: Input datasets
k1, k2: Number of neighbors to consider in each dataset

Returns: Lists of mutual indices between datasets
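
For readers unfamiliar with mutual nearest neighbors (MNN), the following brute-force sketch shows the plain MNN idea only; it does not include the graph-learning enhancement. The interpretation of `k1` and `k2` (neighbors searched in `data1` and `data2` respectively) is an assumption, and `mutual_nearest_neighbors` is a hypothetical helper, not the scGALA function.

```python
import math


def k_nearest(query, reference, k):
    """Indices of the k points in reference that are closest to query."""
    order = sorted(range(len(reference)), key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def mutual_nearest_neighbors(data1, data2, k1, k2):
    """Return two index lists describing mutual nearest neighbor pairs.

    Assumption: k2 neighbors are searched in data2 for each cell of data1,
    and k1 neighbors are searched in data1 for each cell of data2.
    """
    nn_in_data2 = [k_nearest(x, data2, k2) for x in data1]
    nn_in_data1 = [k_nearest(y, data1, k1) for y in data2]
    mutual1, mutual2 = [], []
    for i, neighbours in enumerate(nn_in_data2):
        for j in sorted(neighbours):
            # Keep the pair only if the relation holds in both directions.
            if i in nn_in_data1[j]:
                mutual1.append(i)
                mutual2.append(j)
    return mutual1, mutual2


# Toy 2-D "cells" (values are made up).
data1 = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
data2 = [(0.0, 0.1), (10.0, 10.2), (50.0, 50.0)]
print(mutual_nearest_neighbors(data1, data2, k1=1, k2=1))  # ([0, 2], [0, 1])
```

Cell 1 of `data1` is dropped because its nearest neighbor in `data2` prefers cell 0, which is what makes the relation mutual rather than one-sided.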

Seurat Integration Utilities

`mod_seurat_anchors(anchors_ori, adata1, adata2, min_value=0.8, lamb=0.3, ...)`

Enhance Seurat anchors using scGALA's graph-based alignment.

Key Parameters:

anchors_ori: Path to CSV file with original anchors
adata1, adata2: Paths to or AnnData objects for the datasets
min_value: Minimum alignment score threshold (default: 0.8)
lamb: Hyperparameter for anchor refinement (default: 0.3)

Returns: Enhanced alignment matrix

`compute_anchor_score(adata1, adata2, mnn1, mnn2)`

Calculate anchor scores for pairs of cells, useful for downstream integration tasks.

Key Parameters:

adata1, adata2: AnnData objects for the datasets
mnn1, mnn2: Lists of indices representing aligned cell pairs

Returns: Array of anchor scores

Batch Correction Integration (modified functions for existing methods)

`mnn_tnn(ds1, ds2, names1, names2, knn=20, ...)`

Replace TNN (INSCT) alignment with scGALA-enhanced alignment.

Key Parameters:

ds1, ds2: Input datasets
names1, names2: Cell names for each dataset
knn: Number of neighbors for alignment (default: 20)
min_value: Minimum alignment score threshold (default: 0.8)

Returns: Aligned indices between datasets

`get_match_scanorama(data1, data2, ...)`

Enhanced alignment for Scanorama integration method.

Key Parameters:

data1, data2: Input datasets
matches: Optional pre-computed matches

Returns: Aligned indices between datasets

`mnn_scDML(ds1, ds2, names1, names2, knn=20, ...)`

Enhanced alignment for scDML batch correction method.

Key Parameters:

ds1, ds2: Input datasets
names1, names2: Cell names for each dataset
knn: Number of neighbors (default: 20)

Returns: Aligned indices between datasets

Spatial Transcriptomics Tools

`mnn_tnn_spatial(ds1, ds2, spatial1, spatial2, names1, names2, ...)`

Spatial-aware version of mnn_tnn that incorporates spatial coordinates.

Key Parameters:

ds1, ds2: Input expression datasets
spatial1, spatial2: Spatial coordinate information
names1, names2: Cell names for each dataset
min_value: Minimum alignment score threshold (default: 0.9)

Returns: Aligned indices between spatial datasets

`GNNImputer` (class)

Neural network model for imputing gene expression in spatial data.

Key Parameters in constructor:

num_features: Number of input features
n_matching_genes: Number of genes shared between datasets
hidden_channels: Size of hidden layers
num_layers: Number of GNN layers (default: 3)
layer_type: Type of GNN layer to use (default: 'GAT')

Usage: Used through MyDataModule_OneStage in the spatial imputation workflow

Data Handling Utilities

`split_data_unevenly(adata, train_ratio=0.7, group_key='cell_type')`

Split datasets with controlled imbalance for robust testing.

Key Parameters:

adata: Input AnnData object
train_ratio: Overall ratio for training set (default: 0.7)
group_key: Column in adata.obs for splitting (default: 'cell_type')

Returns: Two AnnData objects (train and test)

`simulate_batch_effect(data, batch_effect_strength=0.3, noise_strength=0.3)`

Add synthetic batch effects for benchmarking integration methods.

Key Parameters:

data: Input data matrix
batch_effect_strength: Strength of batch effect (default: 0.3)
noise_strength: Strength of random noise (default: 0.3)

Returns: Data with simulated batch effects
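
One common way to simulate a batch effect is to add a per-gene shift shared by all cells plus independent per-entry noise. The sketch below follows that recipe with the standard library only; it is an assumption about the general technique, not a description of how scGALA's `simulate_batch_effect` is implemented, and the `seed` parameter is added here purely for reproducibility.

```python
import random


def simulate_batch_effect_sketch(data, batch_effect_strength=0.3, noise_strength=0.3, seed=0):
    """Return a copy of data (cells x genes) with a synthetic batch effect.

    Assumed recipe: one Gaussian shift per gene, shared by every cell
    (the systematic effect), plus independent Gaussian noise per entry.
    """
    rng = random.Random(seed)
    n_genes = len(data[0])
    shifts = [rng.gauss(0.0, batch_effect_strength) for _ in range(n_genes)]
    return [
        [value + shifts[g] + rng.gauss(0.0, noise_strength) for g, value in enumerate(row)]
        for row in data
    ]


# Toy expression matrix: 3 cells x 2 genes (values are made up).
expression = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
shifted = simulate_batch_effect_sketch(expression, batch_effect_strength=0.5, noise_strength=0.0)
print(shifted)
```

With `noise_strength=0.0`, every cell receives the same shift for a given gene, which makes the systematic component easy to inspect in isolation.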
