The repository contains the code for the computational studies in The fibro-adipogenic progenitor APOD+DCN+LUM+ cell population in aggressive carcinomas.
An adaptive method for finding co-expression signatures in single-cell/bulk RNA-seq datasets.
- Overview
- Tutorials
- Seed selection
- Description of original attractor algorithm (fixed exponent parameter)
An adaptive version of attractor algorithm.
cafr::findAttractor
finds a converged attractor based on the seed gene provided. The findAttractor.adaptive
gradually decreased the exponent parameter to maximize the strength of the Nth-ranked genes in the converged attractor.
The attractor algorithm was first proposed for identifying co-expression signatures from bulk expression values in samples [Ref.1]. A detailed description and application of the attractor algorithm on single-cell RNA-seq data can be found in [Ref.2]. Users can refer to our recent manuscript [Ref.3] for a detailed description and application of adaptive attractor algorithm.
Please see Seed selection for more information about the seed.
source("./findAttractor.adaptive.r")
data <- as.matrix(data) # a normalized expression matrix with genes in the rows, cells in the columns.
seed <- "LUM"
attr <- findAttractor.adaptive(data, seed)
R script for generating Supplementary Table 1 in our recent manuscript [link].
## install cafr package
devtools::install_github("weiyi-bitw/cafr")
## load metadata
# Data website: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE129363
metadata <- data.table::fread("./GSE129363/raw/GSE129363_Discovery_Cohort_CellAnnotation.txt")
## normalize the count matrix
library("Seurat")
adata <- Read10X("./GSE129363/raw") #including barcodes.tsv, genes.tsv, matrix.mtx
adata <- CreateSeuratObject(counts = adata[, metadata$CellID])
adata <- NormalizeData(adata)
data_all <- adata[["RNA"]]@data; rm(adata); gc()
## select one sample
sampleID <- "Ate-05-SAT"
data <- data_all[, metadata[which(metadata$SampleName == sampleID), ]$CellID]
## apply adaptive attractor algorithm
source("./findAttractor.adaptive.r")
data <- as.matrix(data) #normalized matrix for each sample
seed <- "LUM"
attr <- findAttractor.adaptive(data, seed, exponent.max = 5, exponent.min = 2, step.large = 1, step.small = 0.1, dominantThreshold = 0.2, seedTopN = 50, compareNth = 10, verbose = FALSE)
## results
head(attr$attractor.final)
attr$final.exponent
data
An expression matrix with genes in the rows, samples in the columns.
seed
Seed gene. Please see Seed selection for more information.
exponent.max
The maximum exponent of the mutual information, used to create weight vector for metagenes.
exponent.min
The minimum exponent of the mutual information, used to create weight vector for metagenes.
step.large
Decreasing step of the first round of scanning.
step.small
Decreasing step of the second round of scanning.
dominantThreshold
Threshold of dominance.
seedTopN
Threshold of divergence.
compareNth
It will maxmize the weight of the top Nth gene.
verbose
When TRUE, it will show the top 20 genes of the metagene in each iteration.
epsilon
Threshold of convergence.
minimize
Default FALSE,when minimize = TRUE, the algorithm will minize the difference of metagene[first.idx] and metagene[second.idx] instead of maxmizing the weight of the top Nth gene.
first.idx
Default first.idx = 1, the first gene of the metagene in each iteration.
second.idx
Default second.idx = 2, the second gene of the metagene in each iteration.
Users can choose some general markers as seed genes, such as:
Fibroblast | Pericyte | Macrophage | Endothelial | Mitotic |
---|---|---|---|---|
LUM | RGS5 | AIF1 | PECAM1 | TOP2A |
DCN |
If the dataset has many cells of one cell type (e.g. fibroblasts) and the attractor exponent (a
) is fixed, then identical attractors will be found using different general markers for one cell type, like DCN, LUM, and COL1A1.
Here, we applied the attractor finding algorithm using the findAttractor
function implemented in the cafr (v0.312) R package [Ref.1] with the general fibroblastic marker gene LUM and DCN as seed.
Detailed descriptions can be found in Ref.1.
The exponent (a
) is fixed. For the analysis of UMI based (e.g. 10x) and full-length-based (e.g. Smart-seq2) datasets, we would suggest using a = 3
and a = 5
, respectively.
## install cafr package
devtools::install_github("weiyi-bitw/cafr")
cafr::findAttractor(data, vec, a = 3, epsilon = 1e-7)