The DISHIC method has been published; see DOI:10.1109/BIBM62325.2024.10821719.

Here is an example of running DISHIC on chromosome 19 of GSE80006.

# data folder root path
file_path <- "./data"
# group1 data subfolder name
file_name1 <- "chr19-fold"
# group2 data subfolder name
file_name2 <- "chr19-ori"
# DISHIC code folder path
code_path <- "./"
# scHiCNorm feature folder path
feature_path <- "./feature"
# cell-level covariate matrix: one row per cell, one column per feature
cell_feature <- NULL

# chromosomes to be analyzed
chr <- 19
# number of nodes for parallel computing
cores <- 40
# data binned resolution
bin_size <- 200000
# max genomic distance between analyzed bin-pairs
limit_size <- 10000000
# if the data are too large, split them into groups of group_size bin-pairs and process the groups sequentially
group_size <- 25000

DISHIC(file_path, feature_path, code_path, chr, cores, bin_size, limit_size, group_size)

The sections below describe each input and setting in detail.

1. Input data

# data folder root path
file_path <- "./data"
# group1 data subfolder name
file_name1 <- "chr19-fold"
# group2 data subfolder name
file_name2 <- "chr19-ori"

The data folder file_path should contain two subfolders, file_name1 and file_name2, which hold the two groups to be compared in the differential analysis (here, chr19-fold and chr19-ori).

File Structure

Each subfolder contains the scHi-C data files for all samples in that group. Each sample is stored in a separate file with three columns:

V1: Chromosome bin1 start position (divided by the resolution)
V2: Chromosome bin2 start position (divided by the resolution)
V3: Interaction values between V1 and V2 bins

For example, in the file GSM2109888_1_oocyte_NSN.200kb.txt, the data is structured as follows:

V1	V2	V3
16	17	0
16	18	0
17	18	13
16	19	0
17	19	1
……	……	……
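
As a minimal sketch of the layout above, the following snippet parses a few rows of the same shape and converts the bin indices back to genomic start positions. The helper name read_contacts is hypothetical and not part of the DISHIC code, and the sketch assumes the files are tab-separated with no header row (V1, V2, V3 are then the default column names assigned by read.table); check this against your own files.

```r
# Hypothetical helper (not part of DISHIC): read one sample file.
# Assumes tab-separated values and no header row, so read.table()
# assigns the default column names V1, V2, V3.
read_contacts <- function(con, bin_size) {
  contacts <- read.table(con, header = FALSE, sep = "\t")
  # Bin indices are start positions divided by the resolution,
  # so multiplying by bin_size recovers the start in base pairs.
  contacts$start1 <- contacts$V1 * bin_size
  contacts$start2 <- contacts$V2 * bin_size
  contacts
}

# The five example rows shown above, supplied as inline text.
example_rows <- "16\t17\t0\n16\t18\t0\n17\t18\t13\n16\t19\t0\n17\t19\t1"
contacts <- read_contacts(textConnection(example_rows), bin_size = 200000)
print(contacts)
```

For a real file, pass its path instead of the text connection, for example file.path(file_path, file_name1, "GSM2109888_1_oocyte_NSN.200kb.txt").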

2. Code

code_path <- "./"

Folder containing the DISHIC code files:

DISHIC.R: Main code for the method.
data_process.R: Code for preprocessing input scHi-C data and features.
model_class.R: DISHIC model.
solve_regression.R: Regression function used to update model parameters.
zinb_initialize.R: Function to initialize model parameters.
zinb_optimize.R: Function to update model parameters.

3. Feature

# scHiCNorm feature folder path
feature_path <- "./feature"
# cell-level covariate matrix
cell_feature <- NULL

The feature_path folder contains genome-level features. By default, DISHIC uses scHiCNorm features as genome-level features. Features for species or resolutions that are not provided can be generated from the scHiCNorm website.
cell_feature is a matrix of cell-level covariates, with one row per cell and one column per feature. It is NULL by default; if the user defines it, the method detects and uses it.
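
For illustration, a cell-level covariate matrix of this shape could be assembled as sketched below. The covariate used here (log of total contacts per cell) and the toy per-cell tables are assumptions made for the example; DISHIC does not prescribe which covariates to use. The sketch also assumes that rows must follow the same order as the cells in the input data, which should be verified against the DISHIC code.

```r
# Toy per-cell contact tables in the three-column layout (V1, V2, V3).
cells <- list(
  cell_a = data.frame(V1 = c(16, 17), V2 = c(17, 18), V3 = c(0, 13)),
  cell_b = data.frame(V1 = c(16, 17), V2 = c(18, 19), V3 = c(2, 1)),
  cell_c = data.frame(V1 = c(16, 16), V2 = c(17, 19), V3 = c(5, 4))
)

# Hypothetical covariate: log(1 + total contacts) for each cell.
log_depth <- vapply(cells, function(x) log1p(sum(x$V3)), numeric(1))

# One row per cell, one column per feature.
cell_feature <- matrix(
  log_depth,
  ncol = 1,
  dimnames = list(names(cells), "log_depth")
)
print(dim(cell_feature))
```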

4. Other settings

# chromosomes to be analyzed
chr <- 19
# number of nodes for parallel computing
cores <- 40
# data binned resolution
bin_size <- 200000
# max genomic distance between analyzed bin-pairs
limit_size <- 10000000
# if the data are too large, split them into groups of group_size bin-pairs and process the groups sequentially
group_size <- 25000

DISHIC analyzes intra-chromosomal interactions on the chromosome given by chr.
DISHIC supports multi-core parallelism on the CPU; cores sets the number of parallel cores.
bin_size is the resolution of the input scHi-C data. For example, 200000 corresponds to a resolution of 200 kb.
Interactions between bins that lie far from the diagonal have little analytical value, so limit_size caps the genomic distance between the two bins of an analyzed pair. For example, with limit_size=10000000, DISHIC only analyzes bin-pairs within 10 Mbp of each other.
If the scHi-C data are too large to process at once, DISHIC can divide them into several groups, each with group_size bin-pairs (rows), and process the groups sequentially.
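
The arithmetic linking bin_size, limit_size, and group_size can be sketched as follows. This only illustrates the idea and is not the DISHIC implementation: the names max_bin_dist, pairs, and groups, the toy chromosome length of 1000 bins, the exclusion of diagonal pairs, and the inclusive distance cutoff are all assumptions.

```r
bin_size   <- 200000
limit_size <- 10000000
group_size <- 25000

# Maximum distance between the two bins of a pair, measured in bins.
max_bin_dist <- limit_size / bin_size   # 50

# Enumerate bin-pairs above the diagonal for a toy chromosome of 1000 bins
# and keep only those within max_bin_dist bins of each other
# (inclusive cutoff; an assumption for this sketch).
n_bins <- 1000
pairs <- expand.grid(V1 = seq_len(n_bins), V2 = seq_len(n_bins))
pairs <- pairs[pairs$V2 > pairs$V1 &
               (pairs$V2 - pairs$V1) <= max_bin_dist, ]

# Split the retained bin-pairs into consecutive groups of at most
# group_size rows, to be processed one group at a time.
group_id <- ceiling(seq_len(nrow(pairs)) / group_size)
groups <- split(pairs, group_id)

print(vapply(groups, nrow, integer(1)))
```

A smaller group_size lowers peak memory use at the cost of more sequential passes.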

5. Run

DISHIC(file_path, feature_path, code_path, chr, cores, bin_size, limit_size, group_size)

zhaoyichao777/DISHIC
