The cfNOMe Toolkit

The following is an introduction into the bioinformatic tools used in the manuscript "cfNOMe A single assay for comprehensive epigenetic analyses of cell-free DNA". These tools are meant to be used on paired-end cytosine-converted (e.g. enzymatic cytosine conversion) cell-free DNA NGS data. In principle, these tools should be equally compatible with bisulfite-converted libraries.

1. CONTENTS

Included in this package are:

Software:

CalcWPSforCoordinate.sh
bash script for calculating the WPS given an alignment file and a chromosomal location.
BatchWPSfromCoordinateFile.sh
bash script for calculating the WPS in batch given an alignment file and a file containing any number of chromosomal locations (in shape 8:43540000-43550000).
CalcAverageWPS.sh
bash script for averaging and normalizing raw WPS data, especially when trying to merge signals from multiple locations (e.g. transcription start sites of multiple genes).
Call_peaks.sh
bash script for calling peaks - representing nucleosome or transcription factor binding positions - from raw WPS data.
Call_peaks.py
Python3 program being automatically called by Call_peaks.sh for performing the peak calling logic.
find_nearest_peak.py
Python3 program that takes two input files containing chromosomal peak locations and outputs the distance between peak calls in the two samples. Distances can be -500 to +500bp.
methylation_deconvolution.py
Python3 program that takes tab-separated methylation data as input (reference methylation and sample methylation are both needed) and calculates the most likely sample composition.

2. DEPENDENCIES (newer versions probably fine)

GNU parallel v20161222
samtools v1.7
Python v3.6.8
numpy v1.16.1
scipy v1.2.0
bismark v0.20.0
bedmap v2.4.26
sort-bed v2.4.26
bowtie2 v2.3.4.1

3. INSTALLATION

No special installation should be necessary. Just clone the repo onto your harddrive:

git clone https://github.com/FlorianErger/cfNOMe

and execute the scripts/programs from the command line. Usage instructions for each component of the toolkit are available using the --help command line parameter.

4. PREPARING INPUT DATA FOR YOUR OWN ANALYSES

To perform cfNOMe analyses on your own data, perform the following steps first:

Perform methylation sensitive alignment using bismark, e.g.
bismark --multicore * --output_dir /path/to/output/directory --genome /path/to/bismark/prepared/reference -1 /path/to/read1.fq.gz -2 /path/to/read2.fq.gz)

Perform methylation summary analysis on output .bam file from step 1, e.g.
bismark_methylation_extractor --parallel * --comprehensive --output /path/to/output/directory --merge_non_CpG --bedGraph --gzip /path/to/bismark.bam)
sort and index the bismark output .bam file from step 1, e.g.
samtools sort -@ * /path/to/input.bam > input_sorted.bam && samtools index -@ * input_sorted.bam)
Generate methylation summaries for loci of interest from bismark methylation data from step 2, e.g.
gunzip -c /path/to/bismark.bedGraph.gz | awk '{if(NR>1){print "chr"$1"\t"$2"\t"$3"\tid-"NR-1"\t"$4/100}}' | sort-bed - | bedmap --echo --delim '\t' --mean /path/to/CpG_loci_to_be_studied.bed - > average_methylation_at_loci_of_interest.bed)

*: enter number of parallel threads to be used

For a cfDNA dataset sequenced at ~30X, these steps should take about 8h, 1h, 1h and 1h, respectively, on a normal desktop PC (provided appropriate parallelization settings are used).

5. PERFORMING cfNOMe ANALYSES (DEMO DATA)

Included with this toolkit are demo datasets. For the Nucleosome footprinting tools, a sorted and indexed .bam file has been included, containing NGS reads on hg19 chr8:43540000-43550000. To perform analyses on this demo data, use the following commands:

To calculate raw WPS in an area of interest:
./CalcWPSforCoordinate.sh -i Example_data/Nucleosome_footprinting/NF_demo_data.bam -r 8:43546000-43548000 > Raw_WPS.txt

Optional: normalize the raw WPS (incurs information loss at both ends of the window):
./CalcAverageWPS.sh --keep true -i Raw_WPS.txt --num 1
Optional: call WPS peaks:
./Call_peaks.sh -i Raw_WPS.txt > Peak_calls.txt
Optional: compare peak calls (requires second sample):
python3 find_nearest_peak.py Peak_calls_from_sample_1.txt Peak_calls_from_sample_2.txt

The expected run time for each of these steps is less than 1 second on a normal desktop PC. For bigger analyses with larger alignment files and larger regions of interest (over several Mb), step a can be parallelized with the -p flag. Runtime for a 200Mb sequence chunk is estimated at ~1 hour for steps 1-4.

For methylation deconvolution analysis, we include the reference methylation levels at 7890 loci-of-interest from the publication by Moss et al., Nat Commun., 2018 for 25 reference tissues and for one of our healthy individuals. To perform analyses on this demo data, use the following commands:

calculate tissue contribution based on methylation levels:
python3 methylation_deconvolution.py Example_data/Methylation/Reference_tissue_methylation.bed Example_data/Methylation/Sample_methylation_levels.bed --ineq --verbose

The expected run time for a methylation deconvolution using these references and loci-of-interest is on the order of 10 seconds on a normal desktop PC.

Expected output files can be found in the respective subfolders of the Example_data directory.

6. PERFORMING BATCH WPS CALCULATIONS OVER MULTIPLE COORDINATES

For calculating the WPS at several specific coordinates (and optionally averaging them to a single value, e.g. for looking at the transcription start sites of several thousand genes at once):

Use coordinate file to calculate WPS over many different regions and output averaged result:
./BatchWPSfromCoordinateFile.sh --average true --keep true -i /path/to/your_sorted_and_indexed.bam -c Example_data/Nucleosome_footprinting/NF_TSS_coordinates_for_10k_genes+-5kb.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The cfNOMe Toolkit

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Example_data		Example_data
BatchWPSfromCoordinateFile.sh		BatchWPSfromCoordinateFile.sh
CalcAverageWPS.sh		CalcAverageWPS.sh
CalcWPSforCoordinate.sh		CalcWPSforCoordinate.sh
Call_peaks.py		Call_peaks.py
Call_peaks.sh		Call_peaks.sh
LICENSE		LICENSE
README.md		README.md
find_nearest_peak.py		find_nearest_peak.py
methylation_deconvolution.py		methylation_deconvolution.py

License

FlorianErger/cfNOMe

Folders and files

Latest commit

History

Repository files navigation

The cfNOMe Toolkit

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages