Cannoli: Big Data Genomics ADAM Pipe API wrappers for bioinformatics tools. Apache 2 licensed.
Install
- JDK 1.8 or later, http://openjdk.java.net
- Apache Maven 3.3.9 or later, http://maven.apache.org
To build
$ mvn install
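Maven runs the unit tests as part of the build; they can be skipped with the standard Maven flag (plain Maven behavior, not specific to Cannoli):
$ mvn install -DskipTests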
To run the commands in this repository via the command line, use cannoli-submit. Note the -- argument separator between Spark arguments and Cannoli command arguments.
$ ./bin/cannoli-submit --help
                             _ _
                            | (_)
  ___ __ _ _ __  _ __   ___ | |_
 / __/ _` | '_ \| '_ \ / _ \| | |
| (_| (_| | | | | | | | (_) | | |
 \___\__,_|_| |_|_| |_|\___/|_|_|
Usage: cannoli-submit [<spark-args> --] <cannoli-args>
Choose one of the following commands:
CANNOLI
bcftoolsMpileup : ADAM Pipe API wrapper for bcftools mpileup.
bcftoolsNorm : ADAM Pipe API wrapper for bcftools norm.
bedtoolsIntersect : ADAM Pipe API wrapper for Bedtools intersect.
bowtie : ADAM Pipe API wrapper for Bowtie.
bowtie2 : ADAM Pipe API wrapper for Bowtie 2.
bwa : ADAM Pipe API wrapper for BWA.
freebayes : ADAM Pipe API wrapper for Freebayes.
minimap2 : ADAM Pipe API wrapper for Minimap2.
samtoolsMpileup : ADAM Pipe API wrapper for samtools mpileup.
snpEff : ADAM Pipe API wrapper for SnpEff.
vep : ADAM Pipe API wrapper for Ensembl VEP.
vtNormalize : ADAM Pipe API wrapper for vt normalize.
CANNOLI TOOLS
interleaveFastq : Interleaves two FASTQ files.
sampleReads : Sample reads from interleaved FASTQ format.
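The tools listed under CANNOLI TOOLS can be submitted the same way. As a sketch, interleaveFastq might be invoked as below; the positional arguments shown (two input FASTQ files followed by an interleaved output path) are an assumption, so check the command's usage output for the exact arguments.
$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    interleaveFastq \
    sample1.fq \
    sample2.fq \
    sample.interleaved.fq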
External commands wrapped by Cannoli should be installed on each executor node in the cluster. For example, to align reads with BWA:
$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwa \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -add_files
Alternatively, external commands can be run using Docker:
$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwa \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -use_docker \
    -image quay.io/ucsc_cgl/bwa:0.7.12--256539928ea162949d8a65ca5c79a72ef557ce7c \
    -add_files
or using Singularity:
$ ./bin/cannoli-submit \
    <spark-args> \
    -- \
    bwa \
    sample.unaligned.fragments.adam \
    sample.bwa.hg38.alignments.adam \
    sample \
    -index hg38.fa \
    -sequence_dictionary hg38.dict \
    -fragments \
    -use_singularity \
    -image quay.io/ucsc_cgl/bwa:0.7.12--256539928ea162949d8a65ca5c79a72ef557ce7c \
    -add_files
To run the Cannoli interactive shell, based on the ADAM shell, which in turn extends the Apache Spark shell, use cannoli-shell. A wildcard import from ADAMContext adds implicit methods to SparkContext for loading reads, alignments, variants, genotypes, and features, such as sc.loadPairedFastqAsFragments below. A wildcard import from Cannoli adds implicit methods for calling external commands on the genomic datasets loaded by ADAM, such as reads.alignWithBwa below.
$ ./bin/cannoli-shell
scala> import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.adam.rdd.ADAMContext._
scala> import org.bdgenomics.cannoli.Cannoli._
import org.bdgenomics.cannoli.Cannoli._
scala> import org.bdgenomics.cannoli.BwaArgs
import org.bdgenomics.cannoli.BwaArgs
scala> val args = new BwaArgs()
args: org.bdgenomics.cannoli.BwaArgs = org.bdgenomics.cannoli.BwaArgs@54234569
scala> args.indexPath = "hg38.fa"
args.indexPath: String = hg38.fa
scala> args.sample = "sample"
args.sample: String = sample
scala> val reads = sc.loadPairedFastqAsFragments("sample1.fq", "sample2.fq")
reads: org.bdgenomics.adam.rdd.fragment.FragmentRDD = RDDBoundFragmentRDD with 0 reference sequences, 0 read groups, and 0 processing steps
scala> val alignments = reads.alignWithBwa(args)
alignments: org.bdgenomics.adam.rdd.read.AlignmentRecordRDD = RDDBoundAlignmentRecordRDD with 0 reference sequences, 0 read groups, and 0 processing steps
scala> alignments.saveAsParquet("sample.alignments.adam")
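The same workflow can also be packaged as a standalone Spark application rather than typed into the shell. The sketch below simply wraps the calls from the session above in a main method; the object name and SparkContext setup are illustrative and assume the Cannoli and ADAM jars are on the application classpath.

import org.apache.spark.{SparkConf, SparkContext}
import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.cannoli.Cannoli._
import org.bdgenomics.cannoli.BwaArgs

// Illustrative example object; not part of Cannoli itself.
object BwaAlignmentExample {
  def main(argv: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("bwa alignment example"))

    // Configure the BWA wrapper as in the shell session above.
    val args = new BwaArgs()
    args.indexPath = "hg38.fa"
    args.sample = "sample"

    // Load paired FASTQ files as fragments, align with BWA, and save as Parquet.
    val reads = sc.loadPairedFastqAsFragments("sample1.fq", "sample2.fq")
    val alignments = reads.alignWithBwa(args)
    alignments.saveAsParquet("sample.alignments.adam")

    sc.stop()
  }
}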