GitHub - holstegelab/MotifScope: A tool for motif annotation and visualization in tandem repeats.

holstegelab/MotifScope


MotifScope

A tool for motif annotation and visualization in tandem repeats.

MotifScope is also available online at https://motifscope.holstegelab.eu.

Installation

  • To install with conda

    cd install/conda
    sh INSTALL.sh
    

    Conda will install an environment called motifscope, in which the necessary dependencies are installed.
    The conda environment is activated by executing conda activate motifscope in the shell.

    See the usage section on how to run MotifScope once the conda environment is activated.

  • To install with docker

    cd install/docker
    sh build.sh
    

    Docker will create an image called motifscope, in which the necessary dependencies are installed.

    An example command for running motifscope within this docker image is available in run_docker.sh.
    Please adapt the options in run_docker.sh to your specific use case.

    To run it, e.g. with the example files in the Motifscope/example folder:

       sh run_docker.sh path/to/example_sequence.fa path/to/example_population.txt output_prefix
    

Usage

  • For running MotifScope on a set of sequences (reads or assemblies):

    motifscope [-i input.fa] [-mink 2] [-maxk 10] [-o output.prefix]

  • To annotate sequences with class labels, provide an annotation file with the -p option:

    motifscope [-i input.fa] [-mink 2] [-maxk 10] [-p classes.txt] [-o output.prefix]

    The class information will be shown as a separate color-coded column in the figure.

    • Each sequence header in input.fa should start with >sample#hap_number#. For HG002, for example, a header could start with >HG002#1#.
    • The class annotation file classes.txt should be tab-separated, with the sample id in the first column and the sample class in the second, e.g. HG002<TAB>EUR.
    • The class annotation file may start with a header line of the form sample <class_name>. The second-column label <class_name> can be chosen freely and is shown in the figure. Without a header, the label defaults to 'population'.
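
The header and class-file conventions above can be checked before a run. The following is a minimal sketch, not part of MotifScope: the helper names sample_from_header and write_classes are hypothetical, and it assumes that the sample id is everything between > and the first #, and that the header line of the class file is tab-separated like the rows below it.

```python
import csv
import io
import re

# Assumed pattern for the header convention >sample#hap_number#...
HEADER_RE = re.compile(r"^>([^#\s]+)#([^#\s]+)#")


def sample_from_header(header):
    """Return (sample, hap_number), or None if the header does not fit."""
    match = HEADER_RE.match(header)
    return (match.group(1), match.group(2)) if match else None


def write_classes(rows, handle, class_name="population"):
    """Write a tab-separated class file with a 'sample <class_name>' header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", class_name])
    writer.writerows(rows)


# Build the class file in memory; write to open("classes.txt", "w") instead
# to produce a file for the -p option.
buffer = io.StringIO()
write_classes([("HG002", "EUR")], buffer)
classes_text = buffer.getvalue()
```

Sample ids returned by sample_from_header can be compared with the first column of the class file to spot sequences that would end up without a class label.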

  • To disable sequence clustering and the dendrogram (e.g. in case of a single sequence), use the -c option:

    motifscope -c False [-i input.fa] [-mink 2] [-maxk 10] [-o output.prefix]

  • To run multiple sequence alignment on the compressed representation of the sequence, set -msa to POAMotif (aligns complete motifs) or POANucleotide (aligns nucleotides).

  • To guide the algorithm with a set of known motifs, provide the motifs with -motifs motifs.txt. The motif file motifs.txt should contain the motifs separated by tabs.

  • To use random categorical colors for motifs, set -e to random. To project motifs onto a color scale, set -e to UMAP or MDS for dimension reduction based on motif similarities.

  • To characterize motif composition without generating a figure, set -figure to False.

  • To use the reverse complement of the input fasta, set -reverse to True.
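
When MotifScope is driven from a script, the options listed above can be assembled programmatically. The sketch below is an illustration only: build_command and its keyword names are hypothetical, while the flags and their values are the ones described in this section. It builds the argument list without running anything.

```python
import shlex

MSA_MODES = {"POAMotif", "POANucleotide"}
COLOR_MODES = {"random", "UMAP", "MDS"}


def build_command(input_fa, output_prefix, mink=2, maxk=10, classes=None,
                  motifs=None, cluster=True, msa=None, embed=None,
                  figure=True, reverse=False):
    """Assemble a motifscope argument list from the documented options."""
    if msa is not None and msa not in MSA_MODES:
        raise ValueError(f"unknown -msa value: {msa!r}")
    if embed is not None and embed not in COLOR_MODES:
        raise ValueError(f"unknown -e value: {embed!r}")
    cmd = ["motifscope", "-i", input_fa, "-mink", str(mink),
           "-maxk", str(maxk), "-o", output_prefix]
    if classes:
        cmd += ["-p", classes]          # class annotation file
    if motifs:
        cmd += ["-motifs", motifs]      # known motifs, tab-separated
    if not cluster:
        cmd += ["-c", "False"]          # disable clustering and dendrogram
    if msa:
        cmd += ["-msa", msa]
    if embed:
        cmd += ["-e", embed]
    if not figure:
        cmd += ["-figure", "False"]     # composition only, no figure
    if reverse:
        cmd += ["-reverse", "True"]     # use the reverse complement
    return cmd


cmd = build_command("input.fa", "output.prefix",
                    classes="classes.txt", msa="POAMotif")
command_line = shlex.join(cmd)
```

With the conda environment activated, the list could be passed to subprocess.run(cmd, check=True); printing command_line first is a cheap way to review the call.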

Output

  • The repeat compositions are output in a fasta file. For example:

    >HG002#2#JAHKSD010000034.1:9910981-9913041/rc
    G1 A1 G1 C1 A2 G1 A1 C1 T1 C1 T1 G1 T3 C1 A2 AAAAG12 A1 AAAAG1 C1 A1 T1 G1 T2 C1 T1 A3 G1 A1 G1

    The motifs are separated by spaces. Each token is a motif followed by the number of consecutive copies of that motif; AAAAG12, for example, denotes 12 consecutive copies of AAAAG.
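
The compressed representation is easy to read back into (motif, copies) pairs. The following is a small sketch rather than MotifScope code: parse_composition and expand are hypothetical helper names, and it assumes that every token consists of letters followed directly by an integer, as in the example above.

```python
import re

# A token is a motif (letters) followed by its consecutive copy number.
TOKEN_RE = re.compile(r"^([A-Za-z]+)(\d+)$")


def parse_composition(line):
    """Split a composition line into a list of (motif, copies) pairs."""
    runs = []
    for token in line.split():
        match = TOKEN_RE.match(token)
        if match is None:
            raise ValueError(f"unexpected token: {token!r}")
        runs.append((match.group(1), int(match.group(2))))
    return runs


def expand(runs):
    """Rebuild the nucleotide sequence that the runs describe."""
    return "".join(motif * copies for motif, copies in runs)


# A short excerpt of the example line shown above.
runs = parse_composition("A2 AAAAG12 A1 AAAAG1 C1")
sequence = expand(runs)
```

Here the second run is ("AAAAG", 12), which contributes 60 of the 69 bases in the expanded excerpt.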

  • The motif summary per sequence is output in a tab-separated file. The first column is the sequence header, the second column is the motif, the third column is the amount of sequence covered by the motif, and the fourth column is the count of the motif.
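
The four-column summary can be loaded with the standard csv module. This is a hedged sketch: read_summary is a hypothetical helper, the example rows are made up for illustration, and it is an assumption that the third column parses as a number and that any header line can be recognized by its non-numeric third and fourth columns.

```python
import csv
import io
from collections import defaultdict


def read_summary(handle):
    """Group (motif, covered, count) rows by sequence header."""
    per_sequence = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or incomplete lines
        header, motif, covered, count = row[:4]
        try:
            record = (motif, float(covered), int(count))
        except ValueError:
            continue  # assumed to be a header line
        per_sequence[header].append(record)
    return dict(per_sequence)


# Made-up rows, used only to illustrate the column layout.
example = io.StringIO(
    "HG002#1#example\tAAAAG\t65\t13\n"
    "HG002#1#example\tA\t9\t9\n"
)
summary = read_summary(example)
```

For a real run, the StringIO object would be replaced by the opened summary file written under the chosen output prefix.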
