Target annotation using GTF/GFF3 #311
Comments
Thanks for the feedback. GFF/GTF/GFF3 support is pretty simplistic, all handled by the same parser, and relatively untested. On my GFF3 test files the parser picks up gene names correctly, but I'll need to modify it to handle the GTF syntax for specifying gene names and ensure the names are carried over to target labels.
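To illustrate the syntax difference mentioned above: GTF writes attributes in column 9 as `key "value";` pairs, while GFF3 uses `key=value` pairs separated by semicolons. The following is only a minimal sketch of reading both styles, not the parser used in this project. The function names `parse_attributes` and `gene_name`, the tag priority order, and the sample attribute strings are assumptions made for illustration.

```python
def parse_attributes(field):
    """Parse column 9 of a GTF or GFF3 record into a dict (simplified).

    Assumes values contain no ';' characters. When a key repeats
    (e.g. GTF 'tag'), only the first value is kept.
    """
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part.split('"', 1)[0]:
            # GFF3 style: key=value
            key, value = part.split("=", 1)
        else:
            # GTF style: key "value" (or key value for numbers)
            key, _, value = part.partition(" ")
        attrs.setdefault(key.strip(), value.strip().strip('"'))
    return attrs


def gene_name(attrs, tags=("gene_name", "Name", "gene_id")):
    """Return the first available naming tag, or None (hypothetical priority order)."""
    for tag in tags:
        if tag in attrs:
            return attrs[tag]
    return None


# Illustrative attribute strings, not copied from a real annotation file
gtf_attrs = 'gene_id "GENE1"; gene_type "protein_coding"; gene_name "BRCA1"; level 2;'
gff3_attrs = "ID=exon:TX1:2;Parent=TX1;gene_id=GENE1;gene_name=BRCA1"

print(gene_name(parse_attributes(gtf_attrs)))
print(gene_name(parse_attributes(gff3_attrs)))
```

Both calls should report the same gene name, which is what needs to reach the target labels.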
The problem is in this line: Line 47 in e16caae
Judging from the links in the Maybe a solution would be to have
I have come up with a possible solution in this branch. Not sure if it fits into the general idea of how your code is structured, but it works (for GFF files):
Thanks for the example code. Rather than add the The command line for that script is:
With the latest commits to make GFF parsing more permissive, this should work on the gencode files. (It takes about a minute on my machine.) I have not added the
Options --gff-tag and --gff-type allow filtering a complex GFF for just the relevant gene/exon annotations and help work around nonstandard tag usage. Option --refflat-type is equivalent to the --exon feature from refFlat2bed.py, and --flatten and --merge are copied directly from that script. A small but important bugfix in skgenome.merge. Some tweaks for code clarity.
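As a rough picture of what filtering by feature type and tag means, here is a minimal sketch that keeps only records of one type (column 3) and labels each with the value of one attribute tag. It is not the project's implementation; `iter_features`, its parameters, the "-" fallback label, and the sample records are assumptions for illustration only.

```python
import re


def iter_features(lines, feature_type="exon", tag="gene_name"):
    """Yield (chrom, start, end, label) for records of one feature type.

    Accepts both GTF (tag "value";) and GFF3 (tag=value) attribute styles.
    """
    pattern = re.compile(r'(?:^|;)\s*' + re.escape(tag) + r'[= ]"?([^";]+)"?')
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue  # malformed record or a feature type we don't want
        match = pattern.search(fields[8])
        label = match.group(1) if match else "-"
        # GFF/GTF coordinates are 1-based inclusive; BED is 0-based half-open
        yield fields[0], int(fields[3]) - 1, int(fields[4]), label


# Illustrative records, not taken from a real annotation file
records = [
    "##gff-version 3\n",
    'chr17\tTEST\tgene\t100\t900\t.\t-\t.\tgene_id "G1"; gene_name "BRCA1";\n',
    'chr17\tTEST\texon\t200\t300\t.\t-\t.\tgene_id "G1"; gene_name "BRCA1";\n',
    "chr17\tTEST\texon\t400\t500\t.\t-\t.\tID=e2;gene_name=BRCA1\n",
]

for feature in iter_features(records, feature_type="exon", tag="gene_name"):
    print(*feature, sep="\t")
```

Choosing the tag at call time is what lets a file with nonstandard tag usage still produce usable labels.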
I've added some features to
(Use
So far I have been using a refFlat file as --annotate input in the target command. However, I am now planning to start using GENCODE, and the only available files are GTF and GFF3.
Example: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz
According to the target documentation, one should be able to use GTF as input. However, it doesn't seem to work. For example, here is a simple BED file containing only one BRCA1 exon (with exact coordinates as given in the GTF file):
But target does not annotate the interval:
Note that it auto-detects GFF format even though the input is GTF. The output is the same when the GFF3 is used as input.
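For clarity, the behaviour expected here is roughly the following: each BED interval receives the names of the annotation features it overlaps. This is only a sketch of that expectation under simple assumptions (0-based half-open coordinates, a brute-force overlap scan, "-" for unannotated intervals). The function `annotate` and the coordinates below are hypothetical and not taken from the actual files.

```python
def annotate(targets, features):
    """Label each target with the names of overlapping features.

    targets:  list of (chrom, start, end)
    features: list of (chrom, start, end, name)
    All coordinates are assumed 0-based, half-open.
    """
    annotated = []
    for chrom, start, end in targets:
        names = sorted({
            name
            for f_chrom, f_start, f_end, name in features
            if f_chrom == chrom and f_start < end and start < f_end
        })
        # "-" marks an interval with no overlapping annotation
        annotated.append((chrom, start, end, ",".join(names) if names else "-"))
    return annotated


# Hypothetical coordinates for illustration
features = [("chr17", 199, 300, "BRCA1"), ("chr17", 5000, 6000, "GENE2")]
targets = [("chr17", 199, 300), ("chr17", 1000, 1100)]

for row in annotate(targets, features):
    print(*row, sep="\t")
```

With working GTF support, the first interval would be labeled with the gene name instead of being left unannotated.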