Description
This is a script containing wrapper functions for running [Bedtools](https://bedtools.readthedocs.io/en/latest/) and parsing the results.
Issues:
- The `genomecov` function doesn't return anything, so this should be stated in the docstring.
- If the `out` path already exists, then the `genomecov` function raises a FileExistsError. However, it would be good to perhaps have a `force` flag so that it will go ahead and overwrite? Otherwise, if something goes wrong in the middle of this step, it is difficult to re-run without manually deleting the file. An alternative would be to add code that deletes the output file in the event of `retcode` coming back as not equal to zero.
- As a first step in the `parse` function, it checks whether the `out` path exists, and if so it will parse as a Pandas dataframe and return that. However, there is no code to check whether the table is valid. Perhaps the error-handling within Pandas is enough?
- The `parse` function loads the `.bed` file as a Pandas dataframe with the following column names: `contig`, `depth`, `bases`, `length`, `breadth`. However, according to [the Bedtools docs](https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html), the last column is the fraction of bases on the chromosome (contig in this case) with depth equal to column 2, so the name `breadth` is perhaps not very descriptive.
Autometa/autometa/common/external/bedtools.py
Lines 102 to 103 in 01a432d
- Likewise, I am not sure the name `total_breadth` is appropriate for this variable:
Autometa/autometa/common/external/bedtools.py
Lines 107 to 109 in 01a432d
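
For the `force` flag and the cleanup on a non-zero `retcode`, something along these lines might work. This is only a rough sketch: `run_to_file` is a hypothetical helper, and the `genomecov` signature shown here (`ibam`, `lengths`, `out`, `force`) is my assumption rather than a copy of the current code.

```python
import os
import subprocess


def run_to_file(cmd, out, force=False):
    """Run `cmd` and write its stdout to `out`. Returns None.

    Hypothetical helper for illustration, not part of the existing module.
    """
    if os.path.exists(out) and not force:
        raise FileExistsError(out)
    # Opening in "w" mode truncates any existing file, which is what force=True wants.
    with open(out, "w") as fh:
        retcode = subprocess.call(cmd, stdout=fh)
    if retcode != 0:
        # Remove the partial output so a re-run does not trip over FileExistsError.
        os.remove(out)
        raise subprocess.CalledProcessError(retcode, cmd)


def genomecov(ibam, lengths, out, force=False):
    """Write `bedtools genomecov` output to `out`. Returns None.

    The parameter names are assumed, not taken from the current signature.
    """
    cmd = ["bedtools", "genomecov", "-ibam", ibam, "-g", lengths]
    run_to_file(cmd, out, force=force)
```

With this, a failed run leaves no output file behind, so calling `genomecov` again works without manual cleanup, and `force=True` covers the case where a stale file from an earlier run should be overwritten.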