[Dev branch] Issues with autometa/common/external/bedtools.py #18
Closed
@jason-c-kwan


This is a script which contains wrapper functions for running Bedtools (https://bedtools.readthedocs.io/en/latest/) and parsing the results.

Issues:

  1. The genomecov function doesn't return anything, so this should be stated in the docstring.

  2. If the out path already exists, then the genomecov function raises a FileExistsError. It would be good to have a force flag so that it goes ahead and overwrites the file. Otherwise, if something goes wrong in the middle of this step, it is difficult to re-run without manually deleting the file. An alternative would be to add code that deletes the output file whenever retcode comes back non-zero.

  3. As a first step in the parse function, it checks whether the out path exists, and if so it will parse as a Pandas dataframe and return that. However, there is no code to check whether the table is valid. Perhaps the error-handling within Pandas is enough?

  4. The parse function loads the .bed file as a Pandas dataframe with the following column names: contig, depth, bases, length, breadth. However, according to [the Bedtools docs](https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html), the last column is the fraction of bases on the chromosome (contig in this case) with depth equal to column 2, so the name breadth is perhaps not very descriptive.

names = ['contig','depth','bases','length','breadth']
df = pd.read_csv(bed, sep='\t', names=names, index_col='contig')
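One way points 3 and 4 could be handled together is sketched below. It is only an illustration, not the code in bedtools.py: the function name `parse_genomecov`, the column name `fraction_at_depth`, and the specific checks are all assumptions. The idea is to coerce every column to numeric and fail loudly if anything is missing or malformed, since `pd.read_csv` on its own will silently fill short rows with NaN.

```python
import io

import pandas as pd

# Hypothetical column names: "fraction_at_depth" stands in for "breadth".
COLUMNS = ["contig", "depth", "bases", "length", "fraction_at_depth"]


def parse_genomecov(bed):
    """Read a genomecov histogram table and run basic sanity checks.

    `bed` may be a path or a file-like object. This is an illustrative
    sketch, not the parse function from the repository.
    """
    df = pd.read_csv(bed, sep="\t", names=COLUMNS, index_col="contig")
    if df.empty:
        raise ValueError("genomecov table is empty")
    # Non-numeric entries become NaN here; rows with too few fields are
    # already NaN-padded by read_csv, so one check catches both cases.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    if numeric.isna().any().any():
        raise ValueError("genomecov table has missing or non-numeric values")
    # The last column is a fraction, so it must lie within [0, 1].
    if not numeric["fraction_at_depth"].between(0, 1).all():
        raise ValueError("fraction_at_depth values fall outside [0, 1]")
    return numeric


# Small in-memory example: 20 bases at depth 0 and 80 bases at depth 1.
example = "contig_1\t0\t20\t100\t0.2\ncontig_1\t1\t80\t100\t0.8\n"
table = parse_genomecov(io.StringIO(example))
```

Whether these explicit checks are worth having over relying on the errors Pandas raises by itself is a judgment call; they mainly help when a previous run left a truncated file behind.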

  5. Likewise, I am not sure the name total_breadth is appropriate for this variable:

df = df.assign(total_breadth=lambda x: x.depth * x.bases)
dff = df.groupby('contig')['total_breadth', 'bases'].sum()
dff = dff.assign(coverage=lambda x: x.total_breadth/x.bases)
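Returning to point 2, both suggestions (a force flag, and deleting the output when the command fails) could be combined roughly as follows. This is a minimal sketch under stated assumptions: `run_to_file` is a hypothetical helper, not the existing genomecov function, and it assumes the wrapped command writes its results to stdout, as `bedtools genomecov` does.

```python
import os
import subprocess


def run_to_file(cmd, out, force=False):
    """Run `cmd`, writing its stdout to `out`. Returns nothing.

    Hypothetical helper for illustration only.
    Raises FileExistsError if `out` exists and `force` is False.
    Raises ChildProcessError (after deleting `out`) on a non-zero retcode.
    """
    if os.path.exists(out) and not force:
        raise FileExistsError(out)
    with open(out, "w") as handle:
        retcode = subprocess.call(cmd, stdout=handle)
    if retcode != 0:
        # Remove the partial output so that a re-run is not blocked by it.
        os.remove(out)
        raise ChildProcessError(f"command failed with exit code {retcode}")


# Possible usage (not executed here; the BAM path is a placeholder):
# run_to_file(["bedtools", "genomecov", "-ibam", "alignments.bam"],
#             "coverage.bed", force=True)
```

With the cleanup in place, a failed run leaves no file behind, so the force flag is only needed when a previous run succeeded and the result should be regenerated.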
