Description
Hi, thank you for this wonderful tool.
I've been using fastANI for a long time and I've noticed a bug that I find hard to explain or demonstrate, but I will do my best :)
When I run fastANI with multiple threads on long reference and query lists, it seems to skip some reference genomes (they do not appear in the result file at all).
For example, when I compare one genome against my whole reference database (~117K genomes) with -t 1, I get back 2946 hits with ANI >= 95 (same species).
However, when I compare multiple genomes (~3000) against the same reference database, the genome from the previous example gets only 2780 hits with ANI >= 95, and I couldn't find the remaining 166 anywhere in the results.
To validate that they do have an ANI value, I ran the same genome again against only the 166 missing references (-t 1) and got back the expected ANI results (~98 ANI).
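In case it helps with reproducing this, here is a minimal sketch of how the missing hits can be listed from two result files. It assumes the usual tab-separated fastANI output (query, reference, ANI, then the two fragment-count columns); the function names and the toy file contents are made up for illustration and are not my real data.

```python
import csv
import io


def hits_at_or_above(handle, query, min_ani=95.0):
    """Collect the references that one query hits with ANI >= min_ani."""
    refs = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        row_query, reference, ani = row[0], row[1], float(row[2])
        if row_query == query and ani >= min_ani:
            refs.add(reference)
    return refs


def missing_hits(single_handle, multi_handle, query, min_ani=95.0):
    """References found in the single-query run but absent from the multi-query run."""
    single = hits_at_or_above(single_handle, query, min_ani)
    multi = hits_at_or_above(multi_handle, query, min_ani)
    return sorted(single - multi)


if __name__ == "__main__":
    # Toy stand-ins for the two result files (hypothetical names and values).
    single_run = io.StringIO(
        "q.fna\tr1.fna\t98.1\t500\t520\n"
        "q.fna\tr2.fna\t97.9\t490\t520\n"
        "q.fna\tr3.fna\t80.2\t100\t520\n"
    )
    multi_run = io.StringIO(
        "q.fna\tr1.fna\t98.1\t500\t520\n"
        "other.fna\tr2.fna\t99.0\t510\t530\n"
    )
    print(missing_hits(single_run, multi_run, "q.fna"))
```

With real data, the two `io.StringIO` objects would be replaced by `open(...)` on the single-query and multi-query output files.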
In addition, I tried splitting my reference dataset into files of 5K genomes each, but the problem remains.
I will be glad to provide more information if needed, thanks!