Description
Describe the problem.
The docs for image_dataset_from_directory say the following about the directory
argument:
Directory where the data is located. If labels is "inferred", it should contain subdirectories,
each containing images for a class. Otherwise, the directory structure is ignored.
This means that when labels is a list/tuple, we should ignore the directory structure (this makes sense, as the directory structure would only be used to generate labels).
Describe the current behavior.
However, this is not what happens. Instead, see the following code snippet from dataset_utils.py:
    if labels is None:
        # in the no-label case, index from the parent directory down.
        subdirs = ['']
        class_names = subdirs
    else:
        subdirs = []
        for subdir in sorted(tf.io.gfile.listdir(directory)):
We only ignore the subdirectory structure if labels is None, instead of when labels != 'inferred'. This means that when labels is a list/tuple, a subdirectory structure is still expected even though none exists, causing image_dataset_from_directory to fail in this case.
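The effect can be reproduced without TensorFlow. The following is a minimal sketch, not the actual Keras code: collect_subdirs is a hypothetical stand-in that mirrors the quoted branch using os.listdir in place of tf.io.gfile.listdir, and the loop body (keeping only entries that are directories) is my assumption about what follows the truncated snippet.

```python
import os
import tempfile


def collect_subdirs(directory, labels):
    """Hypothetical stand-in for the quoted branch in dataset_utils.py."""
    if labels is None:
        # No-label case: index from the parent directory down.
        return [""]
    subdirs = []
    for subdir in sorted(os.listdir(directory)):
        # Assumed loop body: keep only entries that are directories.
        if os.path.isdir(os.path.join(directory, subdir)):
            subdirs.append(subdir)
    return subdirs


def make_flat_image_dir(names):
    """Create a temporary directory holding empty files and no subdirectories."""
    directory = tempfile.mkdtemp()
    for name in names:
        open(os.path.join(directory, name), "wb").close()
    return directory


flat_dir = make_flat_image_dir(["a.png", "b.png"])
subdirs_without_labels = collect_subdirs(flat_dir, labels=None)
subdirs_with_explicit_labels = collect_subdirs(flat_dir, labels=[0, 1])
print(subdirs_without_labels)        # [''], so the flat directory is indexed
print(subdirs_with_explicit_labels)  # [], so no files are ever looked at
```

With an empty list of subdirectories, nothing in the flat directory gets indexed, so the explicit labels have no files to be matched against.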
Describe the expected behavior.
We should ignore the subdirectory structure if labels is anything other than 'inferred' (i.e. make the code match what the documentation says should happen). This should be a one-line change, and I'd be happy to make a PR.
However, the existence of this issue suggests that the use case where labels is a list/tuple is not unit tested, so it would probably be good to write a test. I would love a suggestion from someone more familiar with the codebase about how best to do this.
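As a starting point for both the change and the test, here is a rough sketch rather than the real patch. collect_subdirs_fixed is a hypothetical helper that branches on labels == "inferred", and the test uses plain unittest and tempfile. An actual Keras test would call image_dataset_from_directory itself and follow the conventions of the existing test suite, which I have not checked.

```python
import os
import tempfile
import unittest


def collect_subdirs_fixed(directory, labels):
    """Hypothetical helper: scan subdirectories only when labels are inferred."""
    if labels == "inferred":
        return sorted(
            entry
            for entry in os.listdir(directory)
            if os.path.isdir(os.path.join(directory, entry))
        )
    # labels is None or an explicit list/tuple: ignore the directory structure.
    return [""]


class ExplicitLabelsTest(unittest.TestCase):
    def test_flat_directory_with_explicit_labels(self):
        with tempfile.TemporaryDirectory() as directory:
            for name in ("a.png", "b.png"):
                open(os.path.join(directory, name), "wb").close()
            self.assertEqual(collect_subdirs_fixed(directory, [0, 1]), [""])
            self.assertEqual(collect_subdirs_fixed(directory, (0, 1)), [""])
            self.assertEqual(collect_subdirs_fixed(directory, None), [""])

    def test_inferred_labels_still_use_subdirectories(self):
        with tempfile.TemporaryDirectory() as directory:
            os.mkdir(os.path.join(directory, "cats"))
            os.mkdir(os.path.join(directory, "dogs"))
            self.assertEqual(
                collect_subdirs_fixed(directory, "inferred"), ["cats", "dogs"]
            )


def run_tests():
    """Run the test case and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExplicitLabelsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()


all_passed = run_tests()
```

The second test is there to guard against a regression in the 'inferred' case, where the subdirectory names should still be picked up as classes.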