image_dataset_from_directory uses wrong directory when labels is list

Describe the problem.

The docs for image_dataset_from_directory say the following about the directory argument:

Directory where the data is located. If labels is "inferred", it should contain subdirectories, 
each containing images for a class. Otherwise, the directory structure is ignored.

This means that when labels is a list/tuple, we should ignore the directory structure (this makes sense, as the directory structure would only be used to generate labels).

Describe the current behavior.

However, this is not what happens. See the following code snippet from dataset_utils.py:

  if labels is None:
    # in the no-label case, index from the parent directory down.
    subdirs = ['']
    class_names = subdirs
  else:
    subdirs = []
    for subdir in sorted(tf.io.gfile.listdir(directory)):

The subdirectory structure is ignored only when labels is None, rather than whenever labels != 'inferred'. As a result, when labels is a list/tuple, the code still looks for class subdirectories even though none exist, causing image_dataset_from_directory to fail.
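To illustrate the effect without TensorFlow, here is a rough stdlib stand-in for the branch above. The name subdirs_current is hypothetical, and the isdir filter is an assumption about the loop body, which is cut off in the snippet. With a flat directory and an explicit label list, no subdirectories are found, so there is nothing to index.

```python
import os
import tempfile


def subdirs_current(directory, labels):
    """Hypothetical stand-in for the quoted branch, not the Keras code itself."""
    # Mirrors the quoted condition: only labels=None skips the listing.
    if labels is None:
        return [""]
    # Assumption: the truncated loop keeps only entries that are directories.
    return [
        name
        for name in sorted(os.listdir(directory))
        if os.path.isdir(os.path.join(directory, name))
    ]


with tempfile.TemporaryDirectory() as flat_dir:
    # A flat directory: image files only, no class subdirectories.
    for name in ("a.png", "b.png"):
        open(os.path.join(flat_dir, name), "wb").close()

    with_none = subdirs_current(flat_dir, None)
    with_list = subdirs_current(flat_dir, [0, 1])

print(with_none)  # the directory itself is indexed
print(with_list)  # nothing is indexed, although two files exist
```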

Describe the expected behavior.

We should ignore the subdirectory structure whenever labels is anything other than "inferred" (i.e. make the code match what the documentation says should happen). This should be a one-line change, and I'd be happy to make a PR.
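A minimal sketch of the proposed condition, again using only the stdlib rather than the actual Keras code. The name subdirs_proposed is hypothetical, and the isinstance guard is my own addition so that a list or array is never compared against a string; the real change may look different.

```python
import os
import tempfile


def subdirs_proposed(directory, labels):
    """Hypothetical helper showing the proposed condition."""
    # Only the literal string "inferred" triggers the subdirectory scan.
    labels_inferred = isinstance(labels, str) and labels == "inferred"
    if not labels_inferred:
        # None or an explicit list/tuple: ignore the directory structure.
        return [""]
    return [
        name
        for name in sorted(os.listdir(directory))
        if os.path.isdir(os.path.join(directory, name))
    ]


with tempfile.TemporaryDirectory() as root:
    flat_dir = os.path.join(root, "flat")
    class_dir = os.path.join(root, "by_class")
    os.makedirs(flat_dir)
    for cls in ("dogs", "cats"):
        os.makedirs(os.path.join(class_dir, cls))
    open(os.path.join(flat_dir, "a.png"), "wb").close()

    flat_result = subdirs_proposed(flat_dir, [0])
    none_result = subdirs_proposed(flat_dir, None)
    inferred_result = subdirs_proposed(class_dir, "inferred")
```

Under this condition, an explicit label list behaves the same as labels=None, and only "inferred" walks the class subdirectories.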

However, the existence of this issue suggests the use case where labels is a list/tuple is not unit tested, so it would probably be good to write a test. Would love a suggestion from someone more familiar with the codebase about how best to do this.
