8000 Update builtins read extensions by brisvag · Pull Request #7826 · napari/napari · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Update builtins read extensions #7826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 13, 2025
Merged

Conversation

brisvag
Copy link
Contributor
@brisvag brisvag commented Apr 15, 2025

References and relevant issues

Closes #3828.

Description

List of new extensions generated with:

import imageio
import tifffile
from pprint import pformat

imageio_exts = {f'*{ext}' for format in imageio.formats for ext in format.extensions}
imageio_video_exts = {f'*{fmt.extension}' for fmt in imageio.config.video_extensions if any(name == 'FFMPEG'for name in fmt.priority)}
tifffile_exts = {f'*.{ext}' for ext in tifffile.TIFF.FILE_EXTENSIONS}
napari_exts = {f'*.{ext}' for ext in ('npy', 'zarr', 'csv')}

exts = sorted(list(imageio_exts | imageio_video_exts | tifffile_exts | napari_exts))

json = pformat(exts, compact=True, indent=10)[:-1].replace('[', ' ')
print(json)

This spits out json formatted and ready to be copy pasted in the file:

          '*.3fr', '*.arw', '*.avi', '*.avs', '*.bay', '*.bif', '*.bmp',
          '*.bmq', '*.bsdf', '*.btf', '*.bufr', '*.bw', '*.cap', '*.cine',
          '*.cr2', '*.crw', '*.cs1', '*.csv', '*.ct', '*.cur', '*.cut', '*.dc2',
          '*.dcm', '*.dcr', '*.dcx', '*.dds', '*.dicom', '*.dng', '*.drf',
          '*.dsc', '*.ecw', '*.eer', '*.emf', '*.eps', '*.erf', '*.exr',
          '*.fff', '*.fit', '*.fits', '*.flc', '*.fli', '*.fpx', '*.ftc',
          '*.fts', '*.ftu', '*.fz', '*.g3', '*.gbr', '*.gdcm', '*.gel', '*.gif',
          '*.gipl', '*.grib', '*.h264', '*.h5', '*.hdf', '*.hdf5', '*.hdp',
          '*.hdr', '*.ia', '*.icns', '*.ico', '*.iff', '*.iim', '*.iiq', '*.im',
          '*.img', '*.img.gz', '*.ipl', '*.j2c', '*.j2k', '*.jfif', '*.jif',
          '*.jng', '*.jp2', '*.jpc', '*.jpe', '*.jpeg', '*.jpf', '*.jpg',
          '*.jpx', '*.jxr', '*.k25', '*.kc2', '*.kdc', '*.koa', '*.lbm',
          '*.lfp', '*.lfr', '*.lsm', '*.mdc', '*.mef', '*.mgh', '*.mha',
          '*.mhd', '*.mic', '*.mkv', '*.mnc', '*.mnc2', '*.mos', '*.mov',
          '*.mp4', '*.mpeg', '*.mpg', '*.mpo', '*.mri', '*.mrw', '*.msp',
          '*.ndpi', '*.nef', '*.nhdr', '*.nia', '*.nii', '*.nii.gz', '*.npy',
          '*.npz', '*.nrrd', '*.nrw', '*.ome.tif', '*.orf', '*.pbm', '*.pcd',
          '*.pcoraw', '*.pct', '*.pcx', '*.pef', '*.pfm', '*.pgm', '*.pic',
          '*.pict', '*.png', '*.ppm', '*.ps', '*.psd', '*.ptif', '*.ptiff',
          '*.ptx', '*.pxn', '*.pxr', '*.qpi', '*.qptiff', '*.qtk', '*.raf',
          '*.ras', '*.raw', '*.rdc', '*.rgb', '*.rgba', '*.rw2', '*.rwl',
          '*.rwz', '*.scn', '*.seq', '*.sgi', '*.sr2', '*.srf', '*.srw',
          '*.sti', '*.stk', '*.svs', '*.swf', '*.targa', '*.tf2', '*.tf8',
          '*.tga', '*.tif', '*.tiff', '*.vtk', '*.wap', '*.wbm', '*.wbmp',
          '*.wdp', '*.webm', '*.webp', '*.wmf', '*.wmv', '*.xbm', '*.xpm',
          '*.zarr', '*.zif'

@brisvag brisvag requested a review from a team as a code owner April 15, 2025 07:52
@brisvag
Copy link
Contributor Author
brisvag commented Apr 15, 2025

Do we have a centralized place to find all these additions? Or are you just remembering them? I should rpobably add them in the code snippet so we can remake this next time easily.

@brisvag
Copy link
Contributor Author
brisvag commented Apr 15, 2025

new code

import imageio
import tifffile
from pprint import pp

imageio_exts = {f'*{ext}' for format in imageio.formats for ext in format.extensions}
tifffile_exts = {f'*.{ext}' for ext in tifffile.TIFF.FILE_EXTENSIONS}
napari_exts = {f'*.{ext}' for ext in ('npy', 'zarr', 'csv')}

exts = imageio_exts | tifffile_exts | napari_exts

pp(sorted(list(exts)), compact=True)
['*.3fr', '*.arw', '*.avs', '*.bay', '*.bif', '*.bmp', '*.bmq', '*.bsdf',
 '*.btf', '*.bufr', '*.bw', '*.cap', '*.cine', '*.cr2', '*.crw', '*.cs1',
 '*.csv', '*.ct', '*.cur', '*.cut', '*.dc2', '*.dcm', '*.dcr', '*.dcx', '*.dds',
 '*.dicom', '*.dng', '*.drf', '*.dsc', '*.ecw', '*.eer', '*.emf', '*.eps',
 '*.erf', '*.exr', '*.fff', '*.fit', '*.fits', '*.flc', '*.fli', '*.fpx',
 '*.ftc', '*.fts', '*.ftu', '*.fz', '*.g3', '*.gbr', '*.gdcm', '*.gel', '*.gif',
 '*.gipl', '*.grib', '*.h5', '*.hdf', '*.hdf5', '*.hdp', '*.hdr', '*.ia',
 '*.icns', '*.ico', '*.iff', '*.iim', '*.iiq', '*.im', '*.img', '*.img.gz',
 '*.ipl', '*.j2c', '*.j2k', '*.jfif', '*.jif', '*.jng', '*.jp2', '*.jpc',
 '*.jpe', '*.jpeg', '*.jpf', '*.jpg', '*.jpx', '*.jxr', '*.k25', '*.kc2',
 '*.kdc', '*.koa', '*.lbm', '*.lfp', '*.lfr', '*.lsm', '*.mdc', '*.mef',
 '*.mgh', '*.mha', '*.mhd', '*.mic', '*.mnc', '*.mnc2', '*.mos', '*.mpo',
 '*.mri', '*.mrw', '*.msp', '*.ndpi', '*.nef', '*.nhdr', '*.nia', '*.nii',
 '*.nii.gz', '*.npy', '*.npz', '*.nrrd', '*.nrw', '*.ome.tif', '*.orf', '*.pbm',
 '*.pcd', '*.pcoraw', '*.pct', '*.pcx', '*.pef', '*.pfm', '*.pgm', '*.pic',
 '*.pict', '*.png', '*.ppm', '*.ps', '*.psd', '*.ptif', '*.ptiff', '*.ptx',
 '*.pxn', '*.pxr', '*.qpi', '*.qptiff', '*.qtk', '*.raf', '*.ras', '*.raw',
 '*.rdc', '*.rgb', '*.rgba', '*.rw2', '*.rwl', '*.rwz', '*.scn', '*.seq',
 '*.sgi', '*.sr2', '*.srf', '*.s
8000
rw', '*.sti', '*.stk', '*.svs', '*.swf',
 '*.targa', '*.tf2', '*.tf8', '*.tga', '*.tif', '*.tiff', '*.vtk', '*.wap',
 '*.wbm', '*.wbmp', '*.wdp', '*.webp', '*.wmf', '*.xbm', '*.xpm', '*.zarr',
 '*.zif']

@psobolewskiPhD
Copy link
Member
psobolewskiPhD commented Apr 15, 2025

Do we have a centralized place to find all these additions? Or are you just remembering them? I should rpobably add them in the code snippet so we can remake this next time easily.

I was tipped off by the failing tests and then looked at the diff.

Looking at the code:

def magic_imread(

splits out zarr
def imread(filename: str) -> np.ndarray:

considers npy
def read_csv(

reads csv

I don't see any other special cases?

Oh, wait:

if ext.lower() not in ['.tif', '.tiff', '.lsm']:

.lsm being shifted to tifffile. I think we should put all the tifffile ones here? Then again I think imageio uses tifffile, so that whole if might not be needed.

@brisvag
Copy link
Contributor Author
brisvag commented Apr 15, 2025

Thanks, yeah makes sense!

.lsm being shifted to tifffile. I think we should put all the tifffile ones here? Then again I think imageio uses tifffile, so that whole if might not be needed.

mmm, but here we are in code territory, no longer yaml manifest, so we can actually do this programmatically on the fly if needed. Either way, I'd say that's a separate PR!

@Czaki Czaki added this to the 0.6.0 milestone Apr 15, 2025
@Czaki Czaki added the maintenance PR with maintance changes, label Apr 15, 2025
@psobolewskiPhD
Copy link
Member

Yeah I think this code was all written a long time ago -- or ventured from scikit-image a long time ago -- so it may not use best practices for imageio v3. Probably worth making a followup issue.

One other thing -- this touches on our reader arch in general, but i just realized it (again):
with this change we're dropping the video extensions. See #3828 (comment)
But, imageio is pretty smart and it's enough to have one of the video plugins installed currently and they work -- e.g. if you have napari[docs]. They work really well actually! I actually use this on a semi regular bass, you can even save an animate gif.
Of course, if you don't have them installed you will get an error, but to make it worse, it's a ReaderError:

ReaderPluginError: Tried to read /Users/sobolp/Downloads/view_ray.mp4 with plugin napari, because it was associated with that file extension/because it is the only plugin capable of reading that path, but it gave an error. Try associating a different plugin or installing a different plugin for this kind of file.

Which is nigh useless. Meanwhile, imageio gives a nice error:

OSError: Could not find a backend to open `/Users/sobolp/Downloads/view_ray.mp4`` with iomode `r`.
Based on the extension, the following plugins might add capable backends:
  FFMPEG:  pip install imageio[ffmpeg]
  pyav:  pip install imageio[pyav]

But it's swallowed by our reader.

Anyhow, with this change, which is strictly correct because we don't depend on the video plugins to imageio, now those files just won't work and there is no easy way for someone to get them to work. I quickly searched the napari-hub and didn't find anything obvious.
I think this reveals a weakness in the explicit list of extensions 😭
cc: @DragaDoncila

@psobolewskiPhD
Copy link
Member
psobolewskiPhD commented Apr 15, 2025

Also, i tested this with a .ndpi, ome.tiff, .qptiff, and .svs (sample images used QuPath, but also common formats used at JAX because there are used by slide scanners) and none of them worked -- they all need imagecodecs.
Again the error is just reader error, but I tested by trying to read them directly using imread, which gives the more informative error.

I think adding the extensions is still a plus, because again otherwise there isn't any easy way for someone to get these files to work without changing the extensions, but I think we need to try to improve error handling and ensure the user knows what to do when they try to load an .svs or ndpi.

@brisvag
Copy link
Contributor Author
brisvag commented Apr 16, 2025

Ouch, yeah, looks like we need to surface those more informative errors all the way to the user. I guess the only real solution (other than moving away from explicit extensions) would be to generate this yaml periodically with a workflow which gets the extensions from the various libraries?

@DragaDoncila
Copy link
Contributor

Just to chime in here on that annoying error message (which I also hate lol).

We added that when updating the reading behaviour to stop cascading the different reader plugins and move to the reader dialog. The tricky bit it was supposed to address is when: the user has a plugin associated with the extension and that plugin fails and/or that plugin is the only one that declares it can read the file. Then the user was supposed to get this error message that explains why the plugin was tried in the first place.

It's annoying how confused the stack trace is though. When we raise the error we raise it from the original error which does preserve the original stack trace, but I find the printout is not very useful at all, so maybe we can improve how we print out the error message in this case.

@DragaDoncila
Copy link
Contributor

I guess the only real solution (other than moving away from explicit extensions) would be to generate this yaml periodically with a workflow which gets the extensions from the various libraries?

I don't think our cursed reader is a good reason to move away from explicit extensions at all lol. The vast majority of plugins will only declare a handful of extensions and for those that don't, it's up to them to make sure that list is up to date 🤷‍♀️. I think a workflow that regenerates the list periodically is a good idea.

@psobolewskiPhD
Copy link
Member

@DragaDoncila How do you think we should handle, for example, the video extensions or the tiff variants then?
If we include them in the current list, it's a bit of a scam because they won't work out of the box. But, it's easy enough for someone to have an env with the needed pieces and then everything works. If we could raise the imageio/tifffile errors to the user, then I think this is pretty OK?
If we exclude them, which is strictly the "correct" thing to do, they will never work, barring someone making a new reader plugin, publicizing it, the user installing it, so on. This feels bad, but we don't want to promise things we can't deliver either.

@DragaDoncila
Copy link
Contributor

But, it's easy enough for someone to have an env with the needed pieces and then everything works. If we could raise the imageio/tifffile errors to the user, then I think this is pretty OK?

Yeah this is what I think we should do. The original raise from implementation was already supposed to do this, it just does it poorly, so we should just improve that imo

@brisvag
Copy link
Contributor Author
brisvag commented Apr 17, 2025

So, for this PR, I guess we need to add these extensions (@psobolewskiPhD do you know where I can find them programmatically, like I did with the rest?)

And then we follow up with:

  1. workflow to autoupdate the extensions
  2. better error message
  3. maybe do the same for writers?

@psobolewskiPhD
Copy link
Member
psobolewskiPhD commented Apr 20, 2025

@brisvag
The video formats are provided here:
https://imageio.readthedocs.io/en/stable/formats/video_formats.html
This file is autogenerated, but I can't figure out how exactly.
PyAV is the suggested replacement for the old ffmpeg plugin, but it obviously handles way more extensions.

Edit: ok i found it, it's not well documented in the docs, but I found a sphinx extension imageio_ext that has the commands we need:
https://github.com/imageio/imageio/blob/3e2a165e484135d4d627c8f1fed5cb3bf13438c6/docs/imageio_ext.py

That one shows:
from imageio.config import known_plugins, video_extensions

So we can use:

from imageio.config import video_extensions, known_plugins

# Get all video extensions and their supporting plugins
ffmpeg_formats = {}
pyav_formats = {}

for fmt in video_extensions:
    for plugin_name in fmt.priority:
        if plugin_name == "FFMPEG":
            ffmpeg_formats[fmt.extension] = fmt.name
        elif plugin_name == "pyav":
            pyav_formats[fmt.extension] = fmt.name

You don't need to have the extensions installed for this to work.
I think that listing all 112 of the pyav ones would be overkill?
So maybe just the ffmpeg ones for now, even though that will be replaced? that's just 9:

In [5]: ffmpeg_formats
Out[5]: 
{'.avi': 'Audio Video Interleave',
 '.h264': 'raw H.264 video',
 '.mkv': 'Matroska Multimedia Container',
 '.mov': 'QuickTime File Format',
 '.mp4': 'MPEG-4 Part 14',
 '.mpeg': 'MPEG-1 Moving Picture Experts Group',
 '.mpg': 'Moving Picture Experts Group',
 '.webm': 'Matroska',
 '.wmv': 'Windows Media Video'}

And it appears to match what we currently include, which would explain where they came from.

< 8000 !-- -->

@brisvag
Copy link
Contributor Author
brisvag commented Apr 22, 2025

Awesome, thanks for tracking that down @psobolewskiPhD! I updated everything, should be good to go now.

Copy link
codecov bot commented Apr 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.94%. Comparing base (6cac059) to head (cc5a32f).
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7826      +/-   ##
==========================================
+ Coverage   92.92%   92.94%   +0.02%     
==========================================
  Files         641      641              
  Lines       60627    60600      -27     
==========================================
- Hits        56335    56323      -12     
+ Misses       4292     4277      -15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@psobolewskiPhD
Copy link
Member

Looks good to me. I tested with an svs and mp4 I had handy and both failed, but worked once i installed av and imagecodecs.
So we just need to improve the error message handling in reader fails.

@psobolewskiPhD psobolewskiPhD added the ready to merge Last chance for comments! Will be merged in ~24h label Apr 22, 2025
@psobolewskiPhD
Copy link
Member

I created a followup issue regarding the error messages:
#7850

Also, Im bumping this issue:
#2686 (comment)
I think the idea is sound and we already have a UI for it, we just need to not white list only already claimed extensions!!

@psobolewskiPhD psobolewskiPhD removed the ready to merge Last chance for comments! Will be merged in ~24h label Apr 23, 2025
Copy link
Member
@psobolewskiPhD psobolewskiPhD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have spoken too soon.
video files do work, but I'm not sure that tiff variants, like the issue are actually solved.
I have more test files on my work laptop, I will try to test again tomorrow.

@jni jni modified the milestones: 0.6.0, 0.6.1 Apr 23, 2025
@jni
Copy link
Member
jni commented Apr 23, 2025

I moved the milestone while you are testing. If this becomes mergeable before the release we can move it back.

@psobolewskiPhD
Copy link
Member
psobolewskiPhD commented Apr 23, 2025

OK, I think the reason I can't open the svs I have on my personal mac -- meaning this doesn't fix the issue-- is because of this check:

if all(str(x).lower().endswith(tuple(READER_EXTENSIONS)) for x in path):

The tifffile extensions are not in imageio.formats, so they arn't in READER_EXTENSIONS.

We need READER_EXTENSIONS to match the extensions that are being generated here, so that they are actually read.

So we can merge this as is, but we should ensure to not close the issue.
To close the issue, we either need to do something like:

IMAGEIO_EXTENSIONS = {x for f in formats for x in f.extensions}
TIFFFILE_EXTENSIONS = {f'.{ext}' for ext in tifffile.TIFF.FILE_EXTENSIONS}
READER_EXTENSIONS = IMAGEIO_EXTENSIONS.union({'.zarr', '.npy', *TIFFFILE_EXTENSIONS})

Or another option is to use:
imageio.config.known_extensions.keys() for both the yml and the code.
It looks like it gets everything that imageio handles, so maybe that's the best way actually in a future where we do #7850
Accept every extension that imageio accepts. If the user has the imageio plugin, boom it's read. Else, they get an informative error? (imageio does pass up tifffile errors)
Upside is, we dont need the if ext.lower() not in ['.tif', '.tiff', '.lsm']: logic, since imageio.v3 can handle all the tifffile extensions using tifffile.
So iio.imread() should suffice.

@brisvag
Copy link
Contributor Author
brisvag commented Apr 23, 2025

Or another option is to use:
imageio.config.known_extensions.keys() for both the yml and the code.

I like this option the most, assuming we have the better error surfaced to the user.

I would vote for merging this (as it's strictly better than main, I think?), but follow up with the above PRs for error messages and to use known_extensions in both places.

@psobolewskiPhD
Copy link
Member

Eh, is it better? It's in the eye of the beholder I think:
On main for my .svs napari tells me no plugin found which is pretty clear--and correct.
With this branch instead I get the tried to read with plugin... error, which is a bit more cryptic?

I do agree that we need 3 fixes:

  • fix the builtins reader so that the gatekeeping matches the extensions
  • fix the error, so users get the error from the reading library and not just napari error -- this benefits other readers too!
  • fix the builtins extensions white list (this PR)

I think they are all interdependent, but the 2nd is pretty much its own independent thing, it affects all readers. Meanwhile, the first will be a silent change without the 3rd.

@brisvag
Copy link
Contributor Author
brisvag commented Apr 28, 2025

I'd like to try and fix the error as suggested, but this part of the codebase is a bit out of my comfort zone... @DragaDoncila you think you could point me in the right direction?

@DragaDoncila
Copy link
Contributor

@brisvag sorry I missed this! I'll draft up a PR tomorrow and we can review together when we pair, that way you also get more familiarity with that part of code

@brisvag
Copy link
Contributor Author
brisvag commented May 9, 2025

After pairing with @DragaDoncila, we concluded that the READER_EXTENSIONS logic we had inside our reader function was obsolete. This is no longer needed since we filter extensions from our plugin manifest.

Additionally, we are able to modify the check for tifffile extensions to match the one used here to generate the extension list, this way that will never be out of date. I think we this we solved all the standing issues and this can be merged! @psobolewskiPhD wanna check again?

@psobolewskiPhD
Copy link
Member

@brisvag Oh that makes sense. Just handle the extensions on the yaml side!
In that case, I think we can go a step further, because imageio has a tifffile plugin. So we can just use iio.imread. And it handles URI/URL so we can drop the extra context manager. I'll make a PR to your branch.

Copy link
Member
@psobolewskiPhD psobolewskiPhD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to approve, but think it's worth considering just going full iio.imread.
I made a PR to this branch.

@brisvag
Copy link
Contributor Author
brisvag commented May 12, 2025

Great simplification @psobolewskiPhD, merged it! Now we just give everything to imread except for npy files and everything Just Works ™️ . Should be good to go once tests are gone :)

@brisvag brisvag added the ready to merge Last chance for comments! Will be merged in ~24h label May 12, 2025
@psobolewskiPhD
Copy link
Member

In theory we could split off npy at the very beginning like csv? But I guess npy would be an image, so it's probably fine the way it is too.

@brisvag
Copy link
Contributor Author
brisvag commented May 12, 2025

I can see it either way. I'd say for now let's leave it as is since it's not really changing behaviour.

@TimMonko TimMonko merged commit 92ed355 into napari:main May 13, 2025
41 checks passed
@github-actions github-actions bot removed the ready to merge Last chance for comments! Will be merged in ~24h label May 13, 2025
TimMonko pushed a commit that referenced this pull request May 14, 2025
)

# References and relevant issues
Discussed in #7826

# Description

When a plugin is selected to read a file, but it fails OR when there is
only a single plugin available to read a file, we raise our own error
message `from` the original error. However, this means the (likely more
useful) original error thrown by the plugin doesn't appear unless you
scroll up in the stack trace, even if the error would be relatively easy
to fix e.g. just install `imagecodecs` or something.

This PR improves our own error message so that the original error is
printed to the user, and they are advised to scroll up to see the full
stack trace. @brisvag and I discussed maybe bringing the actual stack
trace down, but decided that would be a lot more complex, and
potentially confusing for developers who **are** used to just scrolling
up to get to their own code.

Prior to this PR, if you don't have `tifffile` installed (for some
reason) and try to open a tiff file:


![image](https://github.com/user-attachments/assets/a8c06009-5e48-43ad-b6f3-7b0147db70ff)

With this PR:


![image](https://github.com/user-attachments/assets/7ac2d792-48da-49e3-bd48-523758df8fcb)

---------

Co-authored-by: Lorenzo Gaifas <brisvag@gmail.com>
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
Co-authored-by: Draga Doncila <ddon0001@student.monash.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance PR with maintance changes,
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Opening QPTIFF
6 participants
0