8000
We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implements shell-style globs in import statements using the Rust glob crate. Addresses #1641
glob
Sorry, something went wrong.
55a246a
94fd369
cd7c5a5
dedf056
There was a problem hiding this comment.
The reason will be displayed to describe this comment to others. Learn more.
See comments. The main one is that we have to handle non-unicode paths, so we should look at a different glob crate.
just is expected to work with non-unicode paths, so we can't do a lossy conversation here. I'd suggest checking out the globset crate, which I believe handles non-unicode paths.
just
globset
globset unfortunately doesn't have a method to iterate over a file system entry and select only paths that match the glob, like the iterator that glob_with returns with the glob crate. Looking at how glob_with is implemented in glob, it seems like this code is a bit tricky to write and I'd rather avoid it (or if I did write that code, I think it might work out to just implementing path-globbing functionality from scratch, which is also undesirable).
glob_with
There's also this crate: https://crates.io/crates/wax , which is utf-8 only but claims to reasonably handle sane non-utf-8 text internally, and also seems to have the file system iterator I want. It also hasn't been updated in a bit and claims to be experimental, so I'm kind of iffy about this crate.
Maybe I'm missing something obvious that would make this easier to do.
Ahh, that's lame. A concern about wax, any syntax that wax supports becomes part of the public-facing interface of just, and so can't be added and removed without a backwards-incompatible change. wax supports some pretty exotic syntax, like repetitions <a*/:0,> and alternatives {a?c,x?z,foo}, which I doubt would be incredibly useful for use in import paths.
wax
<a*/:0,>
{a?c,x?z,foo}
Actually it looks like even globset expects to be given the glob as a string: https://docs.rs/globset/latest/globset/struct.GlobBuilder.html#method.new . Also see this 2019 comment from the maintainer of globset on glob: rust-lang/glob#78 (comment) and this globset issue: BurntSushi/ripgrep#1250 .
I'm starting to think that there's no principled way to support globbing on non-utf8 filesystems without basically submitting a patch to some upstream glob project to do it, but maybe few enough people are using these tools with non-utf8 globs that no one has previously run into it?
The to_string_lossy function turns any non-utf8 sequence into the unicode replacement character �. I wonder if there is some other function that could be called to try to convert a PathBuf into a meaningful Rust utf-8 string (maybe doing something smart with guessing the original encoding). The quick solution would be to move this conversion into the has_glob block, which would mean that people using glob imports and file paths with non-utf8 names would run into issues but at least no existing just behavior would break.
to_string_lossy
PathBuf
has_glob
It's a bit lame, but we could just throw an error if you try to glob with a path which can't be converted to unicode. Non-unicode paths are very rare and basically just crazy, so not supporting them is probably fine, especially if they're only unsupported if you use a pretty niche feature.
I'm tempted to land this feature as unstable initially, both because of the unicode path issue, but also because there's some choice about which glob syntaxes we support, and it might be good to make compare them all and make a principled choice about what to support.
I think a glob matching a directory should be an error. Just because we can backwards-compatibly make it not an error, but not backwards-compatibly turn it into an error.
bce3e52
ee26676
WIP globbing
80e65a4
casey#1641
glob crate
79f1e53
Work
715ddd1
remove unwrap
9197d45
7c06b60
WIP globset work
1d346c9
globwalk crate
a7ce5b1
try globwalk
0531bb5
0c03054
3e88a69
More work
329c7fc
UTF-8
c0b4023
Alright, I gated glob imports behind unstable, went back to using the glob crate, and explicitly error out if conversion to a utf-8 native rust string doesn't work.
directory error case
440ee92
Did some initial review, see comments.
(No real reason, I just like avoiding macros.)
I would actually just keep this called absolute. It's a vec of paths, so we know that they're paths and there's potentially more than one of them.
absolute
My argument for changing it is that "absolute" is a pretty general word out of context, so it's hard to guess what it might mean when you see it as a random property access somewhere in code. I don't feel super strongly about it however.
This is tricky because glob supports more meta-characters than *. It supports ?, *, [, !, and ]. I suppose we should check for all of them.
*
?
[
!
]
Can we assume that none of those characters will show up in a legitimate justfile? I think that's safe to assume for * but maybe not for characters like [ and ].
What if there was a slightly different version of the import keyword that signaled that shell globs might be used, or an argument to it? import glob "my/path/*.just" or something like that?
import
import glob "my/path/*.just"
I believe this should be true. If false, then / can be matched by *, which is the bash default and zsh default. ** can be used to match an arbitrary directory path, including *.
true
false
/
bash
zsh
**
We can do to_str on the path without first converting to an OsStr.
to_str
OsStr
This swallows I/O errors. If we encounter an I/O error we should fail.
Is this circular import test duplicated with the one above?
It is, but I think that trying to de-duplicate those lines of code would add more complexity than it's worth, since the logic here is just constructing the CircularImport error if necessary and then manipulating some local variables.
CircularImport
some PR feedback
12a06c8
Addl work
fdbc562
Glob err
c971456
More err
227586e
Merge branch 'master' into import-glob
de881a4
e0e2bee
ec7c1d2
I keep coming back to this. I have these worries:
I worry about delegating glob parsing to a dependency we don't control, and thus may not understand the syntax.
I worry about the backwards compatibility impact on existing justfiles. Since although it is crazy to use */?/** in a path name, it does happen, and maybe especially on windows or GUI systems. This might also be extra confusing because the globbing character wouldn't appear in the path literal in the justfile.
*/?/**
justfile
I worry about the restriction that it doesn't work with non-unicode paths.
I wonder if maybe it would be better if there was some syntax to enable the glob at the import, and it was that syntax which was experimental, instead of a setting which turns it on for all imports
Something like:
glob import '*/mod.just'
looking forward to see something like this merged.
@neunenak What are you thinking about this? I think we should do this, and I think the combination of both making it unstable, and opting into explicitly for each import (syntax tbd, but glob import "foo" or import glob "foo" both seem fine), make me much more comfortable about dealing with the shortcomings later.
glob import "foo"
import glob "foo"
@neunenak Still working on this?
casey Awaiting requested review from casey
Successfully merging this pull request may close these issues.