-
Notifications
You must be signed in to change notification settings - Fork 566
Indentation-aware parser error recovery #5000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3fd3914
to
7c238cf
Compare
7c238cf
to
aeb3b6e
Compare
aeb3b6e
to
718d66a
Compare
959aeaa
to
e66d1be
Compare
e66d1be
to
298514c
Compare
084de07
to
5fd6e01
Compare
@@ -0,0 +1,12 @@ | |||
# typed: true | |||
|
|||
# TODO(jez) Fix this test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be fixed in a subsequent change. I'll make a ticket for this eventually.
end | ||
Integer.class | ||
end # error: Hint: closing "end" token was not indented as far as "if" token | ||
end # error: unexpected token "end of file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to address the TODO
's in this file in a followup PR.
5f80edd
to
439ed50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty reasonable, but there are enough comments/things to discuss that I think it's worth going through another round.
parser/Parser.cc
Outdated
core::Loc loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff)); | ||
return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core::Loc loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff)); | |
return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff)); | |
return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff)); |
class A | ||
def test1 | ||
if x.f | ||
end | ||
|
||
def test2 | ||
if x.f | ||
end | ||
end | ||
end # error: unexpected token "end of file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that something like this should be addressable within the framework we have here. Is that not the case, or does this test need a TODO?
My question is motivated by my surprise that this test and def_missing_end_2.rb
produce nothing in the class definition. But perhaps that's because the exp files for them are desugar tests, and they should be some kind of parse-tree exp tests?
(I guess Ruby technically allows defs within defs, but I think that's kind of busted, and perhaps --stripe-mode
could disallow that...?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that something like this should be addressable within the framework we have here. Is that not the case, or does this test need a TODO?
Yep! I just wanted to break up the change. I will be fixing this in future changes.
My question is motivated by my surprise that this test and
def_missing_end_2.rb
produce nothing in the class definition. But perhaps that's because the exp files for them are desugar tests, and they should be some kind of parse-tree exp tests?
I've been using desugar-tree tests simply because they're easier to skim to verify correctness. None of the parser formats are as easy to skim. The empty class definition you see is for <root>
not A
, which desugar wraps around all parse results, even empty ones.
website/docs/error-reference.md
Outdated
This Ruby snippet does not parse, but the reason why is confusing. Sorbet (and | ||
the Ruby VM) attempt to parse this file as if it were indented like this: | ||
|
||
```ruby | ||
class A | ||
def foo | ||
if x | ||
end | ||
end | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what this is attempting to communicate, but I think phrasing it as "as if it were indented like this" carries the unintended implication that Ruby (ala Python) cares about indentation, which is not the case. Maybe something like:
This Ruby snippet does not parse, but the reason why is confusing. Sorbet (and the Ruby VM) associate the first
end
with theif
instead of the intendeddef
. We can change the indentation to make Sorbet's view of the file clearer: ...
: level_(lvl), type_(type), location_(token->start(), token->end()), data_(data), | ||
extra_location_(extra_token != nullptr | ||
? std::make_optional<range>(range(extra_token->start(), extra_token->end())) | ||
: std::nullopt) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: level_(lvl), type_(type), location_(token->start(), token->end()), data_(data), | |
extra_location_(extra_token != nullptr | |
? std::make_optional<range>(range(extra_token->start(), extra_token->end())) | |
: std::nullopt) {} | |
: diagnostic(lvl, type, range_from_token(token), data, extra_token != nullptr | |
? std::make_optional<range>(range_from_token(extra_token)) | |
: std::nullopt) {} |
This needs formatting, and a range_from_token
helper somewhere, but ideally the intent should be clear.
parser/parser/cc/lexer.rl
Outdated
if (leftIsSpace && !rightIsSpace) { | ||
return -1; // left < right | ||
} else if (!leftIsSpace && !rightIsSpace) { | ||
return 0; // left == right | ||
} else if (!leftIsSpace && rightIsSpace) { | ||
return 1; // left > right | ||
} else if (leftChar == rightChar) { | ||
leftPtr++; | ||
rightPtr++; | ||
} else { | ||
// mismatched indent. give up and say equal | ||
// TODO(jez) Might want to handle this case better | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid the else
after if
by rewriting:
if (leftIsSpace && !rightIsSpace) { | |
return -1; // left < right | |
} else if (!leftIsSpace && !rightIsSpace) { | |
return 0; // left == right | |
} else if (!leftIsSpace && rightIsSpace) { | |
return 1; // left > right | |
} else if (leftChar == rightChar) { | |
leftPtr++; | |
rightPtr++; | |
} else { | |
// mismatched indent. give up and say equal | |
// TODO(jez) Might want to handle this case better | |
return 0; | |
} | |
if (leftIsSpace && !rightIsSpace) { | |
return -1; // left < right | |
} | |
if (!leftIsSpace && !rightIsSpace) { | |
return 0; // left == right | |
} | |
if (!leftIsSpace && rightIsSpace) { | |
return 1; // left > right | |
} | |
if (leftChar != rightChar) { | |
// mismatched indent. give up and say equal | |
// TODO(jez) Might want to handle this case better | |
return 0; | |
} | |
leftPtr++; | |
rightPtr++; |
// When recovering from errors, sometimes we'd like to force a production rule to become an | ||
// error if indentation didn't match in an attempt to both show an error near where the error | ||
// belongs as well as so a tokens that would be consumed eagerly are left untouched for later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure what goes after "as well as" here, but I don't think it's what currently comes after it. 😅
parser/Parser.cc
Outdated
// Always report the original parse errors | ||
errorToError(gs, file, driver->diagnostics); | ||
|
||
if (ast != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this path uses errorToError
, but the indentation-aware path uses reportDiagnostics
, perhaps we should just inline errorToError
's code here to make the symmetry between the cases more explicit? I think that would also have the benefit of making the onlyHints
logic more explicit, so that the reader doesn't get concerns about errors being emitted twice, only to discover that onlyHints
would prevent that. (Still might be good to expand on the rationale for always reporting errors?)
parser/Parser.cc
Outdated
ENFORCE(diag.extra_location().has_value()); | ||
e.addErrorLine(rangeToLoc(gs, file, diag.extra_location().value()), "Matching token was here"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: we don't have any tests that check we point at the matching token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll add one
4d54227
to
44e274c
Compare
We're going to be making multiple drivers. It'll help to have a helper function for that.
This will let us compute the indentation of the line a token is on. We might want to populate the indentation of the token on construction, but for simplicitly right now I'm just going to ask for the indentation at the points where I ask for it. This might not be the most efficent path forwards, but it might be the sort of thing that we can just elide/not populate except once we know there's a syntax error, and we can afford to be a little slower than usual. Also, this bloats the size of a `token` by one `size_t`.
... so that we can write a section for them in the error docs.
This change looks better ignoring whitespace.
This way we can make the messaging around the hint errors more clear ("they're just hints, might be wrong")
Here's hoping that this always takes so little time that it never shows up in the log.
9183020
to
52294cf
Compare
Motivation
I recommend reviewing by commit.
Test plan
See included automated tests.