8000 Indentation-aware parser error recovery by jez · Pull Request #5000 · sorbet/sorbet · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Indentation-aware parser error recovery #5000

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Feb 10, 2022
Merged

Indentation-aware parser error recovery #5000

merged 30 commits into from
Feb 10, 2022

Conversation

jez
Copy link
Collaborator
@jez jez commented Dec 10, 2021

Motivation

I recommend reviewing by commit.

Test plan

See included automated tests.

@jez jez force-pushed the jez-better-parser branch 2 times, most recently from 3fd3914 to 7c238cf Compare January 11, 2022 00:33
10000 @jez jez force-pushed the jez-better-parser branch from 7c238cf to aeb3b6e Compare February 4, 2022 23:02
@jez jez changed the title wip: Make parser incremental Make parser indentation-aware Feb 4, 2022
@jez jez self-assigned this Feb 4, 2022
@jez jez changed the title Make parser indentation-aware Indentation-aware parser error recovery Feb 4, 2022
@jez jez force-pushed the jez-better-parser branch from aeb3b6e to 718d66a Compare February 5, 2022 00:27
@jez jez force-pushed the jez-better-parser branch from 959aeaa to e66d1be Compare February 5, 2022 02:26
@jez jez changed the base branch from master to jez-checkpoint February 5, 2022 02:26
@jez jez force-pushed the jez-better-parser branch from e66d1be to 298514c Compare February 5, 2022 02:31
Base automatically changed from jez-checkpoint to master February 5, 2022 03:38
@jez jez force-pushed the jez-better-parser branch 2 times, most recently from 084de07 to 5fd6e01 Compare February 5, 2022 07:26
@jez jez marked this pull request as ready for review February 7, 2022 21:30
@jez jez requested a review from a team as a code owner February 7, 2022 21:30
@jez jez requested review from froydnj and removed request for a team February 7, 2022 21:30
@@ -0,0 +1,12 @@
# typed: true

# TODO(jez) Fix this test
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be fixed in a subsequent change. I'll make a ticket for this eventually.

end
Integer.class
end # error: Hint: closing "end" token was not indented as far as "if" token
end # error: unexpected token "end of file"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to address the TODO's in this file in a followup PR.

@jez jez force-pushed the jez-better-parser branch from 5f80edd to 439ed50 Compare February 8, 2022 00:31
Copy link
Contributor
@froydnj froydnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty reasonable, but there are enough comments/things to discuss that I think it's worth going through another round.

parser/Parser.cc Outdated
core::Loc loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));
return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
core::Loc loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));
return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));
return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));

Comment on lines +3 to +12
class A
def test1
if x.f
end

def test2
if x.f
end
end
end # error: unexpected token "end of file"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that something like this should be addressable within the framework we have here. Is that not the case, or does this test need a TODO?

My question is motivated by my surprise that this test and def_missing_end_2.rb produce nothing in the class definition. But perhaps that's because the exp files for them are desugar tests, and they should be some kind of parse-tree exp tests?

(I guess Ruby technically allows defs within defs, but I think that's kind of busted, and perhaps --stripe-mode could disallow that...?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that something like this should be addressable within the framework we have here. Is that not the case, or does this test need a TODO?

Yep! I just wanted to break up the change. I will be fixing this in future changes.

My question is motivated by my surprise that this test and def_missing_end_2.rb produce nothing in the class definition. But perhaps that's because the exp files for them are desugar tests, and they should be some kind of parse-tree exp tests?

I've been using desugar-tree tests simply because they're easier to skim to verify correctness. None of the parser formats are as easy to skim. The empty class definition you see is for <root> not A, which desugar wraps around all parse results, even empty ones.

Comment on lines 70 to 79
This Ruby snippet does not parse, but the reason why is confusing. Sorbet (and
the Ruby VM) attempt to parse this file as if it were indented like this:

```ruby
class A
def foo
if x
end
end
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what this is attempting to communicate, but I think phrasing it as "as if it were indented like this" carries the unintended implication that Ruby (ala Python) cares about indentation, which is not the case. Maybe something like:

This Ruby snippet does not parse, but the reason why is confusing. Sorbet (and the Ruby VM) associate the first end with the if instead of the intended def. We can change the indentation to make Sorbet's view of the file clearer: ...

Comment on lines 44 to 47
: level_(lvl), type_(type), location_(token->start(), token->end()), data_(data),
extra_location_(extra_token != nullptr
? std::make_optional<range>(range(extra_token->start(), extra_token->end()))
: std::nullopt) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: level_(lvl), type_(type), location_(token->start(), token->end()), data_(data),
extra_location_(extra_token != nullptr
? std::make_optional<range>(range(extra_token->start(), extra_token->end()))
: std::nullopt) {}
: diagnostic(lvl, type, range_from_token(token), data, extra_token != nullptr
? std::make_optional<range>(range_from_token(extra_token))
: std::nullopt) {}

This needs formatting, and a range_from_token helper somewhere, but ideally the intent should be clear.

Comment on lines 206 to 218
if (leftIsSpace && !rightIsSpace) {
return -1; // left < right
} else if (!leftIsSpace && !rightIsSpace) {
return 0; // left == right
} else if (!leftIsSpace && rightIsSpace) {
return 1; // left > right
} else if (leftChar == rightChar) {
leftPtr++;
rightPtr++;
} else {
// mismatched indent. give up and say equal
// TODO(jez) Might want to handle this case better
return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the else after if by rewriting:

Suggested change
if (leftIsSpace && !rightIsSpace) {
return -1; // left < right
} else if (!leftIsSpace && !rightIsSpace) {
return 0; // left == right
} else if (!leftIsSpace && rightIsSpace) {
return 1; // left > right
} else if (leftChar == rightChar) {
leftPtr++;
rightPtr++;
} else {
// mismatched indent. give up and say equal
// TODO(jez) Might want to handle this case better
return 0;
}
if (leftIsSpace && !rightIsSpace) {
return -1; // left < right
}
if (!leftIsSpace && !rightIsSpace) {
return 0; // left == right
}
if (!leftIsSpace && rightIsSpace) {
return 1; // left > right
}
if (leftChar != rightChar) {
// mismatched indent. give up and say equal
// TODO(jez) Might want to handle this case better
return 0;
}
leftPtr++;
rightPtr++;

Comment on lines 354 to 356
// When recovering from errors, sometimes we'd like to force a production rule to become an
// error if indentation didn't match in an attempt to both show an error near where the error
// belongs as well as so a tokens that would be consumed eagerly are left untouched for later
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what goes after "as well as" here, but I don't think it's what currently comes after it. 😅

parser/Parser.cc Outdated
Comment on lines 147 to 145
// Always report the original parse errors
errorToError(gs, file, driver->diagnostics);

if (ast != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this path uses errorToError, but the indentation-aware path uses reportDiagnostics, perhaps we should just inline errorToError's code here to make the symmetry between the cases more explicit? I think that would also have the benefit of making the onlyHints logic more explicit, so that the reader doesn't get concerns about errors being emitted twice, only to discover that onlyHints would prevent that. (Still might be good to expand on the rationale for always reporting errors?)

parser/Parser.cc Outdated
Comment on lines 67 to 68
ENFORCE(diag.extra_location().has_value());
e.addErrorLine(rangeToLoc(gs, file, diag.extra_location().value()), "Matching token was here");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: we don't have any tests that check we point at the matching token?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll add one

@jez jez force-pushed the jez-better-parser branch from 4d54227 to 44e274c Compare February 9, 2022 03:13
jez added 4 commits February 9, 2022 20:21
We're going to be making multiple drivers. It'll help to have a helper
function for that.
This will let us compute the indentation of the line a token is on.

We might want to populate the indentation of the token on construction,
but for simplicitly right now I'm just going to ask for the indentation
at the points where I ask for it.

This might not be the most efficent path forwards, but it might be the
sort of thing that we can just elide/not populate except once we know
there's a syntax error, and we can afford to be a little slower than
usual.

Also, this bloats the size of a `token` by one `size_t`.
@jez jez force-pushed the jez-better-parser branch from 9183020 to 52294cf Compare February 10, 2022 04:23
@jez jez merged commit 29b676e into master Feb 10, 2022
@jez jez deleted the jez-better-parser branch February 10, 2022 18:42
@jez jez added this to the Better Parser milestone Feb 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0