Indentation-aware parser error recovery #5000

jez · 2021-12-10T20:44:17Z

Motivation

I recommend reviewing by commit.

Test plan

See included automated tests.

jez · 2022-02-07T22:21:22Z

test/testdata/lsp/completion/if_empty.rb

@@ -0,0 +1,12 @@
+# typed: true
+
+# TODO(jez) Fix this test


This is going to be fixed in a subsequent change. I'll make a ticket for this eventually.

jez · 2022-02-07T22:22:02Z

test/testdata/parser/error_recovery/if_do_2.rb

+    end
+    Integer.class
+  end # error: Hint: closing "end" token was not indented as far as "if" token
+end # error: unexpected token "end of file"


I'm going to address the TODOs in this file in a follow-up PR.

froydnj

This looks pretty reasonable, but there are enough comments/things to discuss that I think it's worth going through another round.

froydnj · 2022-02-08T15:10:16Z

parser/Parser.cc

+    core::Loc loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));
+    return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));


Suggested change

-    core::Loc loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));
-    return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));
+    return core::Loc(file, translatePos(range.beginPos, maxOff - 1), translatePos(range.endPos, maxOff));

froydnj · 2022-02-08T15:18:03Z

test/testdata/parser/error_recovery/def_missing_end_1.rb

+class A
+  def test1
+    if x.f
+    end
+
+  def test2
+    if x.f
+    end
+  end
+end # error: unexpected token "end of file"


My understanding is that something like this should be addressable within the framework we have here. Is that not the case, or does this test need a TODO?

My question is motivated by my surprise that this test and def_missing_end_2.rb produce nothing in the class definition. But perhaps that's because the exp files for them are desugar tests, and they should be some kind of parse-tree exp tests?

(I guess Ruby technically allows defs within defs, but I think that's kind of busted, and perhaps --stripe-mode could disallow that...?)

> My understanding is that something like this should be addressable within the framework we have here. Is that not the case, or does this test need a TODO?

Yep! I just wanted to break up the change. I will be fixing this in future changes.

> My question is motivated by my surprise that this test and def_missing_end_2.rb produce nothing in the class definition. But perhaps that's because the exp files for them are desugar tests, and they should be some kind of parse-tree exp tests?

I've been using desugar-tree tests simply because they're easier to skim to verify correctness. None of the parser formats are as easy to skim. The empty class definition you see is for <root> not A, which desugar wraps around all parse results, even empty ones.

froydnj · 2022-02-08T15:26:13Z

website/docs/error-reference.md

+This Ruby snippet does not parse, but the reason why is confusing. Sorbet (and
+the Ruby VM) attempt to parse this file as if it were indented like this:
+
+```ruby
+class A
+  def foo
+    if x
+    end
+  end
+```


I understand what this is attempting to communicate, but I think phrasing it as "as if it were indented like this" carries the unintended implication that Ruby (à la Python) cares about indentation, which is not the case. Maybe something like:

This Ruby snippet does not parse, but the reason why is confusing. Sorbet (and the Ruby VM) associate the first end with the if instead of the intended def. We can change the indentation to make Sorbet's view of the file clearer: ...

froydnj · 2022-02-08T15:30:05Z

parser/parser/include/ruby_parser/diagnostic.hh

+        : level_(lvl), type_(type), location_(token->start(), token->end()), data_(data),
+          extra_location_(extra_token != nullptr
+                              ? std::make_optional<range>(range(extra_token->start(), extra_token->end()))
+                              : std::nullopt) {}


Suggested change

-        : level_(lvl), type_(type), location_(token->start(), token->end()), data_(data),
-          extra_location_(extra_token != nullptr
-                              ? std::make_optional<range>(range(extra_token->start(), extra_token->end()))
-                              : std::nullopt) {}
+        : diagnostic(lvl, type, range_from_token(token), data, extra_token != nullptr
+                              ? std::make_optional<range>(range_from_token(extra_token))
+                              : std::nullopt) {}

This needs formatting, and a range_from_token helper somewhere, but ideally the intent should be clear.
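As a rough illustration of the helper this suggestion asks for, here is a minimal sketch. The `token` and `range` structs below are hypothetical stand-ins (only the `start()`/`end()` accessors and the `beginPos`/`endPos` field names come from the diffs in this PR), and `optional_range_from_token` is an extra convenience that the suggestion does not mention.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-ins for the parser's real types. Only the
// start()/end() accessors are taken from the diff under review.
struct token {
    std::size_t start_;
    std::size_t end_;
    std::size_t start() const { return start_; }
    std::size_t end() const { return end_; }
};

struct range {
    std::size_t beginPos;
    std::size_t endPos;
    range(std::size_t b, std::size_t e) : beginPos(b), endPos(e) {}
};

// Build a range spanning exactly one token. The caller must pass a
// non-null token, mirroring how the constructor treats `token`.
inline range range_from_token(const token *tok) {
    assert(tok != nullptr);
    return range(tok->start(), tok->end());
}

// Assumed convenience wrapper (not part of the suggestion): folds the
// `extra_token != nullptr ? ... : std::nullopt` ternary into one call.
inline std::optional<range> optional_range_from_token(const token *tok) {
    if (tok == nullptr) {
        return std::nullopt;
    }
    return range_from_token(tok);
}
```

With a wrapper like this, the delegating constructor could pass `optional_range_from_token(extra_token)` directly, though whether that reads better than the explicit ternary is a judgment call.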

froydnj · 2022-02-08T15:40:18Z

parser/parser/cc/lexer.rl

+        if (leftIsSpace && !rightIsSpace) {
+            return -1; // left < right
+        } else if (!leftIsSpace && !rightIsSpace) {
+            return 0; // left == right
+        } else if (!leftIsSpace && rightIsSpace) {
+            return 1;  // left > right
+        } else if (leftChar == rightChar) {
+            leftPtr++;
+            rightPtr++;
+        } else {
+            // mismatched indent. give up and say equal
+            // TODO(jez) Might want to handle this case better
+            return 0;
+        }


Can we avoid the else after if by rewriting:

Suggested change

-        if (leftIsSpace && !rightIsSpace) {
-            return -1; // left < right
-        } else if (!leftIsSpace && !rightIsSpace) {
-            return 0; // left == right
-        } else if (!leftIsSpace && rightIsSpace) {
-            return 1;  // left > right
-        } else if (leftChar == rightChar) {
-            leftPtr++;
-            rightPtr++;
-        } else {
-            // mismatched indent. give up and say equal
-            // TODO(jez) Might want to handle this case better
-            return 0;
-        }
+        if (leftIsSpace && !rightIsSpace) {
+            return -1; // left < right
+        }
+        if (!leftIsSpace && !rightIsSpace) {
+            return 0; // left == right
+        }
+        if (!leftIsSpace && rightIsSpace) {
+            return 1; // left > right
+        }
+        if (leftChar != rightChar) {
+            // mismatched indent. give up and say equal
+            // TODO(jez) Might want to handle this case better
+            return 0;
+        }
+        leftPtr++;
+        rightPtr++;
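For readers without the full lexer in front of them, the early-return shape can be sketched as a standalone function. This is not the PR's code: `compareIndent` and `isIndentChar` are hypothetical names, the inputs are whole lines rather than lexer pointers, and the sign convention (1 means the left line has more leading whitespace) is an assumption of this sketch. The excerpt above does not show how `leftIsSpace` is computed, so the lexer's own convention may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace {
// Treat spaces and tabs as indentation characters (an assumption).
bool isIndentChar(char c) {
    return c == ' ' || c == '\t';
}
} // namespace

// Hypothetical helper: compares the leading whitespace of two lines.
// Returns -1 if `left` is indented less than `right`, 1 if it is
// indented more, and 0 if the indents are equal or cannot be compared
// (for example, tabs on one line and spaces on the other).
int compareIndent(std::string_view left, std::string_view right) {
    for (std::size_t i = 0;; ++i) {
        bool leftIsSpace = i < left.size() && isIndentChar(left[i]);
        bool rightIsSpace = i < right.size() && isIndentChar(right[i]);

        if (!leftIsSpace && !rightIsSpace) {
            return 0; // both indents ended at the same column
        }
        if (!leftIsSpace && rightIsSpace) {
            return -1; // left indent ended first
        }
        if (leftIsSpace && !rightIsSpace) {
            return 1; // right indent ended first
        }
        if (left[i] != right[i]) {
            return 0; // mismatched indent characters: give up, say equal
        }
    }
}

// Usage: an `end` indented less than the `if` it is meant to close.
inline void exampleCompareIndent() {
    assert(compareIndent("  end", "    if x") == -1);
}
```

Each branch returns or falls through to the next loop iteration, so no `else` is needed, which is the point of the suggested rewrite.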

froydnj · 2022-02-08T15:50:43Z

parser/parser/include/ruby_parser/driver.hh

+    // When recovering from errors, sometimes we'd like to force a production rule to become an
+    // error if indentation didn't match in an attempt to both show an error near where the error
+    // belongs as well as so a tokens that would be consumed eagerly are left untouched for later


I'm not exactly sure what goes after "as well as" here, but I don't think it's what currently comes after it. 😅

froydnj · 2022-02-08T16:00:55Z

parser/Parser.cc

+    // Always report the original parse errors
+    errorToError(gs, file, driver->diagnostics);
+
+    if (ast != nullptr) {


Since this path uses errorToError, but the indentation-aware path uses reportDiagnostics, perhaps we should just inline errorToError's code here to make the symmetry between the cases more explicit? I think that would also have the benefit of making the onlyHints logic more explicit, so that the reader doesn't get concerns about errors being emitted twice, only to discover that onlyHints would prevent that. (Still might be good to expand on the rationale for always reporting errors?)
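To make the double-reporting concern concrete, here is a minimal sketch of how an `onlyHints` flag could keep the second, indentation-aware parse from repeating the original errors. Everything here is assumed for illustration: the `Diagnostic` struct, the `isHint` field, the `reportBoth` function, and the signature given to `reportDiagnostics` are hypothetical and do not reflect the actual code in parser/Parser.cc.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical diagnostic record; the real parser types differ.
struct Diagnostic {
    std::string message;
    bool isHint; // true for indentation-based hints
};

// Assumed signature: collects the messages from one parse. When
// onlyHints is set, ordinary errors are skipped so they are not
// reported a second time.
std::vector<std::string> reportDiagnostics(const std::vector<Diagnostic> &diags, bool onlyHints) {
    std::vector<std::string> out;
    for (const auto &d : diags) {
        if (onlyHints && !d.isHint) {
            continue;
        }
        out.push_back(d.message);
    }
    return out;
}

// Always report everything from the original parse, then add only the
// hints from the indentation-aware parse.
std::vector<std::string> reportBoth(const std::vector<Diagnostic> &original,
                                    const std::vector<Diagnostic> &indentationAware) {
    std::vector<std::string> out = reportDiagnostics(original, /* onlyHints */ false);
    std::vector<std::string> hints = reportDiagnostics(indentationAware, /* onlyHints */ true);
    out.insert(out.end(), hints.begin(), hints.end());
    return out;
}

// Usage: the shared error appears once, followed by the hint.
inline void exampleNoDuplicateErrors() {
    std::vector<Diagnostic> original = {{"unexpected token", false}};
    std::vector<Diagnostic> aware = {{"unexpected token", false}, {"Hint: bad indent", true}};
    assert(reportBoth(original, aware).size() == 2);
}
```

Writing both calls side by side like this is one way to get the symmetry the comment asks for, since the reader can see at a glance which parse contributes which kind of diagnostic.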

froydnj · 2022-02-08T16:10:00Z

parser/Parser.cc

+            ENFORCE(diag.extra_location().has_value());
+            e.addErrorLine(rangeToLoc(gs, file, diag.extra_location().value()), "Matching token was here");


To be clear: we don't have any tests that check we point at the matching token?

I'll add one.

This will let us compute the indentation of the line a token is on. We might want to populate the indentation of the token on construction, but for simplicity right now I'm just going to ask for the indentation at the points where I need it. This might not be the most efficient path forward, but it might be the sort of thing that we can just elide/not populate until we know there's a syntax error, at which point we can afford to be a little slower than usual. Also, this bloats the size of a `token` by one `size_t`.
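A minimal sketch of computing a line's indentation on demand from a token's start offset, assuming the source is available as one contiguous buffer. The `lineIndent` name and its signature are hypothetical and are not the lexer's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the leading whitespace of the line that
// contains byte offset `tokenStart` in `source`.
std::string_view lineIndent(std::string_view source, std::size_t tokenStart) {
    assert(tokenStart <= source.size());

    // Walk backwards to the first character after the previous newline.
    std::size_t lineStart = tokenStart;
    while (lineStart > 0 && source[lineStart - 1] != '\n') {
        --lineStart;
    }

    // Walk forwards over spaces and tabs to find where the indent ends.
    std::size_t indentEnd = lineStart;
    while (indentEnd < source.size() && (source[indentEnd] == ' ' || source[indentEnd] == '\t')) {
        ++indentEnd;
    }

    return source.substr(lineStart, indentEnd - lineStart);
}
```

Computing this lazily costs a scan over part of the line per query, which fits the trade-off described above: nothing is stored or computed up front, and the scan only runs on the error path.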


jez force-pushed the jez-better-parser branch 2 times, most recently from 3fd3914 to 7c238cf Compare January 11, 2022 00:33

jez force-pushed the jez-better-parser branch from 7c238cf to aeb3b6e Compare February 4, 2022 23:02

jez changed the title ~~wip: Make parser incremental~~ Make parser indentation-aware Feb 4, 2022

jez self-assigned this Feb 4, 2022

jez changed the title ~~Make parser indentation-aware~~ Indentation-aware parser error recovery Feb 4, 2022

jez force-pushed the jez-better-parser branch from aeb3b6e to 718d66a Compare February 5, 2022 00:27

jez mentioned this pull request Feb 5, 2022

No-op changes in service of indentation-aware parser #5261

Merged

jez force-pushed the jez-better-parser branch from 959aeaa to e66d1be Compare February 5, 2022 02:26

jez changed the base branch from master to jez-checkpoint February 5, 2022 02:26

jez force-pushed the jez-better-parser branch from e66d1be to 298514c Compare February 5, 2022 02:31

Base automatically changed from jez-checkpoint to master February 5, 2022 03:38

jez force-pushed the jez-better-parser branch 2 times, most recently from 084de07 to 5fd6e01 Compare February 5, 2022 07:26

jez marked this pull request as ready for review February 7, 2022 21:30

jez requested a review from a team as a code owner February 7, 2022 21:30

jez requested review from froydnj and removed request for a team February 7, 2022 21:30

jez commented Feb 7, 2022

jez force-pushed the jez-better-parser branch from 5f80edd to 439ed50 Compare February 8, 2022 00:31

froydnj requested changes Feb 8, 2022

aprocter mentioned this pull request Feb 8, 2022

Better error recovery for unterminated arrays #5273

Merged

jez requested a review from froydnj February 8, 2022 18:33

froydnj approved these changes Feb 8, 2022

jez force-pushed the jez-better-parser branch from 4d54227 to 44e274c Compare February 9, 2022 03:13

jez added 4 commits February 9, 2022 20:21

no-op: Factor out makeDriver helper

5b60b48

We're going to be making multiple drivers. It'll help to have a helper function for that.

Add compare_indent_level helper

613d0ef

Thread indendationAware boolean through driver

feeee30

jez added 25 commits February 9, 2022 20:21

Existing test improves

1367d23

Rename a test

6769c76

Show behavior before this PR

1b7f585

Show behavior after this PR

7198bdb

Also update newline_s after line comment

6a8e1b3

Add new completion tests

54c700a

Remove TODO

b04d195

Add another test

5e7251a

whoops, recorded this test wrong

acbf5b0

Turns out there's an overload

b8a3a59

Use addErrorNote to explain the message a bit

df70bad

Report error recovery errors as separate code

8e01c5c

... so that we can write a section for them in the error docs.

Prefix with Hint:

f07939f

Delete ErrorToError, just use functions

0a316b9

This change looks better ignoring whitespace.

Always report original errors

655382a

This way we can make the messaging around the hint errors more clear ("they're just hints, might be wrong")

Add one more line of context to the error

e863d42

Fix wording in error-reference

64091f0

whoops double loc

451345a

Suggested refactoring

1a68d80

reorder if/else

d96e1a7

Remove errorToError

3a93cc8

Fix comment

ff727fd

Add cli test showing error lines

14051b2

Change error message and location

98fe052

Add a Timer for the withIndentationAware phase

52294cf

Here's hoping that this always takes so little time that it never shows up in the log.

jez force-pushed the jez-better-parser branch from 9183020 to 52294cf Compare February 10, 2022 04:23

jez merged commit 29b676e into master Feb 10, 2022

jez deleted the jez-better-parser branch February 10, 2022 18:42

jez added this to the Better Parser milestone Feb 11, 2022

jez mentioned this pull request Feb 15, 2022

Allow eof everywhere kEND is allowed #5183

Closed
