Better error recovery for unterminated arrays #5273
Conversation
This is definitely a clever idea, and one I've toyed with before as well!
But ultimately, I really, really think that the final solution for unmatched [
brackets is going to involve using the indentation-aware techniques that #5000 is going to introduce.
I have a hunch that we can substantially improve on the results that this PR achieves if we use that approach instead. Do you want to chat more about this?
```
{
  $$ = driver.build.array(self, $1, $2, $3);
  driver.diagnostics.emplace_back(dlevel::ERROR, dclass::UnterminatedToken, diagnostic::range(@1.begin, @1.end), "\"[\"");
  driver.rewind_and_reset(@2.end);
```
I think that this use of `rewind_and_reset` is a little suspect. I think that if we're sold on this approach, the lexer state we want to be in once we've said that the recovery phase is done is `expr_beg`, because that's the state that the lexer would have transitioned to had it correctly seen `]` here.
Hmm, yeah, `beg` does make more sense here! I'll try switching to that. (Maybe it'll make some of the less-than-ideal parses you identified better---who knows?)
I think `expr_end` might be the right state, actually...
sorbet/parser/parser/cc/lexer.rl, lines 2824 to 2839 (at f5d8393):
```
e_rbrace | e_rparen | ']'
=> {
  emit_table(PUNCTUATION);
  cond.pop();
  cmdarg.pop();
  if (ts[0] == '}' || ts[0] == ']') {
    fnext expr_end;
  } else { // ')'
    // this was commented out in the original lexer.rl:
    // fnext expr_endfn; ?
  }
  fbreak;
};
```
This is consistent with what the Ruby Hacking Guide has to say (https://whitequark.org/blog//2013/04/01/ruby-hacking-guide-ch-11-finite-state-lexer/). Whatcha think?
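For intuition, here's a small example of why `expr_end` is the state the lexer should be in after `]` (this illustrates standard Ruby lexing behavior, not anything Sorbet-specific):

```ruby
# In expr_end, the next '/' lexes as the division operator; in expr_beg,
# a '/' would instead begin a regexp literal.
a = [10, 2]
div = a[0] / a[1]   # '/' right after ']' => lexer is in expr_end => division
re  = / a /         # '/' right after '=' => lexer is in expr_beg => regexp
raise unless div == 5
raise unless re.match?(" a ")
```

So if recovery left the lexer in `expr_beg` after synthesizing a close bracket, a following `/` could be mis-lexed as the start of a regexp.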
```
array_premature_end: eof
                   | kRESCUE
                   | kENSURE
                   | kEND
                   | kTHEN
                   | kELSIF
                   | kELSE
                   | kWHEN
                   | kIN
                   | tRPAREN
                   | tRCURLY
                   | tCOLON
```
This seems a little suspicious. I'm worried that this is going to be brittle / cause confusion when the upstream Ruby grammar changes in the future in such a way that makes one of these tokens somehow expected, or if new tokens are added to the lexer but not added here.
I think there are two potential maintenance headaches here:
- The grammar changes in the future, such that one of these tokens is now expected. In this case, it would manifest as a shift/reduce conflict which bison should flag for us, and it probably wouldn't take too long to work back from there to the fact that this production is implicated. In that case, we'd have to trim one of the tokens from this rule and it is possible that some recoverable parses would become unrecoverable, or at least not recover as gracefully.
- The grammar changes in the future, such that new tokens are added that are not expected here. In this case, we wouldn't get any guidance from bison, but I don't think this is such a big deal, since it won't result in incorrect parses of valid syntax---it might just take us a while to realize that there are new tokens that need to be added here to improve the quality of error recovery.
Neither one of these is ideal, but I'm not sure they're showstoppers.
```ruby
  def t
    puts "hi"
  end
end
```
Given the implementation above, I'm a little worried that this only does well on a handful of test cases. Some test cases I've found that don't look that great:
```ruby
# typed: true
class A
  X = [1,

  sig {void}
  def bar
  end
end
```

- drops the `1` from the array
- drops the whole sig + method def
```ruby
# typed: true
class A
  def bar
    puts 'before'
    x = [1,
    puts 'after'
  end
end
```

- again drops the `1` from the array
- drops the `puts 'after'` from the method body
(resolved review thread on test/testdata/parser/error_recovery/unterminated_array.rb.parse-tree.exp)
Sure! Let's chat about it today. If the indentation-aware techniques are going to handle things better, there's no need for redundancy here.
My idea for how to fix this didn't pan out, and in the interest of not blocking things exclusively for FUD without concrete grounding, I figure we should go ahead with this change. It's at least better than nothing (but we acknowledge that it might be the sort of thing that we want or need to roll back some time in the future).
@jez: I switched from
(force-pushed from 516dc3f to af82fcd)
🤦 Sorry, forgot to actually push :) should be up to date now
Seems still not perfect but better than nothing.
^ Can you add these tests I suggested? I'd like to at least record the state of the world.
Will do!
(force-pushed from af82fcd to 39945c7)
Another attempt to address #5182. Still not sure it does everything we want, but unlike my last attempt (#5269) I think it's unlikely to disrupt current desired behaviors.

The strategy this time is to identify a whole bunch of tokens that can't possibly occur right after `tLBRACK aref_args`, and add an erroring rule for when one of them occurs. I found the list of tokens that works here largely by trial and error (let's throw some more tokens in and see if we get any SR/RR conflicts), but intuitively I think the list makes sense: mostly they're "bracket-closing" tokens like `)` and `end`.

Example:
Parse tree and output before:
Parse tree and output after:
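To make the strategy concrete, here's a toy sketch of the recovery idea in Ruby (names like `PREMATURE_END` and `parse_array` are made up for illustration; the real change is a rule in Sorbet's bison grammar, not a hand-written parser): while parsing an array literal, a token that can never legally continue the elements triggers a diagnostic, we keep the elements parsed so far, and we leave the offending token for the enclosing construct to consume.

```ruby
# Tokens that cannot legally appear inside an unfinished array literal,
# loosely mirroring the array_premature_end set in the grammar.
PREMATURE_END = %w[end rescue ensure ) } eof].freeze

def parse_array(tokens, i, diagnostics)
  raise "expected '['" unless tokens[i] == "["
  i += 1
  elems = []
  while i < tokens.length
    tok = tokens[i]
    return [[:array, elems], i + 1] if tok == "]"  # normal close
    if PREMATURE_END.include?(tok)
      diagnostics << "unterminated '['"            # recover without consuming tok
      return [[:array, elems], i]
    end
    elems << tok                                   # treat the token as an element
    i += 1
    i += 1 if tokens[i] == ","                     # skip an optional separator
  end
  diagnostics << "unterminated '['"                # ran off the end of input
  [[:array, elems], i]
end

diags = []
node, resume = parse_array(["[", "1", ",", "2", "end"], 0, diags)
# node == [:array, ["1", "2"]], resume == 4, diags == ["unterminated '['"]
```

The key design point, mirrored by the grammar rule above: the recovery path does not consume the premature-end token, so an enclosing `def`/`class` can still match its `end` and the rest of the file parses normally.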
Motivation
Better error recovery, better pizza. (pizza == IDE responsiveness)
Test plan
See included automated tests.