8000 Better error recovery for unterminated arrays by aprocter · Pull Request #5273 · sorbet/sorbet · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Better error recovery for unterminated arrays #5273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 16, 2022

Conversation

aprocter
Copy link
Contributor
@aprocter aprocter commented Feb 8, 2022

Another attempt to address #5182. Still not sure it does everything we want, but unlike my last attempt (#5269) I think it's unlikely to disrupt current desired behaviors.

The strategy this time is to identify a whole bunch of tokens that can't possibly occur right after tLBRACK aref_args, and add an erroring rule when one of them occurs. I found the list of tokens that works here largely by trial and error (let's throw some more tokens in and see if we get any SR/RR conflicts), but intuitively I think the list makes sense: mostly they're "bracket-closing" tokens like ) and end.

Example:

# typed: true

class A
  def f
    [1,
  #   ^ error: unexpected token ","
  # ^ error: unterminated "["
    puts "hi"
  end
  def g
    [
  # ^ error: unterminated "["
  end
  def h
    puts "ho"
  end
end

class B
  def q
    puts "ho"
    [
  # ^ error: unterminated "["
  end
  def r
    puts "hi"
  end
  def s
    puts ([] + [)
             # ^ error: unterminated "["
  end
  def t
    puts "hi"
  end
end

Parse tree and output before:

s(:begin)
test/testdata/parser/error_recovery/unterminated_array.rb:5: unexpected token "," https://srb.help/2001
     5 |    [1,
              ^

test/testdata/parser/error_recovery/unterminated_array.rb:29: unexpected token ")" https://srb.help/2001
    29 |    puts ([] + [)
                        ^
Errors: 2

Parse tree and output after:

s(:begin,
  s(:class,
    s(:const, nil, :A), nil,
    s(:begin,
      s(:def, :f, nil,
        s(:array,
          s(:const, nil, :<ErrorNode>))),
      s(:def, :g, nil,
        s(:array)),
      s(:def, :h, nil,
        s(:send, nil, :puts,
          s(:str, "ho"))))),
  s(:class,
    s(:const, nil, :B), nil,
    s(:begin,
      s(:def, :q, nil,
        s(:begin,
          s(:send, nil, :puts,
            s(:str, "ho")),
          s(:array))),
      s(:def, :r, nil,
        s(:send, nil, :puts,
          s(:str, "hi"))),
      s(:def, :s, nil,
        s(:send, nil, :puts,
          s(:begin,
            s(:send,
              s(:array), :+,
              s(:array))))),
      s(:def, :t, nil,
        s(:send, nil, :puts,
          s(:str, "hi"))))))
test/testdata/parser/error_recovery/unterminated_array.rb:5: unexpected token "," https://srb.help/2001
     5 |    [1,
              ^

test/testdata/parser/error_recovery/unterminated_array.rb:5: unterminated "[" https://srb.help/2001
     5 |    [1,
            ^

test/testdata/parser/error_recovery/unterminated_array.rb:11: unterminated "[" https://srb.help/2001
    11 |    [
            ^

test/testdata/parser/error_recovery/unterminated_array.rb:22: unterminated "[" https://srb.help/2001
    22 |    [
            ^

test/testdata/parser/error_recovery/unterminated_array.rb:29: unterminated "[" https://srb.help/2001
    29 |    puts ([] + [)
                       ^
Errors: 5

Motivation

Better error recovery, better pizza. (pizza == IDE responsiveness)

Test plan

See included automated tests.

@aprocter aprocter requested a review from a team as a code owner February 8, 2022 02:54
@aprocter aprocter requested review from jez and removed request for a team February 8, 2022 02:54
Copy link
Collaborator
@jez jez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a clever idea, and one I've toyed with before as well!

But ultimately, I really, really think that the final solution for unmatched [ brackets is going to involve using the indentation-aware techniques that #5000 is going to introduce.

I have a hunch that we can substantially improve on the results that this PR achieves if we use that approach instead. Do you want to chat more about this?

{
$$ = driver.build.array(self, $1, $2, $3);
driver.diagnostics.emplace_back(dlevel::ERROR, dclass::UnterminatedToken, diagnostic::range(@1.begin, @1.end), "\"[\"");
driver.rewind_and_reset(@2.end);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this use of rewind_and_reset is a little suspect. I think that if we're sold on this approach, the lexer state we want to be in once we've said that the recovery phase is done is expr_beg, because that's the state that the lexer would have transitioned to had it correctly seen ] here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, beg does make more sense here! I'll try switching to that. (Maybe it'll make some of the less-than-ideal parses you identified better---who knows?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think expr_end might be the right state, actually...

e_rbrace | e_rparen | ']'
=> {
emit_table(PUNCTUATION);
cond.pop();
cmdarg.pop();
if (ts[0] == '}' || ts[0] == ']') {
fnext expr_end;
} else { // ')'
// this was commented out in the original lexer.rl:
// fnext expr_endfn; ?
}
fbreak;
};

This is consistent with what the Ruby Hacking Guide has to say (https://whitequark.org/blog//2013/04/01/ruby-hacking-guide-ch-11-finite-state-lexer/). Whatcha think?

Comment on lines +1659 to +1869
array_premature_end: eof
| kRESCUE
| kENSURE
| kEND
| kTHEN
| kELSIF
| kELSE
| kWHEN
| kIN
| tRPAREN
| tRCURLY
| tCOLON
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little suspicious. I'm worried that this is going to be brittle / cause confusion when the upstream ruby grammar changes in the future in such a way that makes one of these token somehow expected, or if new tokens are added to the lexer but not added here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two potential maintenance headaches here:

  1. The grammar changes in the future, such that one of these tokens is now expected. In this case, it would manifest as a shift/reduce conflict which bison should flag for us, and it probably wouldn't take too long to work back from there to the fact that this production is implicated. In that case, we'd have to trim one of the tokens from this rule and it is possible that some recoverable parses would become unrecoverable, or at least not recover as gracefully.
  2. The grammar changes in the future, such that new tokens are added that are not expected here. In this case, we wouldn't get any guidance from bison, but I don't think this is such a big deal, since it won't result in incorrect parses of valid syntax---it might just take us a while to realize that there are new tokens that need to be added here to improve the quality of error recovery.

Neither one of these is ideal, but I'm not sure they're showstoppers.

def t
puts "hi"
end
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the implementation above, I'm a little worried that this only does well on a handful of test cases. Some test cases I've found that don't look that great:

# typed: true

class A
  X = [1,

  sig {void}
  def bar
  end
end
  • drops the 1 from the array
  • drops the whole sig + method def
# typed: true

class A
  def bar
    puts 'before'
    x = [1,
    puts 'after'
  end
end
  • again drops the 1 from the array
  • drops the puts 'after' from the method body

@aprocter
Copy link
Contributor Author
aprocter commented Feb 8, 2022

But ultimately, I really, really think that the final solution for unmatched [ brackets is going to involve using the indentation-aware techniques that https://github.com//pull/5000 is going to introduce.

I have a hunch that we can substantially improve on the results that this PR achieves if we use that approach instead. Do you want to chat more about this?

Sure! Let's chat about it today. If the indentation-aware techniques are going to handle things better, there's no need for redundancy here.

Copy link
Collaborator
@jez jez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my idea for how to fix this di 8000 dn't pan out, and in the interest of not blocking things exclusively for FUD without concrete founding i figure we should go ahead with this change. it at least is better than nothing (but we acknowledge that it might be the sort of thing that we want or need to roll back some time in the future).

@aprocter aprocter requested a review from jez February 11, 2022 21:30
@aprocter
Copy link
Contributor Author

@jez: I switched from parse-tree to desugar-tree, and commented on why I think expr_end is probably the right state to end up in when we rewind, so figured I'd ask for another look when you get a chance.

@aprocter aprocter force-pushed the aprocter/unterminated-array-end branch from 516dc3f to af82fcd Compare February 11, 2022 21:35
@aprocter
Copy link
Contributor Author
aprocter commented Feb 11, 2022

🤦 Sorry, forgot to actually push :) should be up to date now

Copy link
Collaborator
@jez jez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems still not perfect but better than nothing.

#5273 (comment)

^ can you add these tests I suggested? i'd like to at least record the state of the world

@aprocter
Copy link
Contributor Author

can you add these tests I suggested? i'd like to at least record the state of the world

Will do!

@aprocter-stripe aprocter-stripe force-pushed the aprocter/unterminated-array-end branch from af82fcd to 39945c7 Compare February 15, 2022 23:55
@aprocter aprocter enabled auto-merge (squash) February 16, 2022 00:00
@aprocter aprocter merged commit baa9dac into master Feb 16, 2022
@aprocter aprocter deleted the aprocter/unterminated-array-end branch February 16, 2022 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0