Better error recovery for unterminated arrays #5273
Conversation
This is definitely a clever idea, and one I've toyed with before as well!
But ultimately, I really, really think that the final solution for unmatched [
brackets is going to involve using the indentation-aware techniques that #5000 is going to introduce.
I have a hunch that we can substantially improve on the results that this PR achieves if we use that approach instead. Do you want to chat more about this?
```
{
  $$ = driver.build.array(self, $1, $2, $3);
  driver.diagnostics.emplace_back(dlevel::ERROR, dclass::UnterminatedToken, diagnostic::range(@1.begin, @1.end), "\"[\"");
  driver.rewind_and_reset(@2.end);
```
I think that this use of `rewind_and_reset` is a little suspect. I think that if we're sold on this approach, the lexer state we want to be in once we've said that the recovery phase is done is `expr_beg`, because that's the state that the lexer would have transitioned to had it correctly seen `]` here.
Hmm, yeah, `beg` does make more sense here! I'll try switching to that. (Maybe it'll make some of the less-than-ideal parses you identified better---who knows?)
I think `expr_end` might be the right state, actually...
sorbet/parser/parser/cc/lexer.rl, lines 2824 to 2839 (at f5d8393):
```
e_rbrace | e_rparen | ']'
=> {
  emit_table(PUNCTUATION);
  cond.pop();
  cmdarg.pop();
  if (ts[0] == '}' || ts[0] == ']') {
    fnext expr_end;
  } else { // ')'
    // this was commented out in the original lexer.rl:
    // fnext expr_endfn; ?
  }
  fbreak;
};
```
This is consistent with what the Ruby Hacking Guide has to say (https://whitequark.org/blog//2013/04/01/ruby-hacking-guide-ch-11-finite-state-lexer/). Whatcha think?
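For intuition, here's a small example of why `expr_end` is the state the lexer should be in after `]` (this illustrates standard Ruby lexing behavior, not anything Sorbet-specific):

```ruby
# In expr_end, the next '/' lexes as the division operator; in expr_beg,
# a '/' would instead begin a regexp literal.
a = [10, 2]
div = a[0] / a[1]   # '/' right after ']' => lexer is in expr_end => division
re  = / a /         # '/' right after '=' => lexer is in expr_beg => regexp
raise unless div == 5
raise unless re.match?(" a ")
```

So if recovery left the lexer in `expr_beg` after synthesizing a close bracket, a following `/` could be mis-lexed as the start of a regexp.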
```
array_premature_end: eof
                   | kRESCUE
                   | kENSURE
                   | kEND
                   | kTHEN
                   | kELSIF
                   | kELSE
                   | kWHEN
                   | kIN
                   | tRPAREN
                   | tRCURLY
                   | tCOLON
```
This seems a little suspicious. I'm worried that this is going to be brittle / cause confusion when the upstream Ruby grammar changes in the future in such a way that makes one of these tokens somehow expected, or if new tokens are added to the lexer but not added here.
I think there are two potential maintenance headaches here:
- The grammar changes in the future, such that one of these tokens is now expected. In this case, it would manifest as a shift/reduce conflict which bison should flag for us, and it probably wouldn't take too long to work back from there to the fact that this production is implicated. In that case, we'd have to trim one of the tokens from this rule and it is possible that some recoverable parses would become unrecoverable, or at least not recover as gracefully.
- The grammar changes in the future, such that new tokens are added that are not expected here. In this case, we wouldn't get any guidance from bison, but I don't think this is such a big deal, since it won't result in incorrect parses of valid syntax---it might just take us a while to realize that there are new tokens that need to be added here to improve the quality of error recovery.
Neither one of these is ideal, but I'm not sure they're showstoppers.
```ruby
  def t
    puts "hi"
  end
end
```
Given the implementation above, I'm a little worried that this only does well on a handful of test cases. Some test cases I've found that don't look that great:
```ruby
# typed: true
class A
  X = [1,

  sig {void}
  def bar
  end
end
```

- drops the `1` from the array
- drops the whole sig + method def
```ruby
# typed: true
class A
  def bar
    puts 'before'
    x = [1,
    puts 'after'
  end
end
```

- again drops the `1` from the array
- drops the `puts 'after'` from the method body
(resolved review thread on test/testdata/parser/error_recovery/unterminated_array.rb.parse-tree.exp)
Sure! Let's chat about it today. If the indentation-aware techniques are going to handle things better, there's no need for redundancy here.
My idea for how to fix this didn't pan out, and in the interest of not blocking things exclusively for FUD without concrete grounding, I figure we should go ahead with this change. It's at least better than nothing (but we acknowledge that it might be the sort of thing that we want or need to roll back some time in the future).
@jez: I switched from
(force-pushed from 516dc3f to af82fcd)
🤦 Sorry, forgot to actually push :) should be up to date now
Seems still not perfect but better than nothing.
^ Can you add these tests I suggested? I'd like to at least record the state of the world.
Will do!
(force-pushed from af82fcd to 39945c7)
Another attempt to address #5182. Still not sure it does everything we want, but unlike my last attempt (#5269) I think it's unlikely to disrupt current desired behaviors.

The strategy this time is to identify a whole bunch of tokens that can't possibly occur right after `tLBRACK aref_args`, and add an erroring rule for when one of them occurs. I found the list of tokens that works here largely by trial and error (let's throw some more tokens in and see if we get any SR/RR conflicts), but intuitively I think the list makes sense: mostly they're "bracket-closing" tokens like `)` and `end`.

Example:
Parse tree and output before:
Parse tree and output after:
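To make the strategy concrete, here's a toy sketch of the recovery idea in Ruby (names like `PREMATURE_END` and `parse_array` are made up for illustration; the real change is a rule in Sorbet's bison grammar, not a hand-written parser): while parsing an array literal, a token that can never legally continue the elements triggers a diagnostic, we keep the elements parsed so far, and we leave the offending token for the enclosing construct to consume.

```ruby
# Tokens that cannot legally appear inside an unfinished array literal,
# loosely mirroring the array_premature_end set in the grammar.
PREMATURE_END = %w[end rescue ensure ) } eof].freeze

def parse_array(tokens, i, diagnostics)
  raise "expected '['" unless tokens[i] == "["
  i += 1
  elems = []
  while i < tokens.length
    tok = tokens[i]
    return [[:array, elems], i + 1] if tok == "]"  # normal close
    if PREMATURE_END.include?(tok)
      diagnostics << "unterminated '['"            # recover without consuming tok
      return [[:array, elems], i]
    end
    elems << tok                                   # treat the token as an element
    i += 1
    i += 1 if tokens[i] == ","                     # skip an optional separator
  end
  diagnostics << "unterminated '['"                # ran off the end of input
  [[:array, elems], i]
end

diags = []
node, resume = parse_array(["[", "1", ",", "2", "end"], 0, diags)
# node == [:array, ["1", "2"]], resume == 4, diags == ["unterminated '['"]
```

The key design point, mirrored by the grammar rule above: the recovery path does not consume the premature-end token, so an enclosing `def`/`class` can still match its `end` and the rest of the file parses normally.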
Motivation
Better error recovery, better pizza. (pizza == IDE responsiveness)
Test plan
See included automated tests.