Improve Error Messages #350
Comments
These are all good points. There is a standing list of ways that type checking error messages could be improved, and someday, I hope, improvements will be implemented. Producing good error messages, especially if they are to be easily understood by language novices, is a challenging problem.
One issue that you point out is how to present the relevant code where the error occurred, or rather where the error was detected, since the actual source of the error may be somewhere else entirely. Prettyprinting the abstract syntax (type Absyn.absyn) is what we currently do. Saving the Ast syntax trees might help, but that representation of the code is only partially parsed because of the problem of statically scoped infix declarations, so some context information (the static environment) would be needed to complete the parsing of expressions and patterns occurring in the syntax tree. Another alternative might be to extract the source code text from the source file or from an accumulated history of interactive input. But then there would be the problem of highlighting particular elements (e.g. variables and constants) that might be involved in the type error when the program fragment is represented as a text string.
This general problem is hard and complex, and might require the expertise of a programming psychologist or human-computer interface expert. Meanwhile there are several limited fixes we could try, and some of these will be part of a new type checker I eventually plan to write as part of the NewFrontEnd project. There is a fairly extensive literature on how to produce better error messages for Hindley-Milner type inference and type checking systems. It deals mainly with how to identify exactly which program elements are responsible for the error.
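The "extract the source text and highlight it" alternative can be sketched roughly as follows. This is a minimal illustration that assumes each program element involved in the error is already known by a (line, start column, end column) span into the source text. The function `highlight` and that span format are hypothetical, not SML/NJ functions. The sketch handles only elements that fit on a single line. It does not address the harder question of which elements deserve highlighting.

```python
from typing import List, Tuple

# Hypothetical span: (line index, start column, end column), 0-based, end exclusive.
Span = Tuple[int, int, int]


def highlight(source: str, spans: List[Span]) -> str:
    """Show each source line containing a span, with carets under the spanned columns."""
    lines = source.splitlines()
    out = []
    for lineno in sorted({ln for ln, _, _ in spans}):
        text = lines[lineno]
        marks = [" "] * len(text)
        for ln, start, end in spans:
            if ln == lineno:
                for col in range(start, min(end, len(text))):
                    marks[col] = "^"
        # Both prefixes are 7 characters wide, so carets line up under the text.
        out.append(f"{lineno + 1:4} | {text}")
        out.append("     | " + "".join(marks).rstrip())
    return "\n".join(out)
```

Tabs in the source would throw the columns off. A real implementation would need to expand them, or else count display columns.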
I don't think I necessarily agree with your assessment that these require complex engineering efforts. Yes, a better inference algorithm would be good too, but I chose these errors specifically because they are low-hanging fruit.
The cosmetic improvement of current error messages is not complex, but requires design taste and nontrivial effort. I've thought for a while that a promising approach might be to detect an error (usually a failure of unification) and then invoke a kind of retrospective process to try to find an explanation and present a good error message. This assumes that the Ast syntax (partially parsed syntax trees), the Absyn, and perhaps the types themselves are adequately "instrumented". For instance, occurrences of type constructors now carry an origin annotation that is carried along during type inference and that can be used to identify where the type constructors were introduced in case they are involved in a unification failure. This is part of my "Culprits" idea to improve error messages, which is a simple alternative to several published proposals to produce better error messages during type inference and checking by augmenting the HM inference algorithm in various ways.
I know of no good alternative to the basic (Curry-Newman-)Hindley-Milner type inference algorithm. But identifying the cause of a type error is difficult because the point in the program where the error is detected (where a unification fails) is often not the actual cause of the error. If two type constructors do not match during unification, you could go back to the (two) points in the program where those type constructors were introduced (via being in the type of a variable occurrence). But after they are introduced they flow around the program through the unification process, and sometimes the error occurs somewhere midway through that "flow" (I can supply examples). So one might need to monitor not only the "sources" of clashing type constructors but also what happens to them through the course of the type inference "flow".
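The origin-annotation part of the "Culprits" idea can be illustrated with a toy first-order unifier. This is a minimal sketch: `TVar`, `TCon`, `origin`, and `Clash` are hypothetical names, not SML/NJ's actual type representation. Each constructor occurrence records where it was introduced, and that record travels with it through the substitution. When a clash is detected, both introduction points can therefore be reported. As the comment above points out, the two sources are not always enough. The sketch does not track the intermediate "flow".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class TVar:
    name: str  # a unification (meta) variable


@dataclass
class TCon:
    name: str                                       # e.g. "int", "string", "->"
    args: List["Ty"] = field(default_factory=list)
    origin: str = "?"                               # hypothetical: where this occurrence was introduced


Ty = Union[TVar, TCon]


class Clash(Exception):
    """Unification failure that remembers both clashing constructor occurrences."""

    def __init__(self, left: TCon, right: TCon) -> None:
        super().__init__(
            f"{left.name} (introduced at {left.origin}) does not match "
            f"{right.name} (introduced at {right.origin})"
        )
        self.left, self.right = left, right


def resolve(t: Ty, subst: Dict[str, Ty]) -> Ty:
    # Follow variable bindings; a constructor keeps its origin wherever it flows.
    while isinstance(t, TVar) and t.name in subst:
        t = subst[t.name]
    return t


def occurs(name: str, t: Ty, subst: Dict[str, Ty]) -> bool:
    t = resolve(t, subst)
    if isinstance(t, TVar):
        return t.name == name
    return any(occurs(name, a, subst) for a in t.args)


def bind(v: TVar, t: Ty, subst: Dict[str, Ty]) -> None:
    if isinstance(t, TVar) and t.name == v.name:
        return
    if occurs(v.name, t, subst):
        raise TypeError(f"circular type involving {v.name}")
    subst[v.name] = t


def unify(a: Ty, b: Ty, subst: Dict[str, Ty]) -> None:
    a, b = resolve(a, subst), resolve(b, subst)
    if isinstance(a, TVar):
        bind(a, b, subst)
    elif isinstance(b, TVar):
        bind(b, a, subst)
    elif a.name != b.name or len(a.args) != len(b.args):
        raise Clash(a, b)  # report both introduction points, not just the detection site
    else:
        for x, y in zip(a.args, b.args):
            unify(x, y, subst)
```

Suppose `a` is first bound to an `int` that came from one literal, and is later unified with a `string` from another literal. The resulting `Clash` names both literals' positions, even though the clash is only detected at the second site.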
Other complicating factors are overloaded variables like "+" and overloaded literals like "3", and the not precisely defined "scopes" in which they must be resolved. In (S)ML, there is also a minor issue of (what I call) "explicit" type variables and their treatment in type checking (e.g. what is their implicit binding point and scope). This was partially fixed in SML '97, in which one can optionally bind a type variable explicitly in a value declaration. Consider the following mixed example in SML '97:
When type checked in SML/NJ, this produces an error message saying that 'a does not match 'a. That is a bit confusing, especially when it arises in a much larger and more complicated piece of code where it is not so obvious what is going wrong. [Hint: it is just another case of "free variable capture".]
Description
Currently, error messages have a lot of issues. I'll use the following output to demonstrate.
Error messages show overload type variables when they aren't relevant.
This example only uses `3 : int`; its error has nothing to do with the overloading feature. The `'Z[INT]` syntax is very confusing to new users, and the information is never useful to those familiar with the language: it should not appear in error messages.
Unification errors between tuples/records include parts that do successfully unify.
In this case, the first error should be reported as a disagreement between `string` and `int`, not between `string * string` and `int * string`. This is often a big problem when working with records of 5 or more fields, since the error message does not tell you which field has the type error.
CoreML is output in a very odd way.
If SML/NJ is going to print out a desugared version of the code, it should make sure it is valid and readable.
This means something like:
I.e., include `val rec` or convert to `fun`; don't put unnecessary parentheses and indentation for `fn`s that came from the curried arguments of a `fun`; make sure that variable names don't collide (use `arg1` and `arg2` instead of `arg` for both); use better indentation rules for `case` (ensure that the `of` is not de-indented relative to `case`).
Even better would be to keep track of the original AST for error reporting purposes, but this may be harder to implement.
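Here is one possible approach to the tuple/record problem, where the report should say `string` vs `int` rather than the whole `string * string` vs `int * string`. The idea is to descend through matching record structure and report only the innermost disagreement, together with the labels that lead to it. The sketch relies on the fact that SML tuples are records labelled `1..n`. It compares ground types only. `Base`, `Record`, `first_mismatch`, and `report` are hypothetical names. A real checker would instead thread the label path through its recursive unification calls.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass
class Base:
    name: str  # a base type such as "int" or "string"


@dataclass
class Record:
    fields: Dict[str, "Ty"]  # label -> type; tuples use labels "1".."n"


Ty = Union[Base, Record]
Path = Tuple[str, ...]


def tuple_ty(*tys: Ty) -> Record:
    return Record({str(i + 1): t for i, t in enumerate(tys)})


def is_tuple(t: Ty) -> bool:
    labels = list(t.fields) if isinstance(t, Record) else []
    return len(labels) >= 2 and labels == [str(i + 1) for i in range(len(labels))]


def show(t: Ty, nested: bool = False) -> str:
    if isinstance(t, Base):
        return t.name
    if is_tuple(t):
        s = " * ".join(show(v, True) for v in t.fields.values())
        return f"({s})" if nested else s
    return "{" + ", ".join(f"{l}: {show(v)}" for l, v in t.fields.items()) + "}"


def first_mismatch(a: Ty, b: Ty, path: Path = ()) -> Optional[Tuple[Path, Ty, Ty]]:
    """Return the innermost disagreeing pair and the labels leading to it, or None."""
    if isinstance(a, Record) and isinstance(b, Record) and a.fields.keys() == b.fields.keys():
        for label in a.fields:
            found = first_mismatch(a.fields[label], b.fields[label], path + (label,))
            if found is not None:
                return found
        return None
    if isinstance(a, Base) and isinstance(b, Base) and a.name == b.name:
        return None
    return (path, a, b)  # names or shapes differ: this is the smallest disagreement


def report(a: Ty, b: Ty) -> Optional[str]:
    found = first_mismatch(a, b)
    if found is None:
        return None
    path, x, y = found
    where = "field " + ".".join("#" + l for l in path) if path else "top level"
    return f"type mismatch at {where}: {show(x)} vs {show(y)}"
```

For the tuple case above, this would report `type mismatch at field #1: string vs int`. For a record whose `age` field disagrees, it would name `#age` directly, no matter how many fields the record has.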
Align error output or don't: but don't be inconsistent.
The first error message aligns `string * string` and `'Z[INT] * string`, while the second error message does not align `'Z -> _` and `'Y[INT]`.
Consistently report unconstrained type variables.
The second error message uses `'Z -> _`, even though the error has no relation to the domain vs. range of the function. It should be reported either as `_ -> _` or as `'a -> 'b` or similar.
Handling metavars in error messages in general is an issue.
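The `'a -> 'b` option suggested above can be realized by naming unresolved metavariables per error message. Names are assigned in order of first appearance, and one naming table is shared across every type printed in that message. The sketch below assumes that representation. `Meta`, `App`, `pretty`, `explain`, and the "expected/actual" layout are all hypothetical. Overload-constrained variables (the `'Z[INT]` case) are not modelled. Nested infix types are always parenthesized, which is verbose but unambiguous.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Meta:
    uid: int  # internal id of an unresolved unification variable


@dataclass
class App:
    con: str                                        # "->", "*", or a constructor like "int" or "list"
    args: List["Ty"] = field(default_factory=list)


Ty = Union[Meta, App]
INFIX = ("->", "*")


def pretty(t: Ty, names: Dict[int, str], top: bool = True) -> str:
    if isinstance(t, Meta):
        if t.uid not in names:
            # Name metavariables 'a, 'b, ... by first appearance (assumes at most 26 per message).
            names[t.uid] = "'" + chr(ord("a") + len(names))
        return names[t.uid]
    if t.con in INFIX and len(t.args) == 2:
        s = f"{pretty(t.args[0], names, False)} {t.con} {pretty(t.args[1], names, False)}"
        return s if top else f"({s})"
    if not t.args:
        return t.con
    args = ", ".join(pretty(a, names, False) for a in t.args)
    return f"{args} {t.con}" if len(t.args) == 1 else f"({args}) {t.con}"


def explain(expected: Ty, actual: Ty) -> str:
    names: Dict[int, str] = {}  # shared, so one metavariable gets one name in the whole message
    return f"expected: {pretty(expected, names)}\n  actual: {pretty(actual, names)}"
```

Because the naming table is shared, a variable that shows up on both sides of a mismatch prints with the same name both times. Internal ids such as `'Z` or `'Y` never leak into the output.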
Obviously this is a big feature with many parts, but I thought that there should be a GitHub issue to document the current problems.