HTML4 parser - lexer reached a stuck state · Issue #317 · smlnj/legacy · GitHub
HTML4 parser - lexer reached a stuck state #317

Open
2 of 12 tasks
Skyb0rg007 opened this issue Jul 7, 2024 · 1 comment
Assignee: JohnReppy
Labels: bug (Something isn't working), html4-lib (Issue with HTML4 component of SML/NJ Library)

Comments

@Skyb0rg007
Contributor

Version

110.99.5 (Latest)

Operating System

  • Any
  • Linux
  • macOS
  • Windows
  • Other Unix

OS Version

No response

Processor

  • Any
  • Arm (using Rosetta)
  • PowerPC
  • Sparc
  • x86 (32-bit)
  • x86-64 (64-bit)
  • Other

System Component

SML/NJ Library

Severity

Minor

Description

The HTML4 lexer does not specify rules for all inputs: a lone "<" matches no rule, so the lexer raises Fail instead of letting the parser return NONE.

Transcript

- HTML4Parser.fromString "<";
uncaught exception Fail [Fail: lexer reached a stuck state]
  raised at: smlnj-lib/HTML4/html4.l.sml:94.46-94.80
             ml-lpt/lib/err-handler.sml:261.63

Expected Behavior

- HTML4Parser.fromString "<";
val it = NONE : html option

Note that this is already what happens when you pass an incomplete tag such as "<x".

Steps to Reproduce

See transcript

Additional Information

The issue is in html4.l. There needs to be a case that handles "<" and "</" that are not followed by an alpha character or "!--".

Email address

skyler DOT soss AT gmail.com

Skyb0rg007 added the bug label on Jul 7, 2024
Skyb0rg007 changed the title from "Summary description of bug" to "HTML4 parser - lexer reached a stuck state" on Jul 7, 2024
JohnReppy added the html4-lib label on Jul 7, 2024
JohnReppy self-assigned this on Jul 7, 2024
@JohnReppy
Contributor

As currently implemented, this library does not have the infrastructure to produce error messages. I have modified the lexer so that it now raises a more informative exception (this change will be included in 110.99.6).

- HTML4Parser.fromString "<";

uncaught exception Fail [Fail: Unexpected character '<']
  raised at: html4.l.sml:230.16-230.67
             ml-lpt/lib/err-handler.sml:261.63

A complete fix (i.e., properly reporting an error message and returning NONE) will require a lot more work and possibly a change to the API, so I'm leaving the bug open for now.
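
While the bug is open, callers who want the NONE result from the Expected Behavior section can approximate it on their side. The following is a minimal sketch, not part of the SML/NJ Library: `safeFromString` is a hypothetical name, and the code assumes the HTML4 library is already loaded as in the transcripts above. It relies only on the fact that both 110.99.5 and the modified lexer report the problem by raising `Fail`.

```sml
(* Hypothetical caller-side wrapper; not part of the SML/NJ Library.
 * Maps the lexer's Fail exception to NONE, so that input such as "<"
 * is reported the same way as the incomplete tag "<x".
 *)
fun safeFromString s =
      HTML4Parser.fromString s
        handle Fail _ => NONE
```

The wrapper catches every `Fail`, not only the one raised by the lexer, so it also hides the more informative message added for 110.99.6. A handler that logs the message before returning NONE may be preferable when diagnostics matter.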

JohnReppy added a commit that referenced this issue Sep 16, 2024
A complete fix will require a lot more work and possibly a change to the
API, so I'm leaving the bug open for now.