Do not treat `<style>` elements in `<body>` as a newline · Issue #10643 · jgm/pandoc
Closed
@aphedges

Description


Explain the problem.
The HTML reader should treat `<style>` elements in `<body>` as an empty string, not as a line break that splits the surrounding text into separate blocks.

I encountered `<style>` elements in `<body>` in the wild (they are extremely common on Wikipedia) and noticed that Pandoc renders them very differently from a web browser:

```console
$ echo '<p>A<style></style>B</p>' | ./pandoc --from html --to gfm
A

B
$ echo '<p>A<style></style>B</p>' | ./pandoc --from html --to native
[ Plain [ Str "A" ] , Plain [ Str "B" ] ]
```

I didn't check every platform, but `<p>A<style></style>B</p>` renders as simply `AB` in Firefox, Chromium, and Safari on macOS.
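Until the reader handles this case, one possible workaround is to remove `<style>` elements before Pandoc sees the input. The sketch below is hypothetical and not part of Pandoc. The names `strip_style_elements` and `convert` are illustrative, and the sketch assumes a `pandoc` executable on `PATH`. It uses a regular expression rather than a real HTML parser, so it also removes `<style>` elements in `<head>` and would misfire on `<style>` text inside comments or `<script>` strings.

```python
import re
import subprocess
import sys

# Matches a complete <style ...>...</style> element, case-insensitively and
# across line breaks. CSS is raw text, so a non-greedy match up to the first
# closing tag approximates where an HTML parser ends the element.
# Assumption: no "</style>" appears inside comments or scripts in the input.
STYLE_ELEMENT = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)


def strip_style_elements(html: str) -> str:
    """Hypothetical helper: drop every <style> element so it cannot split a paragraph."""
    return STYLE_ELEMENT.sub("", html)


def convert(html: str, to: str = "gfm") -> str:
    """Hypothetical wrapper: preprocess the HTML, then run pandoc (assumed on PATH)."""
    result = subprocess.run(
        ["pandoc", "--from", "html", "--to", to],
        input=strip_style_elements(html),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Read HTML from stdin and write the converted output to stdout.
    sys.stdout.write(convert(sys.stdin.read()))
```

Saved as a script (for example, a hypothetical `strip_styles.py`), it can stand in for the direct call: `echo '<p>A<style></style>B</p>' | python3 strip_styles.py`. Removing the element outright only mirrors the `AB` rendering reported for browsers; it is not a substitute for fixing the reader itself.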

I looked at the `<style>` specification to figure out the expected behavior, but I found the HTML parsing specification extremely difficult to understand. The W3C Markup Validation Service does confirm that `<style>` elements in `<body>` are invalid, but handling invalid HTML seems to be within Pandoc's scope according to #9090 (comment): "Well, we already handle many cases of invalid HTML. If there are other particular ones that come up, feel free to report."

I will report this problem upstream to Wikipedia at some point, but `<style>` elements in `<body>` appear to be a fundamental part of how MediaWiki templates work. I first noticed this bug on the Wikipedia article "Circadian rhythm", which contains 16 `<style>` elements in `<body>`. I therefore expect this invalid HTML to be difficult to change there, at least in the near future, even though, according to Help:Markup validation on Wikipedia, they do aim to avoid invalid markup.

Pandoc version?
macOS on Apple Silicon (albeit an x86_64 executable running under Rosetta 2)

```
pandoc 3.6.3-nightly-2025-02-24
Features: +server +lua
Scripting engine: Lua 5.4
```
