What version of Go are you using (go version)?
$ go version
go version go1.21.1 linux/amd64
golang.org/x/net v0.15.0
Does this issue reproduce with the latest release?
Yes, golang.org/x/net v0.15.0 is the latest version.
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/maciekmm/.cache/go-build'
GOENV='/home/maciekmm/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/maciekmm/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/maciekmm/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.1'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2511272611=/tmp/go-build -gno-record-gcc-switches'
What did you do?
Calling the Tokenizer on an HTML element that has a SOLIDUS (/) in the attribute name results in incorrect tokenization.
This is due to a violation of the following rules in the WHATWG spec:
- https://html.spec.whatwg.org/multipage/parsing.html#after-attribute-name-state ->
- https://html.spec.whatwg.org/multipage/parsing.html#self-closing-start-tag-state (the tokenizer does not reconsume in the before attribute name state)
Test cases:
https://go.dev/play/p/ne5aV9XWVBd
What did you expect to see?
I expected HTML whose attributes contain the solidus (/) character to be tokenized correctly for the following inputs (each case lists the input followed by the expected output):
{
"forward slash before attribute name",
`<p/=">`,
`<p ="="">`,
},
{
"forward slash before attribute name with spaces around",
`<p / =">`,
`<p ="="">`,
},
{
"forward slash in the attribute name followed by character",
`<p a/ ="">`,
`<p a="" =""="">`,
}
What did you see instead?
<p/="> -> EOF
<p / ="> -> EOF
<p a/ =""> -> <p a="">