Closed
Description
According to the XML spec, <, > and & have to be encoded in attributes and text nodes. In attributes, ' and " additionally have to be encoded.
The XMLSerializer does this encoding according to the spec (except for ' in attributes, which is a bug, but super easily fixed).
The parser, on the other hand, decodes all 5 entities in attributes AND in text nodes.
I have to process XML files in which all 5 entities are also encoded in text nodes. Parsing, modifying and then serializing these files changes all the text nodes.
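The mismatch described above can be illustrated without the library. The following is a minimal sketch: decodeEntities, encodeText and encodeAttribute are hypothetical helper names, not part of the xmldom API. They only mimic the behavior as this issue describes it, and numeric character references are ignored.

```javascript
// Hypothetical helpers, not xmldom's API: they mimic the behavior described in this issue.
const DECODE = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&apos;': "'", '&quot;': '"' };
const ENCODE = { '&': '&amp;', '<': '&lt;', '>': '&gt;', "'": '&apos;', '"': '&quot;' };

// Parser side: all five predefined entities are decoded, in text and in attributes.
function decodeEntities(value) {
  return value.replace(/&(?:amp|lt|gt|apos|quot);/g, (entity) => DECODE[entity]);
}

// Serializer side, text nodes: only &, < and > are encoded.
function encodeText(value) {
  return value.replace(/[&<>]/g, (ch) => ENCODE[ch]);
}

// Serializer side, attributes: &, <, > and " are encoded; ' is left as-is (the bug mentioned above).
function encodeAttribute(value) {
  return value.replace(/[&<>"]/g, (ch) => ENCODE[ch]);
}

const source = '&amp; &lt; &gt; &apos; &quot;';
const roundTrippedText = encodeText(decodeEntities(source));
const roundTrippedAttribute = encodeAttribute(decodeEntities(source));
console.log(roundTrippedText);      // &amp; &lt; &gt; ' "
console.log(roundTrippedAttribute); // &amp; &lt; &gt; ' &quot;
```

Neither result equals the input, which is why a parse/serialize cycle rewrites text nodes and attributes that had all five entities encoded.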
How to replicate
// test.mjs
import { DOMParser, XMLSerializer } from '@xmldom/xmldom';
const testxml =
`<?xml version="1.0" encoding="UTF-8"?>
<rootel xmlns="http://soap.sforce.com/2006/04/metadata">
<textnode testattribute="&amp; &lt; &gt; &apos; &quot;">
&amp;
&lt;
&gt;
&apos;
&quot;
</textnode>
</rootel>
`;
const xmldoc = new DOMParser().parseFromString(testxml, 'text/xml');
const serializedXml = new XMLSerializer().serializeToString(xmldoc);
console.log(serializedXml);
outputs this:
<?xml version="1.0" encoding="UTF-8"?>
<rootel xmlns="http://soap.sforce.com/2006/04/metadata">
<textnode testattribute="&amp; &lt; &gt; ' &quot;">
&amp;
&lt;
&gt;
'
"
</textnode>
</rootel>
Solution
I am happy to open a PR for this, but first wanted to clarify the approach:
- Simplest one: change the serializer to encode all entities for text and attributes
  - It's a very simple 2-line change, but it then encodes more chars than required by the spec
- OR: change the parser to only decode &amp;, &lt; and &gt; for text nodes (here)
  - should only be limited in XML mode; would need to stay the same for HTML
  - would be spec compliant
  - could be breaking for people who are used to all 5 entities being decoded
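The first option can be sketched in isolation. This is a minimal sketch, not the actual xmldom change: encodeAllEntities and decodeAllEntities are hypothetical names, and numeric character references are ignored. It shows that encoding all five characters on output lets input in which all five entities were encoded round-trip unchanged.

```javascript
// Hypothetical helpers for illustration only; these are not xmldom functions.
const ENTITY_FOR_CHAR = { '&': '&amp;', '<': '&lt;', '>': '&gt;', "'": '&apos;', '"': '&quot;' };
const CHAR_FOR_ENTITY = Object.fromEntries(
  Object.entries(ENTITY_FOR_CHAR).map(([ch, entity]) => [entity, ch])
);

// Stand-in for the parser: decodes all five predefined entities.
function decodeAllEntities(value) {
  return value.replace(/&(?:amp|lt|gt|apos|quot);/g, (entity) => CHAR_FOR_ENTITY[entity]);
}

// Option 1: encode all five characters, for text nodes and attributes alike.
function encodeAllEntities(value) {
  return value.replace(/[&<>'"]/g, (ch) => ENTITY_FOR_CHAR[ch]);
}

const original = '&amp; &lt; &gt; &apos; &quot;';
const roundTripped = encodeAllEntities(decodeAllEntities(original));
console.log(roundTripped === original); // true
```

The trade-off named in the list still applies: a literal ' or " in a source text node would come out as &apos; or &quot;, which is more encoding than the spec requires.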