Description
While parsing very large XML documents with pydantic-xml, I received errors like:
lxml.etree.XMLSyntaxError: CData section too big found, line 66036, column 194
According to https://lxml.de/apidoc/lxml.etree.html#lxml.etree.XMLParser, this size limit can be lifted by passing huge_tree=True to the XMLParser. However, there does not appear to be a way to enable this for the pydantic-xml parser.
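For reference, here is a minimal sketch of the lxml behaviour on its own, independent of pydantic-xml. The 11 MB payload is an arbitrary size I picked to exceed what I understand to be libxml2's default text limit of roughly 10 MB; the exact threshold is an assumption and may vary between versions.

```python
from lxml import etree

# Build a document whose CDATA section is larger than the assumed default limit (~10 MB).
payload = b"x" * (11 * 1000 * 1000)
source = b"<doc><![CDATA[" + payload + b"]]></doc>"

# The default parser is expected to reject the document with XMLSyntaxError.
try:
    etree.fromstring(source)
    default_parser_ok = True
except etree.XMLSyntaxError:
    default_parser_ok = False

# With huge_tree=True the same document should parse.
huge_parser = etree.XMLParser(huge_tree=True)
root = etree.fromstring(source, huge_parser)
```

Note that huge_tree disables safety limits in the parser, so it is probably best reserved for input from trusted sources.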
I was able to solve my issues by monkey-patching like so:
from pydantic_xml.model import BaseXmlModel
from lxml import etree


def _from_xml(cls, source, context=None):
    """
    Deserializes an xml string to an object of `cls` type.

    :param source: xml string
    :param context: pydantic validation context
    :return: deserialized object
    """
    parser = etree.XMLParser(huge_tree=True)
    return cls.from_xml_tree(etree.fromstring(source, parser), context=context)


BaseXmlModel.from_xml = classmethod(_from_xml)
It would be nice if huge_tree were exposed as part of the interface when calling from_xml. Is this change desirable? If so, I'd be happy to write a PR, perhaps even something a bit more general that allows passing arbitrary arguments to the XMLParser constructor.
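To illustrate the more general idea, here is a rough sketch of what forwarding arbitrary parser arguments could look like. The helper name parse_with_options and its signature are hypothetical and not part of pydantic-xml; only etree.XMLParser and etree.fromstring are real lxml calls.

```python
from lxml import etree


def parse_with_options(source, **parser_kwargs):
    """Hypothetical helper: parse `source` with an XMLParser built from arbitrary keyword arguments."""
    parser = etree.XMLParser(**parser_kwargs)
    return etree.fromstring(source, parser)


# Any XMLParser option can be forwarded, not just huge_tree.
root = parse_with_options(
    b"<doc>\n  <item>1</item>\n</doc>",
    huge_tree=True,
    remove_blank_text=True,
)
```

The resulting element could then be handed to from_xml_tree, as in the monkey-patch above. How such options would best be exposed on from_xml itself is a design decision for the maintainers.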