huge_tree option for XML parser

While parsing very large XML documents with pydantic-xml, I received errors like: lxml.etree.XMLSyntaxError: CData section too big found, line 66036, column 194.

According to https://lxml.de/apidoc/lxml.etree.html#lxml.etree.XMLParser, this size limit can be lifted by passing huge_tree=True to the XMLParser. However, there does not appear to be a way to enable it for the parser that pydantic-xml uses.

I was able to work around the issue by monkey-patching from_xml like so:

from pydantic_xml.model import BaseXmlModel
from lxml import etree

def _from_xml(cls, source, context=None):
    """
    Deserializes an xml string to an object of `cls` type.

    :param source: xml string
    :param context: pydantic validation context
    :return: deserialized object
    """
    # huge_tree=True lifts lxml's default size limits, so very large
    # CDATA sections and text nodes no longer raise XMLSyntaxError.
    parser = etree.XMLParser(huge_tree=True)
    return cls.from_xml_tree(etree.fromstring(source, parser), context=context)


BaseXmlModel.from_xml = classmethod(_from_xml)

It would be nice if huge_tree were exposed as part of the from_xml interface. Would this change be welcome? If so, I'd be happy to write a PR, possibly something more general that allows arbitrary arguments to be passed through to the XMLParser constructor.
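
As a rough sketch of the more general idea, parser keyword arguments could be forwarded straight to lxml's XMLParser. This is not pydantic-xml's actual API: parse_xml and parser_options are hypothetical names used only for illustration.

```python
from lxml import etree


def parse_xml(source, **parser_options):
    """Parse `source` with an lxml XMLParser built from `parser_options`.

    Hypothetical helper for illustration; not part of pydantic-xml.
    """
    # Keyword arguments such as huge_tree=True are passed through
    # unchanged to lxml.etree.XMLParser.
    parser = etree.XMLParser(**parser_options)
    return etree.fromstring(source, parser)


# huge_tree=True applies only to the parser created for this call.
root = parse_xml(b"<doc><item>payload</item></doc>", huge_tree=True)
print(root.tag, root[0].text)
```

The resulting element could then be handed to from_xml_tree, as the monkey-patch above does, so from_xml would only need to accept and forward the extra keyword arguments.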
