Description
Hello,
I'm sorry that this is a long write-up, but I've tried to be as thorough as possible. The tl;dr is in the Summary section at the end.
It has been well-documented that sub-models don't inherit their namespaces from their parents, so those namespaces must be explicitly defined (#221).
It has also been noted that child elements of a parent do inherit the namespaces of their parents (#197).
However, what hasn't been touched on, it seems, is that subclasses propagate their namespaces to the elements they inherit from their parent classes. This has led to one problem, and then to a second problem as a consequence of working around the first.
The Setup
Consider a FooSchema.xsd schema that defines two types under its default namespace: a BaseClass, and a MiddleClass that inherits from it and adds an additional element:
from pydantic_xml import BaseXmlModel, element

FOO_NSMAP = {
    "foo": "FooSchema.xsd",
}

class BaseClass(BaseXmlModel, nsmap=FOO_NSMAP, ns="foo"):
    """Base class of the FooSchema"""
    base_element: str = element(tag="base_element")

class MiddleClass(BaseClass, nsmap=FOO_NSMAP, ns="foo"):
    """An inherited class of BaseClass in the FooSchema"""
    middle_element: str = element(tag="middle_element")

middle_class = MiddleClass(
    base_element="base",
    middle_element="middle",
)
print(middle_class.to_xml())
#<foo:MiddleClass xmlns:foo="FooSchema.xsd">
# <foo:base_element>base</foo:base_element>
# <foo:middle_element>middle</foo:middle_element>
#</foo:MiddleClass>
# Consuming it with a default namespace works too!
middle_class = MiddleClass.from_xml("""
<MiddleClass xmlns="FooSchema.xsd">
<base_element>base</base_element>
<middle_element>middle</middle_element>
</MiddleClass>
""")
This all makes perfect sense: both classes and their elements are under the FooSchema.xsd namespace, and we know that base_element and middle_element inherit that namespace, as mentioned in #197 above.
Adding An Inherited Class with a Different Namespace
Now, consider a BarSchema.xsd schema that has a type that inherits from MiddleClass in FooSchema.xsd, but does so under a different prefix:
BAR_NSMAP = {"foo": "FooSchema.xsd", "bar": "BarSchema.xsd"}

class BarTopClass(MiddleClass, nsmap=BAR_NSMAP, ns="bar"):
    """A class under BarSchema that inherits from FooSchema"""
    top_element: str = element(tag="top_element")

bar_top_class = BarTopClass(
    base_element="base",
    middle_element="middle",
    top_element="top",
)
print(bar_top_class.to_xml())
Now, printing this with to_xml(), we might expect:
Expectation
The elements declared under FooSchema would retain their inherited foo: prefix, just like before, and BarTopClass and its own top_element would get the bar: prefix:
<bar:BarTopClass xmlns:foo="FooSchema.xsd" xmlns:bar="BarSchema.xsd">
<foo:base_element>base</foo:base_element>
<foo:middle_element>middle</foo:middle_element>
<bar:top_element>top</bar:top_element>
</bar:BarTopClass>
The Problem
The Actual Result
But what actually happens is more confusing. The namespaces of the parent classes' elements are overwritten by the namespace of the descendant class, which is patently incorrect:
<bar:BarTopClass xmlns:foo="FooSchema.xsd" xmlns:bar="BarSchema.xsd">
<bar:base_element>base</bar:base_element>
<bar:middle_element>middle</bar:middle_element>
<bar:top_element>top</bar:top_element>
</bar:BarTopClass>
This should probably not happen, and seems very unexpected.
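To pin down the difference between the two outputs, here is a minimal plain-Python sketch of the two resolution rules. It does not use pydantic-xml and says nothing about how the library is implemented; the ns and fields class attributes and all three function names are hypothetical stand-ins for this illustration only. The "expected" rule gives each element the prefix of the class that declared it, while the "observed" rule gives every element the prefix of the most-derived class, which matches the output above.

```python
# Plain-Python stand-ins for the models above. `ns` and `fields` are
# hypothetical attributes used only for this illustration.
class BaseClass:
    ns = "foo"
    fields = ("base_element",)


class MiddleClass(BaseClass):
    ns = "foo"
    fields = ("middle_element",)


class BarTopClass(MiddleClass):
    ns = "bar"
    fields = ("top_element",)


def declaring_classes(cls):
    """Return the classes that declare fields, from the root down to `cls`."""
    return [k for k in reversed(cls.__mro__) if "fields" in vars(k)]


def expected_prefixes(cls):
    """Each element keeps the prefix of the class that declared it."""
    return {
        name: vars(klass)["ns"]
        for klass in declaring_classes(cls)
        for name in vars(klass)["fields"]
    }


def observed_prefixes(cls):
    """Every element takes the prefix of the most-derived class."""
    return {
        name: cls.ns
        for klass in declaring_classes(cls)
        for name in vars(klass)["fields"]
    }


print(expected_prefixes(BarTopClass))
# {'base_element': 'foo', 'middle_element': 'foo', 'top_element': 'bar'}
print(observed_prefixes(BarTopClass))
# {'base_element': 'bar', 'middle_element': 'bar', 'top_element': 'bar'}
```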
An Attempted Solution
Given what is mentioned in #197, one of the ways we might try to fix this is to explicitly declare namespaces on the elements of the parent classes, with ns="foo", like so:
# FooSchema
class BaseClass(BaseXmlModel, nsmap=FOO_NSMAP, ns="foo"):
    """Base class of the FooSchema"""
    base_element: str = element(tag="base_element", ns="foo")

class MiddleClass(BaseClass, nsmap=FOO_NSMAP, ns="foo"):
    """An inherited class of BaseClass in the FooSchema"""
    middle_element: str = element(tag="middle_element", ns="foo")
And this does work! Trying print(bar_top_class.to_xml()) again, we now get:
<bar:BarTopClass xmlns:foo="FooSchema.xsd" xmlns:bar="BarSchema.xsd">
<foo:base_element>base</foo:base_element>
<foo:middle_element>middle</foo:middle_element>
<bar:top_element>top</bar:top_element>
</bar:BarTopClass>
until...
Problem 2 - A Second Inherited Class
For whatever reason, the writers of BazSchema.xsd have decided to do something silly and import MiddleClass and its schema FooSchema.xsd under a different prefix:
BAZ_NSMAP = {"footoo": "FooSchema.xsd", "baz": "BazSchema.xsd"}

class BazTopClass(MiddleClass, nsmap=BAZ_NSMAP, ns="baz"):
    """A class under baz schema that also inherits from foo schema, but named it differently"""
    top_baz_element: str = element(tag="top_baz_element")

baz_top_class = BazTopClass(
    top_baz_element="baz_top",
    middle_element="middle",
    base_element="base",
)
print(baz_top_class.to_xml())
Now, print(baz_top_class.to_xml()) gives us:
<baz:BazTopClass xmlns:footoo="FooSchema.xsd" xmlns:baz="BazSchema.xsd">
<base_element>base</base_element>
<middle_element>middle</middle_element>
<baz:top_baz_element>baz_top</baz:top_baz_element>
</baz:BazTopClass>
Now it gives no namespace at all? Surely, we can consume the XML if it's fully qualified though:
BazTopClass.from_xml("""
<baz:BazTopClass xmlns:footoo="FooSchema.xsd" xmlns:baz="BazSchema.xsd">
<footoo:base_element>base</footoo:base_element>
<footoo:middle_element>middle</footoo:middle_element>
<baz:top_baz_element>baz_top</baz:top_baz_element>
</baz:BazTopClass>
""")
pydantic_core._pydantic_core.ValidationError: 2 validation errors for BazTopClass
base_element
[line 2]: Field required [type=missing, input_value={'top_baz_element': 'baz_top'}, input_type=dict]
middle_element
[line 2]: Field required [type=missing, input_value={'top_baz_element': 'baz_top'}, input_type=dict]
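For what it's worth, that document is fully qualified as far as XML itself is concerned: a prefix is only a document-local alias for the namespace URI, so footoo:base_element and foo:base_element name the same element whenever both prefixes are bound to FooSchema.xsd. A small sketch using the standard library's xml.etree.ElementTree (not pydantic-xml; the qualified_tags helper is a hypothetical name for this illustration) shows both spellings resolving to identical qualified names:

```python
import xml.etree.ElementTree as ET

WITH_FOOTOO = """
<baz:BazTopClass xmlns:footoo="FooSchema.xsd" xmlns:baz="BazSchema.xsd">
    <footoo:base_element>base</footoo:base_element>
    <footoo:middle_element>middle</footoo:middle_element>
    <baz:top_baz_element>baz_top</baz:top_baz_element>
</baz:BazTopClass>
"""

# The same document with FooSchema.xsd bound to the prefix "foo" instead.
WITH_FOO = WITH_FOOTOO.replace("footoo", "foo")


def qualified_tags(xml_text):
    """Return the {uri}local names of the root's children, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [child.tag for child in root]


print(qualified_tags(WITH_FOOTOO))
print(qualified_tags(WITH_FOOTOO) == qualified_tags(WITH_FOO))
```

So a lookup keyed on the namespace URI would find base_element and middle_element here; the validation error above suggests the lookup is instead tied to the prefix declared on the parent classes, though that is my guess rather than something I have confirmed in the library.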
Okay, well, if we used ns="foo" on those parent classes, maybe we can use that prefix:
baz_top_class = BazTopClass.from_xml("""
<baz:BazTopClass xmlns:footoo="FooSchema.xsd" xmlns:baz="BazSchema.xsd">
<foo:middle_element>middle</foo:middle_element>
<foo:base_element>base</foo:base_element>
<baz:top_baz_element>baz_top</baz:top_baz_element>
</baz:BazTopClass>
""")
lxml.etree.XMLSyntaxError: Namespace prefix foo on middle_element is not defined, line 3, column 24
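This failure comes from the XML parser rather than from the model: the document uses the foo prefix without ever declaring it. The same thing can be reproduced with the standard library alone, as in this minimal sketch (xml.etree.ElementTree instead of lxml; the parses helper is a hypothetical name for this illustration):

```python
import xml.etree.ElementTree as ET

UNBOUND = """
<baz:BazTopClass xmlns:footoo="FooSchema.xsd" xmlns:baz="BazSchema.xsd">
    <foo:middle_element>middle</foo:middle_element>
</baz:BazTopClass>
"""


def parses(xml_text):
    """Return True if the text is namespace-well-formed XML, else False."""
    try:
        ET.fromstring(xml_text.strip())
    except ET.ParseError:
        return False
    return True


print(parses(UNBOUND))  # False: "foo" is never bound to a URI
print(parses(UNBOUND.replace("xmlns:footoo", "xmlns:foo")))  # True
```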
That error, unfortunately, makes sense. As a shot in the dark, I tried redefining those parent classes explicitly with the new BAZ_NSMAP prefixes:
class BazBaseClass(BaseClass, nsmap=BAZ_NSMAP, ns="footoo"):
    pass

class BazMiddleClass(MiddleClass, nsmap=BAZ_NSMAP, ns="footoo"):
    pass

class BazTopClass(BazMiddleClass, nsmap=BAZ_NSMAP, ns="baz", search_mode="unordered"):
    """A class under baz schema that also inherits from foo schema, but named it differently"""
    top_baz_element: str = element(tag="top_baz_element", ns="baz")
But that, unfortunately, still gives us the same output:
<baz:BazTopClass xmlns:footoo="FooSchema.xsd" xmlns:baz="BazSchema.xsd">
<base_element>base</base_element>
<middle_element>middle</middle_element>
<baz:top_baz_element>baz_top</baz:top_baz_element>
</baz:BazTopClass>
In fact, the only way I can seem to get it to work is to redefine every class, and every single element, that Baz inherits from under the new namespace prefix.
Summary
- Descendant classes are overwriting the namespaces of their parent classes' elements unless the namespace is explicitly defined on every element of every parent class. This includes every class in the middle, all the way to the root class.
The apparent solution of explicitly declaring the namespace prefix on these elements leads to:
- Any class that inherits from the parents under a different namespace prefix will lose the prefix on the inherited elements entirely, and the only solution seems to be to redefine all parent classes.
This has some knock-on effects, as you might imagine. It also complicates scenarios like new schema versions, and schemas that define their default namespace as "" while inheriting schemas define it under some prefix.
Problem 1 seems to be the real issue here, and it's hard to tell how closely the two are linked.
Thanks for taking the time to look at this.