Encoding#

Custom type encoding#

pydantic provides mechanisms to customize the default json encoding format. pydantic-xml applies those custom encoders during xml serialization as well:

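from datetime import datetime
from pydantic import field_serializer
from pydantic_xml import BaseXmlModel, element

class File(BaseXmlModel):
    created: datetime = element()

    @field_serializer('created')
    def encode_created(self, value: datetime) -> float:
        return value.timestamp()  # written to the document as a POSIX timestamp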

The following example illustrates how to encode bytes-typed fields as Base64 strings during xml serialization:

model.py:

import base64
import pathlib
from typing import List, Optional, Union
from xml.etree.ElementTree import canonicalize

from pydantic import field_serializer, field_validator

from pydantic_xml import BaseXmlModel, RootXmlModel, attr, element


class File(BaseXmlModel):
    name: str = attr()
    content: bytes = element()

    @field_serializer('content')
    def encode_content(self, value: bytes) -> str:
        return base64.b64encode(value).decode()

    @field_validator('content', mode='before')
    def decode_content(cls, value: Optional[Union[str, bytes]]) -> Optional[bytes]:
        if isinstance(value, str):
            return base64.b64decode(value)

        return value


class Files(RootXmlModel, tag='files'):
    root: List[File] = element(tag='file', default=[])


files = Files()
for filename in ['./file1.txt', './file2.txt']:
    with open(filename, 'rb') as f:
        content = f.read()

    files.root.append(File(name=filename, content=content))

expected_xml_doc = pathlib.Path('./doc.xml').read_bytes()

assert canonicalize(files.to_xml(), strip_text=True) == canonicalize(expected_xml_doc, strip_text=True)

file1.txt:

hello world!!!

file2.txt:

¡Hola Mundo!

doc.xml:

<files>
  <file name="./file1.txt">
    <content>aGVsbG8gd29ybGQhISEK</content>
  </file>
  <file name="./file2.txt">
    <content>wqFIb2xhIE11bmRvIQo=</content>
  </file>
</files>
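The Base64 payloads in doc.xml can be checked with the standard library alone. The sketch below assumes that each text file ends with a single newline and that file2.txt is UTF-8 encoded (the encoded values in doc.xml are consistent with that). The helper names are illustrative and not part of pydantic-xml.

```python
import base64


def encode_content(value: bytes) -> str:
    """Same transformation as the field serializer: bytes -> Base64 text."""
    return base64.b64encode(value).decode('ascii')


def decode_content(value: str) -> bytes:
    """Same transformation as the 'before' validator: Base64 text -> bytes."""
    return base64.b64decode(value)


# Assumed file contents: each file ends with a single newline character.
file1 = 'hello world!!!\n'.encode('utf-8')
file2 = '\u00a1Hola Mundo!\n'.encode('utf-8')  # \u00a1 is the inverted exclamation mark

encoded1 = encode_content(file1)
encoded2 = encode_content(file2)

# Decoding restores the original bytes, so the round trip is lossless.
assert decode_content(encoded1) == file1
assert decode_content(encoded2) == file2
```

Because the validator runs in 'before' mode, it sees the raw element text and can decode it before pydantic validates the field as bytes.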

Optional type encoding#

Since the xml format doesn’t support a null type natively, it is not obvious how to encode None fields: ignore them, encode them as empty strings, or mark them as xsi:nil.

None values are encoded as empty strings by default, but the library provides several alternatives:

Define your own encoding format for None values:

from typing import Annotated, Optional, TypeVar
from xml.etree.ElementTree import canonicalize

from pydantic import BeforeValidator, PlainSerializer

from pydantic_xml import BaseXmlModel, element

InnerType = TypeVar('InnerType')
XmlOptional = Annotated[
    Optional[InnerType],
    PlainSerializer(lambda val: val if val is not None else 'null'),
    BeforeValidator(lambda val: val if val != 'null' else None),
]


class Company(BaseXmlModel):
    title: XmlOptional[str] = element(default=None)


xml_doc = '''
<Company>
    <title>null</title>
</Company>
'''

company = Company.from_xml(xml_doc)

assert company.title is None
assert canonicalize(company.to_xml(), strip_text=True) == canonicalize(xml_doc, strip_text=True)

Mark empty elements as nillable:

from typing import Optional
from xml.etree.ElementTree import canonicalize

from pydantic_xml import BaseXmlModel, element


class Company(BaseXmlModel):
    title: Optional[str] = element(default=None, nillable=True)


xml_doc = '''
<Company>
    <title xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true" />
</Company>
'''

company = Company.from_xml(xml_doc)

assert company.title is None
assert canonicalize(company.to_xml(), strip_text=True) == canonicalize(xml_doc, strip_text=True)
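To see what the nillable marker looks like to a parser, the following sketch inspects the same kind of document with the standard library only. It is not how pydantic-xml implements nillable support; the `is_nil` helper and the extra `website` element are hypothetical and exist only to contrast a nil element with one that is merely empty.

```python
import xml.etree.ElementTree as ET

XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance'

xml_doc = '''
<Company>
    <title xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true" />
    <website></website>
</Company>
'''


def is_nil(elem: ET.Element) -> bool:
    """Return True when the element carries xsi:nil="true" (or "1")."""
    # ElementTree exposes namespaced attributes in Clark notation: {uri}name.
    return elem.get(f'{{{XSI_NS}}}nil') in ('true', '1')


root = ET.fromstring(xml_doc)
title = root.find('title')
website = root.find('website')

title_is_nil = is_nil(title)      # explicitly marked as null
website_is_nil = is_nil(website)  # empty, but not marked as nil
```

The distinction matters when an empty string and a missing value must stay distinguishable after a round trip.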

Drop empty elements altogether:

from typing import Optional
from pydantic_xml import BaseXmlModel, element

class Company(BaseXmlModel, skip_empty=True):
    title: Optional[str] = element(default=None)


company = Company()
assert company.to_xml() == b'<Company/>'

Empty entities exclusion#

It is possible to exclude all empty entities from the resulting xml document at once. To do that, pass skip_empty=True to pydantic_xml.BaseXmlModel.to_xml() during serialization. By default that parameter applies to the root model and all its sub-models, but it can be overridden for a particular model in its declaration, as the following example illustrates:

from typing import Literal, Optional, Tuple

from pydantic_xml import BaseXmlModel, attr, element


class Product(BaseXmlModel, tag='Product', skip_empty=True):
    status: Optional[Literal['running', 'development']] = attr(default=None)
    launched: Optional[int] = attr(default=None)
    title: Optional[str] = element(tag='Title', default=None)


class Company(BaseXmlModel, tag='Company'):
    trade_name: str = attr(name='trade-name')
    website: str = element(tag='WebSite', default='')

    products: Tuple[Product, ...] = element()


company = Company(
    trade_name="SpaceX",
    products=[
        Product(status="running", launched=2013, title="Several launch vehicles"),
        Product(status="running", title="Starlink"),
        Product(status="development"),
        Product(),
    ],
)

<Company trade-name="SpaceX">
    <WebSite /><!--Company empty elements are not excluded-->

    <!--Product empty sub-elements and attributes are excluded-->
    <Product status="running" launched="2013">
        <Title>Several launch vehicles</Title>
    </Product>
    <Product status="running">
        <Title>Starlink</Title>
    </Product>
    <Product status="development"/>
    <Product />
</Company>

Default namespace#

An xml default namespace is a namespace that applies to an element and all its sub-elements without an explicit prefix.

In the following example the company element has no explicit namespace prefix, but the default namespace for it and all its sub-elements is http://www.company.com/co. The contacts element has no explicit prefix either, but it doesn’t inherit the namespace from company because it declares its own default namespace. The same goes for the socials element, whose sub-elements inherit its namespace:

<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt">
        <socials xmlns="http://www.company.com/soc">
            <social>https://www.linkedin.com/company/spacex</social>
            <social>https://twitter.com/spacex</social>
            <social>https://www.youtube.com/spacex</social>
        </socials>
    </contacts>
</company>

A model for that document can be described like this:

from typing import List

from pydantic_xml import BaseXmlModel, element


class Socials(
    BaseXmlModel,
    tag='socials',
    nsmap={'': 'http://www.company.com/soc'},
):
    urls: List[str] = element(tag='social')


class Contacts(
    BaseXmlModel,
    tag='contacts',
    nsmap={'': 'http://www.company.com/cnt'},
):
    socials: Socials = element()


class Company(
    BaseXmlModel,
    tag='company',
    nsmap={'': 'http://www.company.com/co'},
):
    contacts: Contacts = element()

Note the model’s nsmap parameter. To set a default namespace for a model and its sub-fields, pass that namespace under an empty-string key.
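To check which namespace each element actually ends up in, a shortened version of the document above can be parsed with the standard library. This is only an inspection sketch and does not involve pydantic-xml.

```python
import xml.etree.ElementTree as ET

xml_doc = '''
<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt">
        <socials xmlns="http://www.company.com/soc">
            <social>https://twitter.com/spacex</social>
        </socials>
    </contacts>
</company>
'''

root = ET.fromstring(xml_doc)

# ElementTree reports tags in Clark notation ({namespace}local-name), which
# makes the effective namespace of every element visible.
resolved_tags = [elem.tag for elem in root.iter()]

for tag in resolved_tags:
    print(tag)
```

The social element declares nothing itself, yet it resolves to the soc namespace inherited from its parent. The nsmap={'': ...} declaration on Socials expresses the same inheritance.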

Default namespace serialization

The standard library xml serializer has a default namespace serialization problem: it does not preserve default namespace declarations. Instead, it moves all namespace declarations to the root element and replaces the default namespaces with generated ns{0..} prefixes:

<ns0:company xmlns:ns0="http://www.company.com/co"
             xmlns:ns1="http://www.company.com/cnt"
             xmlns:ns2="http://www.company.com/soc">
    <ns1:contacts>
        <ns2:socials>
            <ns2:social>https://www.linkedin.com/company/spacex</ns2:social>
            <ns2:social>https://twitter.com/spacex</ns2:social>
            <ns2:social>https://www.youtube.com/spacex</ns2:social>
        </ns2:socials>
    </ns1:contacts>
</ns0:company>

That document is still correct, but some parsers require the namespace declarations to be kept untouched. To avoid the problem, use lxml as the serialization backend, since it does not rewrite them. See lxml installation.
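The prefix rewriting can be reproduced without pydantic-xml by round-tripping a small document through xml.etree.ElementTree. The two-element document below is a shortened stand-in for the company example.

```python
import xml.etree.ElementTree as ET

xml_doc = (
    '<company xmlns="http://www.company.com/co">'
    '<contacts xmlns="http://www.company.com/cnt"/>'
    '</company>'
)

root = ET.fromstring(xml_doc)

# Round-trip through the standard library serializer.
serialized = ET.tostring(root, encoding='unicode')

# The default namespace declarations are gone; generated prefixes declared on
# the root element take their place.
print(serialized)

# Both documents still describe the same namespaced elements.
tags_before = [elem.tag for elem in root.iter()]
tags_after = [elem.tag for elem in ET.fromstring(serialized).iter()]
```

Comparing the resolved tags before and after confirms that the rewritten document is semantically equivalent. Only consumers that depend on the literal declarations are affected.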

Computed entities#

pydantic supports computed fields. Computed fields allow property and cached_property to be included when serializing models or dataclasses. This is useful for fields that are computed from other fields, or for fields that are expensive to compute and should be cached.

pydantic-xml provides a similar api for xml entities: text, attribute or element properties can be included in the xml document during serialization. To make a property computable, decorate it with pydantic.computed_field to bind it to the current element, pydantic_xml.computed_attr() to bind it to an attribute, or pydantic_xml.computed_element() to bind it to a sub-element.

The document:

doc.xml:

<Request Client="203.0.113.195">
  <Proxy>150.172.238.178</Proxy>
  <Proxy>150.172.230.21</Proxy>
  <Cookies PHPSESSID="298zf09hf012fh2" csrftoken="u32t4o3tb3gg43"/>
  <Authorization Type="Basic">**********</Authorization>
</Request>

is produced by the following model:

model.py:

import pathlib
from ipaddress import IPv4Address
from typing import Dict, List, Tuple
from xml.etree.ElementTree import canonicalize

from pydantic import Field, IPvAnyAddress, SecretStr, computed_field, field_validator

from pydantic_xml import BaseXmlModel, attr, computed_attr, computed_element


class Auth(BaseXmlModel, tag='Authorization'):
    type: str = attr(name='Type')
    value: SecretStr


class Request(BaseXmlModel, tag='Request'):
    forwarded_for: List[IPv4Address] = Field(exclude=True)
    raw_cookies: str = Field(exclude=True)
    raw_auth: str = Field(exclude=True)

    @field_validator('forwarded_for', mode='before')
    def validate_address_list(cls, value: str) -> List[IPv4Address]:
        return [IPvAnyAddress(addr) for addr in value.split(',')]

    @computed_attr(name='Client')
    def client(self) -> IPv4Address:
        client, *proxies = self.forwarded_for
        return client

    @computed_element(tag='Proxy')
    def proxy(self) -> List[IPv4Address]:
        client, *proxies = self.forwarded_for
        return proxies

    @computed_element(tag='Cookies')
    def cookies(self) -> Dict[str, str]:
        return dict(
            tuple(pair.split('=', maxsplit=1))
            for cookie in self.raw_cookies.split(';')
            if (pair := cookie.strip())
        )

    @computed_field
    def auth(self) -> Auth:
        auth_type, auth_value = self.raw_auth.split(maxsplit=1)
        return Auth(type=auth_type, value=auth_value)


request = Request(
    forwarded_for="203.0.113.195,150.172.238.178,150.172.230.21",
    raw_cookies="PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43;",
    raw_auth="Basic YWxhZGRpbjpvcGVuc2VzYW1l",
)

xml_doc = pathlib.Path('./doc.xml').read_text()
assert canonicalize(request.to_xml(), strip_text=True) == canonicalize(xml_doc, strip_text=True)
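The values behind the computed entities can be traced in isolation. The sketch below repeats the same derivations as plain functions using only the standard library. The function names are hypothetical and exist only for illustration.

```python
from ipaddress import IPv4Address
from typing import Dict, List, Tuple


def split_forwarded(header: str) -> Tuple[IPv4Address, List[IPv4Address]]:
    """The first address is the client, the remaining ones are proxies."""
    client, *proxies = [IPv4Address(addr.strip()) for addr in header.split(',')]
    return client, proxies


def parse_cookies(raw: str) -> Dict[str, str]:
    """Turn 'k1=v1; k2=v2;' into a dict, skipping empty fragments."""
    pairs = (pair.strip() for pair in raw.split(';'))
    return dict(pair.split('=', maxsplit=1) for pair in pairs if pair)


client, proxies = split_forwarded('203.0.113.195,150.172.238.178,150.172.230.21')
cookies = parse_cookies('PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43;')

# The scheme becomes the Type attribute, the rest the (masked) element text.
auth_type, auth_value = 'Basic YWxhZGRpbjpvcGVuc2VzYW1l'.split(maxsplit=1)
```

In the model, each derivation becomes a decorated property. The source fields are declared with exclude=True, so only the computed forms reach the document.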

XML parser#

pydantic-xml tries to use the fastest xml parser available on your system. It uses lxml if it is installed in your environment; otherwise it falls back to the standard library xml parser.

To force pydantic-xml to use the standard xml.etree.ElementTree parser, set the FORCE_STD_XML environment variable.

XML serialization#

The XML serialization process is customizable depending on which backend you use. For example, lxml can pretty-print the output document or serialize it using a particular encoding (for more information see lxml.etree.tostring()). To set those parameters, pass them to pydantic_xml.BaseXmlModel.to_xml() as extra arguments:

xml = obj.to_xml(
    pretty_print=True,
    encoding='UTF-8',
    standalone=True
)

print(xml)

The standard library serializer also supports customizations. For more information see xml.etree.ElementTree.tostring().
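As a rough illustration of what the standard library serializer offers, the sketch below calls xml.etree.ElementTree directly rather than going through to_xml(). It assumes Python 3.9 or newer for ET.indent(). Whether each of these options can be forwarded through to_xml() with the standard backend is an assumption to verify against your installed version.

```python
import xml.etree.ElementTree as ET

root = ET.Element('Company', {'trade-name': 'SpaceX'})
ET.SubElement(root, 'WebSite')

# Pretty-print in place (a separate helper, not a tostring() argument).
ET.indent(root, space='    ')

# Text output with empty elements written as explicit open/close pairs.
as_text = ET.tostring(root, encoding='unicode', short_empty_elements=False)

# Byte output with an XML declaration and an explicit encoding.
as_bytes = ET.tostring(root, encoding='utf-8', xml_declaration=True)

print(as_text)
```

Unlike lxml, the standard library has no pretty_print or standalone argument on tostring(), so options written for one backend do not necessarily carry over to the other.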

Mypy#

pydantic-xml provides a mypy plugin that adds some important pydantic-specific features to type-check your code.

To enable the plugin add the following to your mypy.ini config file:

[mypy]
plugins = pydantic_xml.mypy

or pyproject.toml:

[tool.mypy]
plugins = [
  "pydantic_xml.mypy"
]