Encoding#

Custom type encoding#

pydantic provides mechanisms to customize the default JSON encoding format. pydantic-xml applies the same custom encoders during XML serialization:

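from datetime import datetime
from pydantic import field_serializer
from pydantic_xml import BaseXmlModel, element

class File(BaseXmlModel):
    created: datetime = element()

    @field_serializer('created')
    def encode_created(self, value: datetime) -> float:
        # Encode the datetime as a Unix timestamp (seconds since the epoch)
        return value.timestamp()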
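To make concrete what the serializer above returns, here is a minimal standard-library sketch. The sample date is an assumption chosen for illustration, and the reverse conversion is only a suggestion of what a matching validator might do on parsing:

```python
from datetime import datetime, timezone

# Sample value chosen for illustration; timezone-aware so the result is unambiguous.
created = datetime(2023, 1, 1, tzinfo=timezone.utc)

# The value encode_created() would return for this datetime.
encoded = created.timestamp()

# The reverse conversion, e.g. for use when parsing the document back.
decoded = datetime.fromtimestamp(encoded, tz=timezone.utc)

print(encoded, decoded.isoformat())
```

A naive `datetime` (without `tzinfo`) is interpreted in local time by `timestamp()`, so the encoded value would then depend on the machine's time zone.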

The following example illustrates how to encode bytes-typed fields as Base64 strings during XML serialization:

model.py:

import base64
import pathlib
from typing import List, Optional, Union
from xml.etree.ElementTree import canonicalize

from pydantic import field_serializer, field_validator

from pydantic_xml import BaseXmlModel, RootXmlModel, attr, element


class File(BaseXmlModel):
    name: str = attr()
    content: bytes = element()

    @field_serializer('content')
    def encode_content(self, value: bytes) -> str:
        return base64.b64encode(value).decode()

    @field_validator('content', mode='before')
    @classmethod
    def decode_content(cls, value: Optional[Union[str, bytes]]) -> Optional[bytes]:
        if isinstance(value, str):
            return base64.b64decode(value)

        return value


class Files(RootXmlModel, tag='files'):
    root: List[File] = element(tag='file', default=[])


files = Files()
for filename in ['./file1.txt', './file2.txt']:
    with open(filename, 'rb') as f:
        content = f.read()

    files.root.append(File(name=filename, content=content))

expected_xml_doc = pathlib.Path('./doc.xml').read_bytes()

assert canonicalize(files.to_xml(), strip_text=True) == canonicalize(expected_xml_doc, strip_text=True)

file1.txt:

hello world!!!

file2.txt:

¡Hola Mundo!

doc.xml:

<files>
  <file name="./file1.txt">
    <content>aGVsbG8gd29ybGQhISEK</content>
  </file>
  <file name="./file2.txt">
    <content>wqFIb2xhIE11bmRvIQo=</content>
  </file>
</files>
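The Base64 payloads in doc.xml can be reproduced with the standard library alone. This sketch assumes that each sample file is UTF-8 encoded and ends with a newline; that assumption is inferred from the encoded values, not stated in the example itself:

```python
import base64

# Assumed raw file contents, including the trailing newline.
file1 = b'hello world!!!\n'
file2 = '¡Hola Mundo!\n'.encode('utf-8')

# Same transformation as File.encode_content().
encoded1 = base64.b64encode(file1).decode()
encoded2 = base64.b64encode(file2).decode()

print(encoded1)
print(encoded2)

# Same transformation as File.decode_content() for string input.
assert base64.b64decode(encoded1) == file1
assert base64.b64decode(encoded2) == file2
```

If the files were saved without a trailing newline, the encoded strings would differ from those in doc.xml and the final assertion in model.py would fail.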

None type encoding#

XML has no native null type, so it is not obvious how a None field should be encoded: it could be omitted, encoded as an empty string, or marked as xsi:nil. For that reason the library doesn’t implement None type encoding by default.
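The three options mentioned above can be sketched with the standard library. This only illustrates the resulting XML shapes; the helper `company()` and its `title_mode` parameter are hypothetical and are not part of pydantic-xml:

```python
import xml.etree.ElementTree as ET

XSI = 'http://www.w3.org/2001/XMLSchema-instance'


def company(title_mode: str) -> ET.Element:
    """Build <Company> with a missing title encoded according to title_mode."""
    root = ET.Element('Company')
    if title_mode == 'omit':
        pass  # leave the element out entirely
    elif title_mode == 'empty':
        ET.SubElement(root, 'title').text = ''  # empty string
    elif title_mode == 'nil':
        ET.SubElement(root, 'title', {f'{{{XSI}}}nil': 'true'})  # xsi:nil marker
    else:
        raise ValueError(f'unknown mode: {title_mode}')
    return root


for mode in ('omit', 'empty', 'nil'):
    print(mode, ET.tostring(company(mode), encoding='unicode'))
```

Each shape carries a different meaning for consumers of the document, which is why the choice is left to the model author.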

You can define your own encoding format:

from typing import Annotated, Optional, TypeVar
from xml.etree.ElementTree import canonicalize

from pydantic import BeforeValidator, PlainSerializer

from pydantic_xml import BaseXmlModel, element

InnerType = TypeVar('InnerType')
XmlOptional = Annotated[
    Optional[InnerType],
    PlainSerializer(lambda val: val if val is not None else 'null'),
    BeforeValidator(lambda val: val if val != 'null' else None),
]


class Company(BaseXmlModel):
    title: XmlOptional[str] = element(default=None)


xml_doc = '''
<Company>
    <title>null</title>
</Company>
'''

company = Company.from_xml(xml_doc)

assert company.title is None
assert canonicalize(company.to_xml(), strip_text=True) == canonicalize(xml_doc, strip_text=True)

Alternatively, drop None fields altogether:

from typing import Optional
from pydantic_xml import BaseXmlModel, element

class Company(BaseXmlModel):
    title: Optional[str] = element(default=None)


company = Company()
assert company.to_xml(skip_empty=True) == b'<Company/>'

Default namespace#

An XML default namespace is a namespace that applies to an element and to all of its sub-elements that do not declare a namespace explicitly.

In the following example, the company element has no explicit namespace, but the default namespace for it and all its sub-elements is http://www.company.com/co. The contacts element has no explicit namespace either, but it doesn’t inherit one from company because it declares its own default namespace. The same goes for the socials element, except that its sub-elements inherit the namespace from their parent:

<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt">
        <socials xmlns="http://www.company.com/soc">
            <social>https://www.linkedin.com/company/spacex</social>
            <social>https://twitter.com/spacex</social>
            <social>https://www.youtube.com/spacex</social>
        </socials>
    </contacts>
</company>

A model for that document can be described like this:

from typing import List

from pydantic_xml import BaseXmlModel, element


class Socials(
    BaseXmlModel,
    tag='socials',
    nsmap={'': 'http://www.company.com/soc'},
):
    urls: List[str] = element(tag='social')


class Contacts(
    BaseXmlModel,
    tag='contacts',
    nsmap={'': 'http://www.company.com/cnt'},
):
    socials: Socials = element()


class Company(
    BaseXmlModel,
    tag='company',
    nsmap={'': 'http://www.company.com/co'},
):
    contacts: Contacts = element()

Note the model’s nsmap parameter. To set a default namespace for a model and its sub-fields, pass that namespace under the empty-string key.
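To see how default namespaces resolve in a document like the one above, it can be parsed with the standard library, which reports each tag in `{namespace}name` form. The document here is a shortened copy of the example:

```python
import xml.etree.ElementTree as ET

doc = '''
<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt">
        <socials xmlns="http://www.company.com/soc">
            <social>https://www.linkedin.com/company/spacex</social>
        </socials>
    </contacts>
</company>
'''

root = ET.fromstring(doc)

# Fully qualified tags in document order.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)
```

The social element ends up in the soc namespace even though it declares nothing itself, because it inherits the default namespace of its parent.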

Default namespace serialization

The standard library XML serializer has a problem with default namespaces: it doesn’t preserve default namespace declarations. Instead, it moves all namespace declarations to the root element and replaces the defaults with generated ns{0..} prefixes:

<ns0:company xmlns:ns0="http://www.company.com/co"
             xmlns:ns1="http://www.company.com/cnt"
             xmlns:ns2="http://www.company.com/soc">
    <ns1:contacts>
        <ns2:socials>
            <ns2:social>https://www.linkedin.com/company/spacex</ns2:social>
            <ns2:social>https://twitter.com/spacex</ns2:social>
            <ns2:social>https://www.youtube.com/spacex</ns2:social>
        </ns2:socials>
    </ns1:contacts>
</ns0:company>

That document is still correct, but some parsers require namespace declarations to be kept untouched. To avoid the problem, use lxml as the serialization backend, since it preserves default namespaces. See lxml installation.
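The rewriting described above can be reproduced without pydantic-xml by round-tripping a small document through the standard library. This is a minimal sketch on a shortened document:

```python
import xml.etree.ElementTree as ET

doc = (
    '<company xmlns="http://www.company.com/co">'
    '<contacts xmlns="http://www.company.com/cnt"/>'
    '</company>'
)

# Parse and serialize again with xml.etree.ElementTree.
serialized = ET.tostring(ET.fromstring(doc), encoding='unicode')
print(serialized)
```

The default namespace declarations are gone from the output: both namespaces are declared on the root element under generated prefixes, and every tag is written with its prefix.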

Computed entities#

pydantic supports computed fields. Computed fields allow property and cached_property to be included when serializing models or dataclasses. This is useful for fields that are computed from other fields, or for fields that are expensive to compute and should be cached.

pydantic-xml provides a similar API for XML entities: text, attribute or element properties can be included in the XML document during serialization. To make a property computed, decorate it with pydantic.computed_field to bind it to the current element, pydantic_xml.computed_attr() to bind it to an attribute, or pydantic_xml.computed_element() to bind it to a sub-element.

The document:

doc.xml:

<Company Client="203.0.113.195">
  <Proxy>150.172.238.178</Proxy>
  <Proxy>150.172.230.21</Proxy>
  <Cookies PHPSESSID="298zf09hf012fh2" csrftoken="u32t4o3tb3gg43"/>
  <Authorization Type="Basic">**********</Authorization>
</Company>

is produced by the following model:

model.py:

import pathlib
from ipaddress import IPv4Address
from typing import Dict, List
from xml.etree.ElementTree import canonicalize

from pydantic import Field, IPvAnyAddress, SecretStr, computed_field, field_validator

from pydantic_xml import BaseXmlModel, attr, computed_attr, computed_element


class Auth(BaseXmlModel, tag='Authorization'):
    type: str = attr(name='Type')
    value: SecretStr


class Request(BaseXmlModel, tag='Company'):
    forwarded_for: List[IPv4Address] = Field(exclude=True)
    raw_cookies: str = Field(exclude=True)
    raw_auth: str = Field(exclude=True)

    @field_validator('forwarded_for', mode='before')
    @classmethod
    def validate_address_list(cls, value: str) -> List[IPv4Address]:
        return [IPvAnyAddress(addr) for addr in value.split(',')]

    @computed_attr(name='Client')
    def client(self) -> IPv4Address:
        client, *proxies = self.forwarded_for
        return IPvAnyAddress(client)

    @computed_element(tag='Proxy')
    def proxy(self) -> List[IPv4Address]:
        client, *proxies = self.forwarded_for
        return [IPvAnyAddress(addr) for addr in proxies]

    @computed_element(tag='Cookies')
    def cookies(self) -> Dict[str, str]:
        return dict(
            tuple(pair.split('=', maxsplit=1))
            for cookie in self.raw_cookies.split(';')
            if (pair := cookie.strip())
        )

    @computed_field
    def auth(self) -> Auth:
        auth_type, auth_value = self.raw_auth.split(maxsplit=1)
        return Auth(type=auth_type, value=auth_value)


request = Request(
    forwarded_for="203.0.113.195,150.172.238.178,150.172.230.21",
    raw_cookies="PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43;",
    raw_auth="Basic YWxhZGRpbjpvcGVuc2VzYW1l",
)

xml_doc = pathlib.Path('./doc.xml').read_text()
assert canonicalize(request.to_xml(), strip_text=True) == canonicalize(xml_doc, strip_text=True)
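The parsing logic behind the computed properties can be examined in isolation. The following sketch repeats it with the standard library only, using the same input strings as the example; it does not involve pydantic or pydantic-xml:

```python
from ipaddress import IPv4Address

forwarded_for = '203.0.113.195,150.172.238.178,150.172.230.21'
raw_cookies = 'PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43;'

# The first address is the client, the remaining ones are proxies.
addresses = [IPv4Address(addr) for addr in forwarded_for.split(',')]
client, *proxies = addresses

# Split on ';', skip empty fragments, then split each pair on the first '='.
cookies = dict(
    tuple(pair.split('=', maxsplit=1))
    for cookie in raw_cookies.split(';')
    if (pair := cookie.strip())
)

print(client, proxies, cookies)
```

The trailing `;` in the cookie string produces an empty fragment, which the `if (pair := cookie.strip())` condition filters out.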

XML parser#

pydantic-xml tries to use the fastest XML parser available on your system. It uses lxml if it is installed in your environment; otherwise it falls back to the standard library XML parser.

To force pydantic-xml to use the standard xml.etree.ElementTree parser, set the FORCE_STD_XML environment variable.
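The selection rule described above follows a common fallback pattern, sketched below. This is not pydantic-xml's actual implementation: the function `select_backend()` is hypothetical, and the set of values treated as "set" for FORCE_STD_XML is an assumption made for illustration.

```python
import importlib
import os
import xml.etree.ElementTree as std_etree

# Assumption: which values count as "enabled" is not specified in the text above.
TRUTHY = {'1', 'true', 'yes'}


def select_backend(environ=os.environ):
    """Return lxml.etree when available and not overridden, else the stdlib module."""
    forced = environ.get('FORCE_STD_XML', '').lower() in TRUTHY
    if not forced:
        try:
            return importlib.import_module('lxml.etree')
        except ImportError:
            pass  # lxml is not installed; fall back to the standard library
    return std_etree


print(select_backend().__name__)
```

Because a choice like this is typically made at import time, the environment variable should be set before the library is first imported.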

XML serialization#

The XML serialization process can be customized depending on which backend you use. For example, lxml can pretty-print the output document or serialize it using a particular encoding (for more information see lxml.etree.tostring()). To set those options, pass them to pydantic_xml.BaseXmlModel.to_xml():

xml = obj.to_xml(
    pretty_print=True,
    encoding='UTF-8',
    standalone=True
)

print(xml)

The standard library serializer also supports customizations. For more information see xml.etree.ElementTree.tostring().
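As an illustration of the standard library options, the sketch below calls xml.etree.ElementTree.tostring() directly on a small hand-built element, so it shows the serializer's own behavior rather than a pydantic-xml call:

```python
import xml.etree.ElementTree as ET

root = ET.Element('Company')
ET.SubElement(root, 'title')

# Default output: empty elements are written in short form.
compact = ET.tostring(root, encoding='unicode')

# Write empty elements as explicit start/end tag pairs instead.
expanded = ET.tostring(root, encoding='unicode', short_empty_elements=False)

# Emit bytes in a given encoding with an XML declaration (Python 3.8+).
with_decl = ET.tostring(root, encoding='utf-8', xml_declaration=True)

print(compact)
print(expanded)
print(with_decl)
```

Note that the standard library writes short empty elements with a space before the slash (`<title />`), which differs from lxml's `<title/>`; byte-for-byte comparisons of serialized output are therefore backend dependent.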