Custom type encoding

pydantic-xml uses the pydantic default encoder to encode field data during XML serialization. To alter the default behaviour, pydantic provides a mechanism to customize the default JSON encoding for a particular type (the json_encoders config option). pydantic-xml lets you do the same for XML serialization. The API is similar to the JSON one:

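import base64

from pydantic_xml import BaseXmlModel


class Model(BaseXmlModel):
    class Config:
        xml_encoders = {
            # b64encode returns bytes, so decode it to get XML text
            bytes: lambda value: base64.b64encode(value).decode(),
        }
    ...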
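To make the idea of a type-keyed encoder map concrete, here is a hypothetical sketch of how such a map can be resolved for a single value. It is modelled on how pydantic resolves json_encoders (walking the value's MRO so subclasses reuse a registered encoder); whether pydantic-xml resolves xml_encoders exactly this way is an assumption, and encode_value is an illustrative helper, not part of either library.

```python
import base64
from typing import Any, Callable, Dict

Encoders = Dict[type, Callable[[Any], str]]


def encode_value(value: Any, encoders: Encoders) -> str:
    """Hypothetical helper: turn a field value into XML text using a type-keyed map."""
    # Walk the value's MRO so that subclasses of a registered type reuse its encoder
    # (assumption: modelled on pydantic's json_encoders lookup).
    for base in type(value).__mro__[:-1]:  # skip `object`
        encoder = encoders.get(base)
        if encoder is not None:
            return encoder(value)
    # No custom encoder registered: fall back to plain str() (a simplification).
    return str(value)


xml_encoders: Encoders = {
    bytes: lambda value: base64.b64encode(value).decode(),
}

print(encode_value(b'hi', xml_encoders))  # aGk=
print(encode_value(42, xml_encoders))     # 42
```

The lambda returns str rather than bytes because XML element text is a string.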

The following example illustrates how to encode bytes-typed fields as Base64 strings during XML serialization:

model.py:

import base64
import pathlib
from typing import List, Optional, Union
from xml.etree.ElementTree import canonicalize

import pydantic

from pydantic_xml import BaseXmlModel, attr, element


class File(BaseXmlModel):
    name: str = attr()
    content: bytes = element()

    # Accept Base64 text (as read from XML) and decode it back to raw bytes.
    @pydantic.validator('content', pre=True)
    def decode_content(cls, value: Optional[Union[str, bytes]]) -> Optional[bytes]:
        if isinstance(value, str):
            return base64.b64decode(value)

        return value


class Files(BaseXmlModel, tag='files'):
    class Config:
        xml_encoders = {
            bytes: lambda value: base64.b64encode(value).decode(),
        }

    __root__: List[File] = element(tag='file', default=[])


files = Files()
for filename in ['./file1.txt', './file2.txt']:
    with open(filename, 'rb') as f:
        content = f.read()

    files.__root__.append(File(name=filename, content=content))

expected_xml_doc = pathlib.Path('./doc.xml').read_bytes()

assert canonicalize(files.to_xml(), strip_text=True) == canonicalize(expected_xml_doc, strip_text=True)

file1.txt:

hello world!!!

file2.txt:

¡Hola Mundo!

doc.xml:

<files>
  <file name="./file1.txt">
    <content>aGVsbG8gd29ybGQhISEK</content>
  </file>
  <file name="./file2.txt">
    <content>wqFIb2xhIE11bmRvIQo=</content>
  </file>
</files>
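The values in doc.xml can be checked with the standard library alone. This sketch applies the same function used as the bytes encoder in Files.Config and the same decoding step as the decode_content validator, assuming each file ends with a trailing newline (which the expected Base64 values imply).

```python
import base64

# Raw bytes of the two example files, including the trailing newline
# (assumption: implied by the expected Base64 values in doc.xml).
file_contents = {
    './file1.txt': 'hello world!!!\n'.encode('utf-8'),
    './file2.txt': '¡Hola Mundo!\n'.encode('utf-8'),
}


def encode(value: bytes) -> str:
    # Same function as the `bytes` entry in Files.Config.xml_encoders.
    return base64.b64encode(value).decode()


def decode(value: str) -> bytes:
    # What the decode_content validator does when it receives a str.
    return base64.b64decode(value)


for name, raw in file_contents.items():
    text = encode(raw)
    assert decode(text) == raw  # the round trip restores the original bytes
    print(name, text)
```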

Default namespace

An XML default namespace is a namespace that applies to an element and all of its sub-elements without an explicit prefix.

In the following example the company element has no explicit namespace prefix, but the default namespace for that element and all its sub-elements is http://www.company.com/co. The contacts element has no explicit prefix either, but it does not inherit the namespace from company because it declares its own default namespace. The same goes for the socials element, except that its sub-elements (social) inherit the namespace from their parent:

<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt" >
        <socials xmlns="http://www.company.com/soc">
            <social>https://www.linkedin.com/company/spacex</social>
            <social>https://twitter.com/spacex</social>
            <social>https://www.youtube.com/spacex</social>
        </socials>
    </contacts>
</company>
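The inheritance rules described above can be observed with the standard library parser, which reports every tag in Clark notation ({namespace-uri}local-name). This is a sketch using xml.etree.ElementTree directly, independent of pydantic-xml.

```python
import xml.etree.ElementTree as ET

DOC = """\
<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt">
        <socials xmlns="http://www.company.com/soc">
            <social>https://www.linkedin.com/company/spacex</social>
            <social>https://twitter.com/spacex</social>
            <social>https://www.youtube.com/spacex</social>
        </socials>
    </contacts>
</company>
"""

root = ET.fromstring(DOC)
# iter() walks the element and all its descendants in document order.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

Each social element resolves to the namespace declared on socials, while contacts and socials resolve to their own default namespaces rather than company's.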

A model for that document can be described like this:

from typing import List

from pydantic_xml import BaseXmlModel, element


class Socials(
    BaseXmlModel,
    tag='socials',
    nsmap={'': 'http://www.company.com/soc'},
):
    urls: List[str] = element(tag='social')


class Contacts(
    BaseXmlModel,
    tag='contacts',
    nsmap={'': 'http://www.company.com/cnt'},
):
    socials: Socials = element()


class Company(
    BaseXmlModel,
    tag='company',
    nsmap={'': 'http://www.company.com/co'},
):
    contacts: Contacts = element()

Note the nsmap parameter of each model: to set a default namespace for a model and its sub-fields, pass that namespace under an empty key ('').
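The empty-key convention matches how the standard library itself reports default namespace declarations: iterparse() emits a start-ns event with the prefix '' for every xmlns="..." attribute. This sketch (stdlib only, not pydantic-xml code) rebuilds a per-element {prefix: uri} mapping shaped like the nsmap arguments of the models above.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""\
<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt">
        <socials xmlns="http://www.company.com/soc">
            <social>https://twitter.com/spacex</social>
        </socials>
    </contacts>
</company>
"""

nsmaps = []   # (local element name, namespaces declared on that element)
pending = {}  # declarations seen since the last start event
for event, item in ET.iterparse(io.BytesIO(DOC), events=('start-ns', 'start')):
    if event == 'start-ns':
        # item is a (prefix, uri) tuple; the default namespace has prefix ''.
        prefix, uri = item
        pending[prefix] = uri
    else:
        # 'start': item is the element that declared the pending namespaces.
        if pending:
            nsmaps.append((item.tag.split('}')[-1], pending))
            pending = {}

for name, nsmap in nsmaps:
    print(name, nsmap)
```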

Default namespace serialization

The standard library XML serializer has a problem with default namespaces: it does not preserve default namespace declarations. Instead, it moves all namespace declarations to the root element and replaces them with generated prefixes (ns0, ns1, and so on):

<ns0:company xmlns:ns0="http://www.company.com/co"
             xmlns:ns1="http://www.company.com/cnt"
             xmlns:ns2="http://www.company.com/soc">
    <ns1:contacts>
        <ns2:socials>
            <ns2:social>https://www.linkedin.com/company/spacex</ns2:social>
            <ns2:social>https://twitter.com/spacex</ns2:social>
            <ns2:social>https://www.youtube.com/spacex</ns2:social>
        </ns2:socials>
    </ns1:contacts>
</ns0:company>

That document is still valid, but some parsers require namespace declarations to be kept untouched. To avoid the problem, use lxml as the serializer backend, since it preserves default namespace declarations.
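The standard library behaviour shown above can be reproduced directly with xml.etree.ElementTree. This sketch builds the same tree using Clark-notation tags and serializes it; the generated ns0/ns1/ns2 prefixes and the root-level declarations come from ElementTree itself, not from pydantic-xml.

```python
import xml.etree.ElementTree as ET

CO = 'http://www.company.com/co'
CNT = 'http://www.company.com/cnt'
SOC = 'http://www.company.com/soc'

# Clark notation: '{uri}local-name' places an element in a namespace.
company = ET.Element(f'{{{CO}}}company')
contacts = ET.SubElement(company, f'{{{CNT}}}contacts')
socials = ET.SubElement(contacts, f'{{{SOC}}}socials')
for url in ['https://www.linkedin.com/company/spacex', 'https://twitter.com/spacex']:
    ET.SubElement(socials, f'{{{SOC}}}social').text = url

text = ET.tostring(company, encoding='unicode')
print(text)
```

All three namespaces end up declared with generated prefixes on the root element, and no default (xmlns="...") declaration survives. lxml, by contrast, keeps a default declaration on the element that carries it when that element is created with nsmap={None: uri}.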

XML parser

pydantic-xml tries to use the fastest XML parser available on your system. It uses lxml if it is installed in your environment; otherwise it falls back to the standard library XML parser.

To force pydantic-xml to use the standard xml.etree.ElementTree parser, set the FORCE_STD_XML environment variable.
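The selection rule described above can be sketched as follows. select_etree_backend is a hypothetical helper that mirrors the documented behaviour; it is not pydantic-xml's actual implementation, and treating any non-empty FORCE_STD_XML value as "on" is an assumption.

```python
import importlib
import os
from types import ModuleType


def select_etree_backend() -> ModuleType:
    """Hypothetical helper mirroring the documented backend selection."""
    # Explicit opt-out (assumption: any non-empty value counts as set).
    if os.environ.get('FORCE_STD_XML'):
        return importlib.import_module('xml.etree.ElementTree')
    try:
        # Prefer lxml when it is installed.
        return importlib.import_module('lxml.etree')
    except ImportError:
        # Otherwise fall back to the standard library parser.
        return importlib.import_module('xml.etree.ElementTree')


os.environ['FORCE_STD_XML'] = '1'
backend = select_etree_backend()
print(backend.__name__)  # xml.etree.ElementTree
```

If pydantic-xml makes this choice when it is first imported (likely, but not confirmed here), the variable has to be set before the import, for example from the shell: FORCE_STD_XML=1 python model.py.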

XML serialization

The XML serialization process can be customized depending on which backend you use. For example, lxml can pretty-print the output document or serialize it using a particular encoding (for more information see lxml.etree.tostring()). To enable these features, pass them as keyword arguments to pydantic_xml.BaseXmlModel.to_xml():

# obj is an instance of a BaseXmlModel subclass; these keyword arguments are lxml-specific
xml = obj.to_xml(
    pretty_print=True,
    encoding='UTF-8',
    standalone=True
)

print(xml)

The standard library serializer also supports some customizations. For more information see xml.etree.ElementTree.tostring().
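A minimal sketch of some standard library serialization options, calling xml.etree.ElementTree directly (it assumes Python 3.9+ for ET.indent). Which of these keywords to_xml() forwards when the standard backend is active is not confirmed here.

```python
import xml.etree.ElementTree as ET

# Build a small tree shaped like doc.xml from the custom encoding example.
root = ET.Element('files')
file_el = ET.SubElement(root, 'file', name='./file1.txt')
ET.SubElement(file_el, 'content').text = 'aGVsbG8gd29ybGQhISEK'
# An element with no content, to show short_empty_elements.
ET.SubElement(root, 'empty')

# Pretty-print in place by inserting indentation whitespace (Python 3.9+).
ET.indent(root, space='  ')

# encoding='unicode' returns str; short_empty_elements=False writes <empty></empty>.
text = ET.tostring(root, encoding='unicode', short_empty_elements=False)
print(text)

# A byte encoding returns bytes; xml_declaration=True (Python 3.8+) adds the <?xml ...?> header.
data = ET.tostring(root, encoding='UTF-8', xml_declaration=True)
print(data.decode('utf-8'))
```

Unlike lxml's pretty_print flag, ET.indent() modifies the tree itself, so every later serialization of the same tree is indented.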