XML: ElementTree (`etree`)¶

SAX and DOM ¶

SAX

Event-driven (elements start and end)
Commonly used to parse long streams of structured data
“De-facto” standard
Available in multiple languages
Python: xml.sax

DOM: “Document Object Model”

Document available as a tree
Programmatically navigable as a tree
Relatively comfortable
Python: xml.dom
Problems
- Only relatively comfortable
- Not Pythonic enough

ElementTree ¶

xml.etree: Python specific ⟶ absolutely: comfortable

Seamless integration in Python (⟶ iteration)
A document is a tree, and trees are lists of lists
XML attributes represented as dictionaries

⟶ simple!

A Very Simple Document ¶

Python code¶

from xml.etree.ElementTree import Element
element = Element("root")
child = Element("child")
element.append(child)

Or alternatively …¶

element = Element("root")
SubElement(element, "child")

XML¶

<root>
  <child />
</root>

Attributes ¶

XML elements have attributes
Python’s XML elements have the attrib dictionary

element = Element("root")
child = SubElement(element, "child")
child.attrib['age'] = '15'
child = SubElement(element, "child")
child.attrib['age'] = '17'

<root>
  <child age="15" />
  <child age="17" />
</root>

Text (1)¶

In XML documents, free text is permitted …

Inside one element
After one element, but before the start of another element

Accordingly, Python elements have members …

element.text
element.tail
No text ⟶ None

Text (2)¶

element = Element("root")
child = SubElement(element, "child")
child.text = 'Text'
child.tail = 'Tail'

<root><child>Text</child>Tail</root>

Careful with indentation

Whitespace, linefeed etc. is text, no matter what
str.strip() may be helpful

Writing XML Documents ¶

We have created Element objects
Added child elements
Now how do we create XML?
Wrap into ElementTree - a helper

from xml.etree.ElementTree import ElementTree
tree = ElementTree(element)
tree.write(sys.stdout) # oder file(..., 'w')

Output is very tight
Text is preserved as-is
Pretty output would be incorrect
- Linefeed and indentation is text

Reading XML Documents ¶

This is simple …¶

from xml.etree.ElementTree import parse

tree = parse(sys.stdin)
for child in tree.getroot():
    age = child.attrib.get('age')
    if age is not None:
        print age
    if child.text is not None:
        print child.text

XML: ElementTree (etree)¶

XML: ElementTree (`etree`)¶