XML: ElementTree (etree)#
SAX and DOM#
SAX: “Simple API for XML”
DOM: “Document Object Model”
ElementTree#
xml.etree: Python-specific ⟶ absolutely comfortable
Seamless integration in Python (⟶ iteration)
A document is a tree, and trees are lists of lists
XML attributes represented as dictionaries
⟶ simple!
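A minimal sketch of what this integration looks like in practice. The sample document and the names `doc`, `root` and `ages` are made up for illustration; `fromstring` is the standard-library helper that parses a string into an `Element`.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical sample document, for illustration only
doc = '<family><child age="15">Anna</child><child age="17">Ben</child></family>'
root = fromstring(doc)

# An element behaves like a list of its children ...
print(len(root))      # number of direct children
print(root[0].tag)    # index access

# ... so plain iteration works
for child in root:
    # Attributes are an ordinary dictionary of strings
    print(child.tag, child.attrib["age"], child.text)

# List comprehensions work as on any other sequence
ages = [int(child.attrib["age"]) for child in root]
```

Attribute values are always strings, which is why the comprehension converts them with `int()`.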
A Very Simple Document#
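from xml.etree.ElementTree import Element, SubElement

# Variant 1: create the child separately, then append it
element = Element("root")
child = Element("child")
element.append(child)

# Variant 2: SubElement creates the child and appends it in one step
element = Element("root")
SubElement(element, "child")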
<root>
<child />
</root>
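To check that both ways of adding a child describe the same document, the elements can be serialized with `tostring` from the same module. This is only a sketch; the variable names `appended` and `created` are chosen here for illustration.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Variant 1: explicit append
appended = Element("root")
appended.append(Element("child"))

# Variant 2: SubElement
created = Element("root")
SubElement(created, "child")

# encoding="unicode" returns str instead of bytes
xml_appended = tostring(appended, encoding="unicode")
xml_created = tostring(created, encoding="unicode")
print(xml_appended)
print(xml_created == xml_appended)
```

The serialized form comes out on a single line; the line breaks in the listing above are only for readability.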
Attributes#
XML elements have attributes
Python’s XML elements have the attrib dictionary
element = Element("root")
child = SubElement(element, "child")
child.attrib['age'] = '15'
child = SubElement(element, "child")
child.attrib['age'] = '17'
<root>
<child age="15" />
<child age="17" />
</root>
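Besides writing to `attrib` directly, elements also offer `get()` and `set()`, and `SubElement` accepts attributes as keyword arguments. A short sketch, with the names `first`, `second` and `xml_text` chosen for illustration:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

element = Element("root")
# Attributes can be passed as keyword arguments at creation time
first = SubElement(element, "child", age="15")
# ... or set later with set(), equivalent to writing to attrib
second = SubElement(element, "child")
second.set("age", "17")

# get() returns None (or a given default) for missing attributes
print(first.get("age"))
print(first.get("name"))
print(first.get("name", "unknown"))

# Attribute values must be strings; convert numbers explicitly
second.set("age", str(18))
xml_text = tostring(element, encoding="unicode")
print(xml_text)
```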
Text (1)#
In XML documents, free text is permitted …
Inside one element
After an element's end tag, but before the next tag
Accordingly, Python elements have members …
element.text
element.tail
No text ⟶ None
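The two members can be inspected on a small parsed document. The input string below is a made-up example; it places text in every position where XML allows it.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical input with text in several positions
root = fromstring("<root>before<child>inside</child>after<empty/></root>")
child, empty = root[0], root[1]

print(repr(root.text))    # text between <root> and its first child
print(repr(child.text))   # text inside <child>
print(repr(child.tail))   # text after </child>, before <empty/>
print(repr(empty.text))   # no text inside <empty/>
print(repr(empty.tail))   # no text between <empty/> and </root>
```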
Text (2)#
element = Element("root")
child = SubElement(element, "child")
child.text = 'Text'
child.tail = 'Tail'
<root><child>Text</child>Tail</root>
Careful with indentation
Whitespace, linefeed etc. is text, no matter what
str.strip() may be helpful
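A sketch of how indentation shows up as text when a nicely formatted document is parsed. The document and the helper `clean` are hypothetical, introduced only for this example.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical, human-friendly formatted input
pretty = """<root>
    <child>
        Text
    </child>
</root>"""
root = fromstring(pretty)
child = root[0]

print(repr(root.text))    # linefeed and indentation before <child>
print(repr(child.text))   # 'Text' surrounded by whitespace
print(repr(child.tail))   # linefeed before </root>

def clean(text):
    """Hypothetical helper: strip whitespace, mapping None to ''."""
    return (text or "").strip()

print(clean(child.text))
```

The `text or ""` guard is needed because `text` and `tail` are `None` when there is no text at all, and `None` has no `strip()` method.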
Writing XML Documents#
We have created Element objects
Added child elements
Now how do we create XML?
Wrap into ElementTree - a helper
import sys
from xml.etree.ElementTree import ElementTree
tree = ElementTree(element)
# encoding="unicode" emits str, as needed by a text stream (Python 3)
tree.write(sys.stdout, encoding="unicode")  # or open(..., 'w')
Output is very tight
Text is preserved as-is
Pretty output would be incorrect
Linefeeds and indentation are text
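Why pretty output changes the document can be seen with `indent()`, which the standard library provides from Python 3.9 on (this sketch assumes such a version). It works by writing linefeeds and spaces into `text` and `tail`, so the tree itself is modified.

```python
from xml.etree.ElementTree import Element, SubElement, indent, tostring

element = Element("root")
child = SubElement(element, "child")
child.text = "Text"

tight = tostring(element, encoding="unicode")
text_before = element.text    # None: <root> carries no text yet

# indent() (Python 3.9+) modifies the tree in place
indent(element, space="  ")
pretty = tostring(element, encoding="unicode")

print(tight)
print(pretty)
# The added whitespace is now part of the document's text
print(repr(element.text), repr(child.tail))
```

For documents where surrounding whitespace carries no meaning this is harmless; for mixed content it alters the data.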
Reading XML Documents#
import sys
from xml.etree.ElementTree import parse

tree = parse(sys.stdin)
for child in tree.getroot():
    age = child.attrib.get('age')
    if age is not None:
        print(age)
    if child.text is not None:
        print(child.text)
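`parse` accepts any file-like object, so the same loop can be tried without standard input. In this sketch an `io.StringIO` with a made-up document stands in for `sys.stdin`; the names `source`, `found` and `tags` are illustrative.

```python
import io
from xml.etree.ElementTree import parse

# Hypothetical input; io.StringIO stands in for sys.stdin or an open file
source = io.StringIO('<root><child age="15">Text</child><child /></root>')
tree = parse(source)

found = []
for child in tree.getroot():
    # Collect (age, text) pairs; either value may be None
    found.append((child.attrib.get("age"), child.text))
print(found)

# iter() walks the whole subtree, starting with the element itself
tags = [elem.tag for elem in tree.getroot().iter()]
print(tags)
```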