[Tutor] Pretty printing XML using LXML on Python3

Discussion:

2013-11-28 19:12:18 UTC

Hello,
I am using lxml with Python 3 to generate XML. "pretty_print" doesn't
seem to indent the generated lines.

I have installed the following lxml package:
/usr/local/lib/python3.2/dist-packages/lxml-3.2.4-py3.2-linux-x86_64.egg/lxml

The following is example code I found on Stack Overflow, which runs
fine on Python 2.7. But when I run the same code with Python 3, all the
lines are concatenated. Does anybody know how to make it work on
Python 3?

from lxml import etree

# create XML
root = etree.Element('root')
root.append(etree.Element('child'))
# another child with text
child = etree.Element('child')
child.text = 'some text'
root.append(child)

# pretty string
s = etree.tostring(root, pretty_print=True)
print(s)

Run with Python:

$ python testx.py
<root>
  <child/>
  <child>some text</child>
</root>
$

Run with Python3:

$ python3 testx.py
b'<root>\n  <child/>\n  <child>some text</child>\n</root>\n'
$

Thanks in advance.
-SM

eryksun

2013-11-28 19:45:59 UTC

Post by SM
$ python3 testx.py
b'<root>\n  <child/>\n  <child>some text</child>\n</root>\n'

print() first gets the object as a string. tostring() returns bytes,
and bytes.__str__ returns the same as bytes.__repr__. You can decode
the bytes before printing, or use tounicode() instead:

s = etree.tounicode(root, pretty_print=True)
print(s)
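
To illustrate eryksun's point about bytes.__str__, here is a minimal sketch. It uses a hard-coded bytes value as a stand-in for the result of tostring() (an assumption, so that the snippet runs without lxml); the variable names are made up for illustration.

```python
# Stand-in for what etree.tostring(root, pretty_print=True) returns on
# Python 3; hard-coded here so the sketch does not depend on lxml.
xml_bytes = b'<root>\n  <child/>\n  <child>some text</child>\n</root>\n'

# str() on a bytes object gives its repr, so each newline shows up as the
# two characters backslash and n. This is what print(xml_bytes) displays.
as_repr = str(xml_bytes)

# Decoding turns the bytes into text with real newlines.
as_text = xml_bytes.decode('utf-8')

print(as_repr)
print(as_text)
```

The first print() shows the single-line b'...' form from the question; the second shows the indented document, one element per line.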

2013-11-29 21:21:22 UTC

Thank you, eryksun. Using tounicode seems to work on this small piece of
code. It still has issues with my code, which generates a large XML document.
I will figure out why.
-SM


Stefan Behnel

2013-11-30 09:04:38 UTC

Post by SM
$ python3 testx.py
b'<root>\n  <child/>\n  <child>some text</child>\n</root>\n'

Post by eryksun
print() first gets the object as a string. tostring() returns bytes,
and bytes.__str__ returns the same as bytes.__repr__.

Meaning, it's a pure matter of visual representation on the screen, not a
difference in the data.

Post by eryksun
You can decode the bytes before printing, or use tounicode() instead:

s = etree.tounicode(root, pretty_print=True)
print(s)

Post by SM
Thank you, eryksun. Using tounicode seems to work on this small piece of
code. It still has issues with my code, which generates a large XML document.

Well, I'm sure you are not generating a large chunk of XML just to print it
on the screen, so using tostring(), as you did before, is certainly better.

However, if it's really that much output, you should serialise into a file
instead of serialising it into memory first and then writing that into a
file. So, use ElementTree.write() to write the output into a file directly.

Stefan
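
Stefan's suggestion to write straight to a file might look like the following minimal sketch. The tree mirrors the one in the original post; the file name output.xml, the temporary directory, and the choice to add an XML declaration are assumptions made only for illustration.

```python
import os
import tempfile

from lxml import etree

# Build the same small tree as in the original post.
root = etree.Element('root')
root.append(etree.Element('child'))
child = etree.Element('child')
child.text = 'some text'
root.append(child)

# Hypothetical output location, chosen only for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), 'output.xml')

# Wrap the root in an ElementTree and serialise straight to the file,
# without building the whole document as a string in memory first.
etree.ElementTree(root).write(
    out_path,
    pretty_print=True,
    xml_declaration=True,
    encoding='UTF-8',
)

# Read the file back as bytes just to check what was written.
with open(out_path, 'rb') as f:
    written = f.read()
print(written.decode('utf-8'))
```

Passing a file name (or an open binary file object) to write() lets lxml handle the encoding, so no bytes-versus-str conversion is needed in your own code.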