Discussion:
[Tutor] Pretty printing XML using LXML on Python3
SM
2013-11-28 19:12:18 UTC
Hello,
I am using lxml with Python 3 to generate XML code, but "pretty_print"
doesn't seem to indent the generated lines.

I have installed the following lxml package:
/usr/local/lib/python3.2/dist-packages/lxml-3.2.4-py3.2-linux-x86_64.egg/lxml

The following is example code I found on Stack Overflow, which runs fine
on Python 2.7. But if I run the same code with Python 3, it concatenates
all the lines. Does anybody know how to make it work on Python 3?

from lxml import etree

# create XML
root = etree.Element('root')
root.append(etree.Element('child'))
# another child with text
child = etree.Element('child')
child.text = 'some text'
root.append(child)

# pretty string
s = etree.tostring(root, pretty_print=True)
print(s)

Run with Python 2.7:

$ python testx.py
<root>
  <child/>
  <child>some text</child>
</root>
$

Run with Python3:

$ python3 testx.py
b'<root>\n  <child/>\n  <child>some text</child>\n</root>\n'
$

Thanks in advance.
-SM
eryksun
2013-11-28 19:45:59 UTC
Post by SM
> $ python3 testx.py
> b'<root>\n  <child/>\n  <child>some text</child>\n</root>\n'
print() first gets the object as a string. tostring() returns bytes,
and bytes.__str__ returns the same as bytes.__repr__. You can decode
the bytes, or get a str directly with tounicode():
s = etree.tounicode(root, pretty_print=True)
print(s)
<root>
  <child/>
  <child>some text</child>
</root>
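A minimal sketch comparing the options eryksun mentions, assuming a current lxml install (the thread used lxml 3.2.4). Besides tounicode(), lxml's tostring() also accepts encoding='unicode' to return a str; later lxml releases document tounicode() as deprecated in favour of that form. The helper build_tree() is hypothetical, for illustration only. All three str results hold the same text; only printing the raw bytes shows the escaped \n form.

```python
from lxml import etree


def build_tree():
    """Build the same small tree as the script in the question (hypothetical helper)."""
    root = etree.Element('root')
    etree.SubElement(root, 'child')
    child = etree.SubElement(root, 'child')
    child.text = 'some text'
    return root


root = build_tree()

raw = etree.tostring(root, pretty_print=True)                               # bytes
via_decode = raw.decode('utf-8')                                            # bytes -> str
via_tounicode = etree.tounicode(root, pretty_print=True)                    # str
via_encoding = etree.tostring(root, pretty_print=True, encoding='unicode')  # str

print(raw)           # prints the bytes repr, with \n shown as escapes
print(via_encoding)  # prints the indented lines
```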
SM
2013-11-29 21:21:22 UTC
Thank you, eryksun. Using tounicode seems to work for this small piece of
code. It still has issues in my real code, which generates a large XML
document. I will figure out why.
-SM
Stefan Behnel
2013-11-30 09:04:38 UTC
Post by eryksun
> print() first gets the object as a string. tostring() returns bytes,
> and bytes.__str__ returns the same as bytes.__repr__.
Meaning, it's a pure matter of visual representation on the screen, not a
difference in the data.
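Stefan's point can be checked with plain Python 3, no lxml needed. In this sketch a bytes literal stands in (an assumption, for illustration) for what tostring() returned: str() of a bytes object is its repr, which is what print() shows, while decoding gives back the very same text with real line breaks.

```python
# Stand-in for the value etree.tostring(root, pretty_print=True) returned
data = b'<root>\n  <child/>\n  <child>some text</child>\n</root>\n'

shown = str(data)            # what print(data) writes: the repr, with \n escaped
text = data.decode('utf-8')  # the same characters as a str, with real newlines

print(shown)  # a single line starting with b'
print(text)   # indented, one element per line
```

Encoding text again gives back exactly the original bytes, so nothing about the data changed.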
Post by eryksun
> s = etree.tounicode(root, pretty_print=True)
> print(s)
Post by SM
> Thank you, eryksun. Using tounicode seems to work on this small piece of
> code. It still has issues with my code which is generating a big XML code.
Well, I'm sure you are not generating a large chunk of XML just to print it
on the screen, so using tostring(), as you did before, is certainly better.

However, if it's really that much output, you should serialise into a file
instead of serialising it into memory first and then writing that into a
file. So, use ElementTree.write() to write the output into a file directly.
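A minimal sketch of that suggestion with lxml's ElementTree.write(). The keyword arguments pretty_print, xml_declaration and encoding are ones lxml's write() accepts; the helper write_pretty_xml() and the temporary output path are hypothetical, for illustration only.

```python
import os
import tempfile

from lxml import etree


def write_pretty_xml(root, path):
    """Serialise the tree straight into the file at `path` (hypothetical helper)."""
    # Wrap the root element in an ElementTree so that write() is available
    tree = etree.ElementTree(root)
    tree.write(path, pretty_print=True, xml_declaration=True, encoding='UTF-8')


# Same small tree as in the question
root = etree.Element('root')
etree.SubElement(root, 'child')
child = etree.SubElement(root, 'child')
child.text = 'some text'

# Hypothetical output location, just for the example
path = os.path.join(tempfile.mkdtemp(), 'output.xml')
write_pretty_xml(root, path)
```

Passing a filename lets lxml open the file itself; since the output is encoded bytes, a file object passed instead should be opened in binary mode.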

Stefan
