[Tutor] parsing xml as lines
richard kappler
2015-11-04 18:36:07 UTC
I have an XML file that gets written to as events occur. Each event writes
a new 'line' of XML to the file in a specific format, e.g., something like
this:

<heresmydata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Logging.xsd" version="1.0"><childofheresmydata/><anotherchildofheresmydata/><grandchild>somestuff</grandchild></heresmydata>

and each 'line' has that same structure or format.

I've written a script that parses out the needed data with regexes and
forwards it on, but I think it might be better to use an XML parser. I can
parse out what I need if there is just one line in the file, but when
there are a number of lines, as there actually are, I can't figure out how
to get it to work.

In other words, with a one-line file, this works fine and I understand it:

import xml.etree.cElementTree as ET
# file= parses the whole file as one XML document
tree = ET.ElementTree(file='1lineTest.log')
grandchild = tree.find('grandchild')  # direct child of the root element
print grandchild.tag, grandchild.text

and I get the output I desire:

grandchild Sally
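A likely reason the one-line version works and a multi-line file does not is that ElementTree(file=...) parses the whole file as a single XML document, and a document may have only one root element. A one-line log is one document. Several lines are several root elements, which expat rejects, typically with "junk after document element". The minimal Python 3 sketch below shows both cases. It assumes simplified tags, and io.StringIO stands in for the log files:

```python
import io
import xml.etree.ElementTree as ET

# Two events, one per line, as the logger writes them (tags simplified).
log_text = (
    '<heresmydata><grandchild>Sally</grandchild></heresmydata>\n'
    '<heresmydata><grandchild>Harry</grandchild></heresmydata>\n'
)

# A single line is one well-formed document, so parsing it "as a file" works.
first = ET.parse(io.StringIO(log_text.splitlines()[0]))
print(first.find('grandchild').text)  # Sally

# The whole log has two root elements, so it is not one XML document.
error = None
try:
    ET.parse(io.StringIO(log_text))
except ET.ParseError as err:
    error = err
    print('ParseError:', err)
```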

But if I have several lines in the file and try to run a loop:

import xml.etree.cElementTree as ET
f1 = open('5lineTest.log', 'r')
lineList = f1.readlines()
Imax = len(lineList)

i = 0
while i < Imax:
    tree = ET.ElementTree(lineList[i])
    grandchild = tree.find('grandchild')
    print grandchild.tag, grandchild.text
    i += 1

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
AttributeError: 'int' object has no attribute 'tag'

And yet if I do

print lineList[0]

it prints out the first line.

I think I understand why; I just can't figure out a way around it.
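For the "why": the first positional parameter of ET.ElementTree is element, meaning an already-built root Element, not XML text. Text from a file goes through file= or ET.parse instead. In the Python 2.7 ElementTree, the constructor stores whatever it receives as the root without checking it, and tree.find() simply delegates to the root's find(). With a string as the "root", that call is str.find(), which returns an integer offset. That is why the error reads 'int' object has no attribute 'tag'. A small Python 3 sketch of both halves, using a simplified one-line event:

```python
import xml.etree.ElementTree as ET

line = '<heresmydata><grandchild>Sally</grandchild></heresmydata>\n'

# A raw line is a plain str: str.find() returns an int offset (-1 if absent).
# An int like this is what reached `grandchild.tag` in the failing loop.
offset = line.find('grandchild')
print(type(offset).__name__, offset)  # int 14

# Parse the text first, then (optionally) wrap the root Element in an ElementTree.
tree = ET.ElementTree(ET.fromstring(line))
grandchild = tree.find('grandchild')
print(grandchild.tag, grandchild.text)  # grandchild Sally
```

The ET.ElementTree wrapper is optional here. Calling root.find() on the Element returned by fromstring() gives the same result.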

Guidance please?
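One way around it is sketched below in Python 3, where xml.etree.ElementTree uses the C accelerator automatically and cElementTree was removed in 3.9. The sketch assumes every non-blank line is a complete event document. It iterates over the file object directly instead of using readlines() plus a counter, which also removes the off-by-one risk, and it parses each line's text with ET.fromstring(). The helper name grandchildren and the sample data are hypothetical, and a temporary file stands in for 5lineTest.log.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def grandchildren(path):
    """Yield (tag, text) for the <grandchild> of each one-line event in path.

    Hypothetical helper: assumes every non-blank line is one complete XML
    document such as <heresmydata>...</heresmydata>.
    """
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):  # no index counter needed
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                root = ET.fromstring(line)  # parse the text -> root Element
            except ET.ParseError as err:
                # e.g. a half-written last line while the logger is still busy
                print(f'skipping line {lineno}: {err}')
                continue
            grandchild = root.find('grandchild')  # a direct child of the root
            if grandchild is not None:
                yield grandchild.tag, grandchild.text


# Build a small sample log: three events plus a truncated fourth line.
sample = ''.join(
    f'<heresmydata version="1.0"><grandchild>{name}</grandchild></heresmydata>\n'
    for name in ('Sally', 'Harry', 'Tom')
) + '<heresmydata version="1.0"><grandchild>Ha'

with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)
try:
    results = list(grandchildren(tmp.name))
finally:
    os.remove(tmp.name)

for tag, text in results:
    print(tag, text)
```

Skipping lines that fail to parse is a design choice suited to a log that is still being written. Re-raising may be preferable if a malformed line should never occur.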
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Peter Otten
2015-11-04 19:41:27 UTC
Ceterum censeo ;) Abandon the notion of lines!

To process nodes as they arrive from the parser, have a look at iterparse:

http://effbot.org/zone/element-iterparse.htm
https://docs.python.org/dev/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse
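Here is a sketch of the "abandon lines" approach for Python 3.4+, using ET.XMLPullParser, the non-blocking relative of iterparse documented on the same page. The log holds many top-level elements rather than one document, so the sketch first feeds a synthetic wrapper tag, <events>. This is a hypothetical name, not part of the log format. After that, text is fed as it arrives, and each finished heresmydata element surfaces as an 'end' event no matter where line or chunk boundaries fall. Clearing the wrapper after each event keeps memory bounded. The chunk list stands in for reads from the growing file.

```python
import xml.etree.ElementTree as ET


def iter_events(chunks, event_tag='heresmydata'):
    """Yield each complete <event_tag> Element from an iterable of text chunks.

    Chunks need not line up with lines or tags. A synthetic <events> root
    (hypothetical) is fed first so the many top-level events form one stream.
    """
    parser = ET.XMLPullParser(events=('start', 'end'))
    parser.feed('<events>')
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == 'start' and root is None:
                root = elem  # the first start event is the synthetic wrapper
            elif event == 'end' and elem.tag == event_tag:
                yield elem
                root.clear()  # detach handled events so memory stays bounded


# Chunk boundaries deliberately ignore line and tag boundaries.
stream = [
    '<heresmydata version="1.0"><gran',
    'dchild>Sally</grandchild></heresmydata>\n<heresmydata version="1.0">',
    '<grandchild>Harry</grandchild></heresmydata>\n',
]

names = [event.find('grandchild').text for event in iter_events(stream)]
print(names)  # ['Sally', 'Harry']
```

parser.close() is deliberately not called, because the synthetic <events> element is never closed while the log keeps growing.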

