Discussion:
[Tutor] Concatenating columns via python
Hannah G. McDonald
2015-07-28 18:52:59 UTC
I extracted a table from a PDF, so the data is quite messy and the data that should be in 1 row is in 3 columns, like so:
year color location
1 1997 blue, MD
2 green,
3 and yellow

So far my code is below, but I know I am missing data; I am just not sure what to put in it:

# Simply read and split an example Table 4
import sys

# Assigning count number and getting rid of right space
def main():
count = 0
pieces = []
for line in open(infile, 'U'):
if count < 130:
data = line.replace('"', '').rstrip().split("\t")
data = clean_data(data)
if data[1] == "year" and data[1] != "":
write_pieces(pieces)
pieces = data
str.join(pieces)
else:
for i in range(len(data)):
pieces[i] = pieces[i] + data[i]
str.join(pieces)

# Executing command to remove right space
def clean_data(s):
return [x.rstrip() for x in s]

def write_pieces(pieces):
print

if __name__ == '__main__':
infile = "file.txt"
main()

_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Alan Gauld
2015-07-29 00:01:07 UTC
Post by Hannah G. McDonald
I extracted a table from a PDF so the data is quite messy
year color location
1 1997 blue, MD
2 green,
3 and yellow
Please post in plain text. Your code has got mangled and
lost the indentation, so I'll need to guess...

Also tell us the Python (and OS) version, it all helps.

As for the sample data you provided, it doesn't seem to bear much
relation to the code below, apart from (maybe) the header line.

For example, what are you planning on doing with the 'and' in the 3rd
line? There seems to be no attempt to process that.
And how can you add the strings meaningfully?

In other words can you show both the input *and the output*
you are aiming for?
Post by Hannah G. McDonald
# Simply read and split an example Table 4
import sys
# Assigning count number and getting rid of right space
count = 0
pieces = []
data = line.replace('"', '').rstrip().split("\t")
data = clean_data(data)
For which I guess:

def main():
    count = 0
    pieces = []
    for line in open(infile, 'U'):
        if count < 130:
            data = line.replace('"', '').rstrip().split("\t")
            data = clean_data(data)
This doesn't make sense since if data[1] is 'year'
it can never be "" so the second test is redundant.
And it should only ever be true on the header line.
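To make the point about the redundant test concrete, here is a minimal sketch. It assumes the rows are tab-separated with a leading row-number column, so that the year lands in data[1]; that layout is a guess from the sample, and starts_new_record is a hypothetical helper, not something from the original code.

```python
# Hypothetical rows, assuming a leading row-number column before "year".
header = ["", "year", "color", "location"]
first = ["1", "1997", "blue,", "MD"]
continuation = ["2", "", "green,", ""]

# The original condition never differs from the simpler data[1] == "year".
for row in (header, first, continuation):
    original = row[1] == "year" and row[1] != ""
    assert original == (row[1] == "year")


def starts_new_record(data, year_index=1):
    """Guess that a row opens a new record when its year column is filled."""
    return len(data) > year_index and data[year_index].strip() != ""


print(starts_new_record(first))         # the year column holds "1997"
print(starts_new_record(continuation))  # the year column is empty
```

Note that the header row also has a non-empty second column, so it would need to be skipped separately if a test like this were used.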
Post by Hannah G. McDonald
write_pieces(pieces)
pieces = data
str.join(pieces)
When you do the write_pieces() call pieces is
an empty list?

Then you try to join it using str.join, but called that way,
through the class, the method expects a string instance as
its first argument. I suspect you should have used:

" ".join(pieces)

or

"\t".join(pieces)

But I'm not certain what you plan on doing here.
Especially since you don't assign the result to
any variable, so it gets discarded.
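A short sketch of the difference, using made-up values for pieces: join is called on the separator string, it returns a new string, and that string is lost unless it is assigned to something.

```python
pieces = ["1997", "blue,", "MD"]

# join is called on the separator string and returns a new string;
# the list is left unchanged, so the result has to be kept in a variable.
joined = "\t".join(pieces)
print(joined)

# Calling the method through the class hands the list over in the slot
# where a string instance is expected, which raises TypeError.
try:
    str.join(pieces)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```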
Post by Hannah G. McDonald
pieces[i] = pieces[i] + data[i]
str.join(pieces)
Since pieces is potentially the empty list here
you cannot safely assign anything to pieces[i].
And again I don't know what the last line is
supposed to be doing.
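The empty-list problem can be sketched like this. The merge_row helper is hypothetical, and joining the fragments with a single space is only an assumption about the output that is wanted.

```python
pieces = []
data = ["2", "", "green,", ""]

# Indexing into an empty list fails before any assignment can happen.
try:
    pieces[0] = pieces[0] + data[0]
    index_error = False
except IndexError:
    index_error = True


def merge_row(pieces, data):
    """Return pieces with data appended column by column (hypothetical helper)."""
    if not pieces:
        # Nothing collected yet: start from a copy of the incoming row.
        return list(data)
    # Pad the shorter row so rows of unequal length do not raise IndexError.
    width = max(len(pieces), len(data))
    old = pieces + [""] * (width - len(pieces))
    new = data + [""] * (width - len(data))
    return [(a + " " + b).strip() for a, b in zip(old, new)]


merged = merge_row(["1997", "blue,", "MD"], ["", "green,", ""])
print(merged)
```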
Post by Hannah G. McDonald
# Executing command to remove right space
return [x.rstrip() for x in s]
print
This makes no sense since it only prints a blank line...
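For comparison, a version of write_pieces that does something with its argument might look like the sketch below. The tab separator and the decision to skip an empty list are assumptions, since the desired output has not been shown.

```python
def write_pieces(pieces, sep="\t"):
    """Print the collected row as one line; return it, or None if empty."""
    if not pieces:
        # The first call happens before any row has been collected.
        return None
    line = sep.join(pieces)
    print(line)
    return line


write_pieces([])
write_pieces(["1997", "blue, green, and yellow", "MD"])
```

Returning the line as well as printing it makes the function easier to check or to redirect to a file later.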
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

