Discussion:
[Tutor] Searching through files for values
Jason Brown
2015-08-13 23:48:52 UTC
Hi,

I'm trying to search for list values in a set of files. The goal is to
generate a list of lists that can later be sorted. I can only get a match
on the first value in the list:

contents of value_file:
value1
value2
value3
...

The desired output is:

file1 value1
file1 value2
file2 value3
file3 value1
...

But it's only matching on the first item in vals, so the result is:

file1 value1
file3 value1

The subsequent values are not searched.

filenames = [list populated with filenames in a dir tree]
vals = []
value_file = open(vars)
for i in value_file:
    vals.append(i.strip())
    value_file.close()

for file_list in filenames:
    with open(file_list) as files:
        for items in vals:
            for line in files:
                if items in line:
                    print file_list, line



for line in vals:
    print line

returns:
['value1', 'value2', 'value3']

print filenames

returns:
['file1', 'file2', 'file3']


Any help would be greatly appreciated.

Thanks,

Jason
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Cameron Simpson
2015-08-14 02:32:03 UTC
Post by Jason Brown
I'm trying to search for list values in a set of files. The goal is to
generate a list of lists that can later be sorted. I can only get a match
value1
value2
value3
...
file1 value1
file1 value2
file2 value3
file3 value1
...
file1 value1
file3 value1
The subsequent values are not searched.
filenames = [list populated with filenames in a dir tree]
vals = []
value_file = open(vars)
vals.append(i.strip())
value_file.close()
You close value_file inside the loop i.e. immediately after the first value.
Because the file is closed, the loop iteration stops. You need to close it
outside the loop (after all the values have been loaded):

value_file = open(vars)
for i in value_file:
    vals.append(i.strip())
value_file.close()

It is worth noting that a better way to write this is:

with open(vars) as value_file:
    for i in value_file:
        vals.append(i.strip())

Notice that there is no .close(). The "with" construct is the Python syntax to
use a context manager, and "open(vars)" returns an open file, which is also a
context manager. A context manager has enter and exit actions which fire
unconditionally at the start and end of the "with", even if the with is exited
with an exception or a control like "return" or "break".

The benefit of this is that after the "with", the file will _always_ get closed. It
is also shorter and easier to read.
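
To make the "always gets closed" point concrete, here is a minimal sketch. The
temporary file, its sample contents and the load_values() helper are made up
for illustration and are not part of Jason's script; it is written so that it
runs on Python 2.7 and on Python 3.

```python
import os
import tempfile


def load_values(path):
    """Return the stripped lines of the file at path (hypothetical helper)."""
    with open(path) as value_file:
        return [line.strip() for line in value_file]


# Build a throwaway values file holding made-up sample data.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("value1\nvalue2\nvalue3\n")

vals = load_values(path)

# The file is closed even when the body of the "with" raises.
try:
    with open(path) as value_file:
        raise RuntimeError("simulated failure inside the with block")
except RuntimeError:
    pass
closed_after_error = value_file.closed

os.remove(path)
```

After this runs, vals holds the three values and closed_after_error is True,
although the "with" body never finished normally.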
Post by Jason Brown
print file_list, line
I would remark that "file_list" is not a great variable name. Many people would
read it as implying that its value is a list. Personally I would have just
called it "filename", the singular of your "filenames".

Cheers,
Cameron Simpson <***@zip.com.au>
Jason Brown
2015-08-14 04:07:30 UTC
(accidentally replied directly to Cameron)

Thanks, Cameron. It looks like the value_file.close() line was
accidentally indented when I pasted the code here. Thanks for the suggestion
to use 'with' though! That will be handy.

To test, I tried manually specifying the list:

vals = [ 'value1', 'value2', 'value3' ]

And I still get the same issue. Only the first value in the list is looked
up.

Jason
Peter Otten
2015-08-14 09:07:52 UTC
Post by Jason Brown
(accidentally replied directly to Cameron)
Thanks, Cameron. It looks like that value_file.close() tab was
accidentally tabbed when I pasted the code here. Thanks for the suggestion
for using 'with' though! That's will be handy.
vals = [ 'value1', 'value2', 'value3' ]
And I still get the same issue. Only the first value in the list is
looked up.
print file_list, line
I'll change it to some meaningful names:

with open(filename) as infile:
    for search_value in vals:
        for line in infile:
            if search_value in line:
                print filename, "has", search_value, "in line", line.strip()

You open infile once and then iterate over its lines many times, once for
every search_value. But unlike a list of lines you can only iterate once
over a file:

$ cat values.txt
alpha
beta
gamma
$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> lines = open("values.txt")
>>> for line in lines: print line.strip()
...
alpha
beta
gamma
>>> for line in lines: print line.strip()
...
>>>
No output in the second loop. The file object remembers the current position
and starts its iteration there. Unfortunately you have already reached the
end, so there are no more lines. Possible fixes:

(1) Open a new file object for every value:

for filename in filenames:
    for search_value in vals:
        with open(filename) as infile:
            for line in infile:
                if search_value in line:
                    print filename, "has", search_value,
                    print "in line", line.strip()

(2) Use seek() to reset the position of the file pointer:

for filename in filenames:
    with open(filename) as infile:
        for search_value in vals:
            infile.seek(0)
            for line in infile:
                if search_value in line:
                    print filename, "has", search_value,
                    print "in line", line.strip()

(3) If the file is small or not seekable (think stdin) read its contents in
a list and iterate over that:

for filename in filenames:
    with open(filename) as infile:
        lines = infile.readlines()
    for search_value in vals:
        for line in lines:
            if search_value in line:
                print filename, "has", search_value,
                print "in line", line.strip()

(4) Adapt your algorithm to test all search values against a line before you
proceed to the next line. This will change the order in which the matches
are printed, but will work with both stdin and huge files that don't fit
into memory. I'll leave the implementation to you as an exercise ;)
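
For readers who want to compare notes on (4), one possible sketch follows. It
is only one way to do it, not necessarily what Peter had in mind: the
find_matches() name, the temporary directory and the sample file contents are
all invented for illustration. It collects [filename, value] pairs in a list
so they can be sorted afterwards, as the original question asked, and it runs
on Python 2.7 and on Python 3.

```python
import os
import shutil
import tempfile


def find_matches(filenames, vals):
    """Return [filename, value] pairs, reading each file only once."""
    matches = []
    for filename in filenames:
        with open(filename) as infile:
            for line in infile:
                # Test every search value against the current line
                # before moving on to the next line.
                for search_value in vals:
                    if search_value in line:
                        matches.append([filename, search_value])
    return matches


# Made-up sample files standing in for the real directory tree.
tmpdir = tempfile.mkdtemp()
samples = [
    ("file1", "has value1\nand value2\n"),
    ("file2", "only value3 here\n"),
    ("file3", "value1 again\n"),
]
filenames = []
for name, text in samples:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as out:
        out.write(text)
    filenames.append(path)

matches = find_matches(filenames, ["value1", "value2", "value3"])
for filename, value in sorted(matches):
    print("%s %s" % (os.path.basename(filename), value))

shutil.rmtree(tmpdir)
```

Because the file is the outermost of the two inner loops, it is read exactly
once from start to end, so no seek() or readlines() is needed.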


Alan Gauld
2015-08-14 10:59:34 UTC
Others have commented on your choice of names.
I'll add one small general point.
Try to match the plurality of your names to the
nature of the object. Thus if it is a collection
of items use a plural name.

If it is a single object use a single name.

This has the effect that for loops would
normally look like:

for <single name> in <plural name>:

This makes no difference to python but it makes it a lot
easier for human readers - including you - to comprehend
what is going on and potentially spot errors.

Also your choice of file_list suggests it is a list object
but in fact it's not, it's a single file, so simply reversing
the name to list_file makes it clearer what the nature of
the object is (although see below re using type names).

Applying that to the snippet above it becomes:

for list_file in filenames:
    with open(list_file) as file:
        for item in vals:
            for line in file:

The final principle is that you should try to name variables
after their purpose rather than their type, i.e. describe the
content of the data, not its type.

Using that principle file might be better named as data
or similar - better still what kind of data (dates,
widgets, names etc), but you don't tell us that...

And of course principles are just that. There will be cases
where ignoring them makes sense too.

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

