Discussion:
[Tutor] reading lines from a list of files
Alex Kleider
2015-05-12 03:36:15 UTC
Is there a better (more 'Pythonic') way to do the following?

for f_name in f_names:
    with open(f_name, 'r') as f:
        for line in f:

As always, thank you, tutors, for all you are doing.

AlexK
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Peter Otten
2015-05-12 06:48:58 UTC
Post by Alex Kleider
Is there a better (more 'Pythonic') way to do the following?
There's the fileinput module

<https://docs.python.org/dev/library/fileinput.html#fileinput.input>

but personally I prefer the way you show above.
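For reference, a minimal sketch of what the fileinput version looks like. The temporary files here are created only so the example is self-contained; real code would pass its own file names.

```python
# Sketch of the fileinput alternative: one flat loop over the lines of
# several files.  The temporary files stand in for real input files.
import fileinput
import os
import tempfile

paths = []
for text in ("alpha\nbravo\n", "charlie\n"):
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    paths.append(path)

# fileinput.input() chains the files into a single line stream and
# closes each file as it finishes with it.
lines = []
with fileinput.input(files=paths) as fi:
    for line in fi:
        lines.append(line)

for path in paths:
    os.remove(path)

print(lines)  # ['alpha\n', 'bravo\n', 'charlie\n']
```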

Alex Kleider
2015-05-12 09:33:20 UTC
Post by Peter Otten
Post by Alex Kleider
Is there a better (more 'Pythonic') way to do the following?
There's the fileinput module
<https://docs.python.org/dev/library/fileinput.html#fileinput.input>
but personally I prefer the way you show above.
Then I'll stick with what you prefer and what I know.
It seems silly to import yet another module for the sole
purpose of saving one line of code although the reason
for my inquiry was more to diminish levels of indentation
than number of lines.
Thanks,
Alex
Peter Otten
2015-05-12 10:46:32 UTC
Post by Alex Kleider
Post by Peter Otten
Post by Alex Kleider
Is there a better (more 'Pythonic') way to do the following?
There's the fileinput module
<https://docs.python.org/dev/library/fileinput.html#fileinput.input>
but personally I prefer the way you show above.
Then I'll stick with what you prefer and what I know.
It seems silly to import yet another module for the sole
purpose of saving one line of code
I think of importing a module as "cheap" unless it draws in a framework (e.g.
numpy). And don't forget that even small pieces of code should be tested.
So you aren't just saving the extra line, but also some of your tests.
Post by Alex Kleider
although the reason
for my inquiry was more to diminish levels of indentation
than number of lines.
You usually do that by factoring out the loops into a generator:

def lines(files):
    for file in files:
        with open(file) as f:
            yield from f  # before python 3.3: for line in f: yield line


for line in lines(files):
    ...


Also possible, but sloppy as files are closed on garbage collection rather
than explicitly:

lines = (line for file in files for line in open(file))
for line in lines:
    ...


Oscar Benjamin
2015-05-12 15:42:47 UTC
Post by Peter Otten
Post by Alex Kleider
although the reason
for my inquiry was more to diminish levels of indentation
than number of lines.
yield from f # before python 3.3: for line in f: yield line
...
Also possible, but sloppy as files are closed on garbage collection rather
lines = (line for file in files for line in open(file))
...
The lines generator function above relies on __del__ as well if the
loop exits from break, return or an exception in the loop body. This
happens whenever you yield or yield from inside a with block. The
chain of events that calls your context manager if I break from the
loop is:

1) Once the for loop discards the generator it has a zero reference count.
2) generator.__del__() called.
3) __del__() calls close()
4) close() throws GeneratorExit into the generator frame.
5) The GeneratorExit triggers the __exit__() methods of any context
managers active in the generator frame.

Try the following:

$ cat test_gen_cm.py
#!/usr/bin/env python3

class cleanup():
    def __enter__(self):
        pass
    def __exit__(self, *args):
        print("__exit__ called")

def generator_with_cm():
    with cleanup():
        yield 1
        yield 2
        yield 3

g = generator_with_cm()
for x in g:
    break

print('Deleting g')
del g
print('g is now deleted')

$ ./test_gen_cm.py
Deleting g
__exit__ called
g is now deleted

A generator cannot guarantee that execution continues after a yield so
any context manager used around a yield is dependent on __del__. I
think a good rule of thumb is "don't yield from a with block".

Alex, I apologise if what I've written here is confusing, but really
what you started with is just fine. It is not important to fully
understand what I wrote above.


--
Oscar
Peter Otten
2015-05-12 18:08:17 UTC
Post by Oscar Benjamin
A generator cannot guarantee that execution continues after a yield so
any context manager used around a yield is dependent on __del__. I
think a good rule of thumb is "don't yield from a with block".
Uh-oh, I am afraid I did this quite a few times. Most instances seem to be
context managers though. Is something like

@contextmanager
def my_open(filename):
    if filename == "-":
        yield sys.stdin
    else:
        with open(filename) as f:
            yield f


OK?


Oscar Benjamin
2015-05-13 13:20:50 UTC
Post by Peter Otten
Post by Oscar Benjamin
A generator cannot guarantee that execution continues after a yield so
any context manager used around a yield is dependent on __del__. I
think a good rule of thumb is "don't yield from a with block".
Uh-oh, I am afraid I did this quite a few times. Most instances seem to be
context managers though. Is something like
@contextmanager
yield sys.stdin
yield f
OK?
Yeah that's fine. A generator cannot guarantee that execution
continues after a yield since the controller of the generator decides
that. In this case the only controller that has access to your
generator is the contextmanager decorator which guarantees to do
next(gen) or gen.throw().

You can see the code for that here:
https://hg.python.org/cpython/file/4b5461dcd190/Lib/contextlib.py#l63
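A small illustration of why the @contextmanager pattern is safe: the decorator itself drives the generator with next()/throw(), so the inner __exit__ runs deterministically even when the body raises, with no reliance on garbage collection. (The Cleanup class and the event list are invented for this demo.)

```python
# Demo: an exception raised in the with-body is thrown into the
# generator at its yield point, so the inner context manager's
# __exit__ fires immediately, before the exception is handled.
from contextlib import contextmanager

events = []

class Cleanup:
    def __enter__(self):
        events.append("enter")
    def __exit__(self, *exc):
        events.append("exit")

@contextmanager
def managed():
    with Cleanup():
        yield "resource"

try:
    with managed() as r:
        events.append("body")
        raise ValueError("boom")
except ValueError:
    events.append("caught")

print(events)  # ['enter', 'body', 'exit', 'caught']
```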


--
Oscar
David Rock
2015-05-12 16:12:04 UTC
Post by Alex Kleider
Post by Peter Otten
Post by Alex Kleider
Is there a better (more 'Pythonic') way to do the following?
There's the fileinput module
<https://docs.python.org/dev/library/fileinput.html#fileinput.input>
but personally I prefer the way you show above.
Then I'll stick with what you prefer and what I know.
It seems silly to import yet another module for the sole
purpose of saving one line of code although the reason
for my inquiry was more to diminish levels of indentation
than number of lines.
Thanks,
Alex
Personally, *I* prefer fileinput as my go-to file reader.

Don't knock fileinput for "saving one line." It does a lot more than
that. It allows your script to manage the filelist as an input,
automatically handles stdin so your script can easily be both a filter
in a pipeline and a file reader, plus a host of other useful methods for
info about the file you are reading.
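A hedged sketch of some of those extras: while iterating, fileinput can report which file and line the current line came from. The temporary file is a stand-in for a real input file.

```python
# Demo of fileinput's per-line bookkeeping: filename(), filelineno()
# and isfirstline() describe where the current line came from.
import fileinput
import os
import tempfile

fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("first\nsecond\n")

seen = []
with fileinput.input(files=[path]) as fi:
    for line in fi:
        seen.append((fi.filename() == path, fi.filelineno(),
                     fi.isfirstline(), line.strip()))

os.remove(path)
print(seen)  # [(True, 1, True, 'first'), (True, 2, False, 'second')]
```

Passing no file list at all makes fileinput fall back to sys.argv[1:] or stdin, which is what lets the same script work as both a filter and a file reader.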

Part of what you really need to define is the context of your question
of "better." What is your use case? From where is your list of files
coming? Is it truly just "read and forget"? Your needs will dictate
what option is "best." It may be what you've already done yourself, it
may be fileinput, or it may be something completely different.
--
David Rock
***@graniteweb.com
Alex Kleider
2015-05-14 05:27:11 UTC
Post by Peter Otten
Post by Alex Kleider
Is there a better (more 'Pythonic') way to do the following?
There's the fileinput module
<https://docs.python.org/dev/library/fileinput.html#fileinput.input>
but personally I prefer the way you show above.
As a follow up question:
The following seems to work-

for f_name in list_of_file_names:
    for line in open(f_name, 'r'):
        process(line)

but should I be worried that the file doesn't get explicitly closed?

Alex
Danny Yoo
2015-05-14 06:24:03 UTC
Post by Alex Kleider
The following seems to work-
process(line)
but should I be worried that the file doesn't get explicitly closed?
It depends on context. Personally, I'd write it with the 'with' to
make it very clear that the loop will manage its resource. That being
said, it sounds like there might be concern about the nesting.
We're nesting three or four levels deep, at the very least!

I'd agree with that. Because of this, it might be worthwhile to
consider refactoring the processing of the file into a separate
function, something like this:

###############################
def processFile(f):
    for line in f:
        ...

for f_name in list_of_file_names:
    with open(f_name, 'r') as f:
        processFile(f)
###############################

The primary reason is to reduce the nesting. But there's also a
potential side benefit: processFile() has a better chance of being
unit-testable, since we can pass in instances of other file-like
objects, such as io.StringIO(), to spot-check the behavior of the
process.
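To make that benefit concrete, here is a sketch of such a test, with an invented line-counting body for processFile() (the thread leaves the processing unspecified):

```python
# Because processFile() accepts any file-like object, a test can feed
# it an io.StringIO instead of touching the filesystem.  The
# non-blank-line count below is a made-up example of "processing".
import io

def processFile(f):
    """Count the non-blank lines in an open file-like object."""
    count = 0
    for line in f:
        if line.strip():
            count += 1
    return count

print(processFile(io.StringIO("one\n\ntwo\n")))  # 2
```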
Alex Kleider
2015-05-14 08:01:39 UTC
Post by Danny Yoo
Post by Alex Kleider
The following seems to work-
process(line)
but should I be worried that the file doesn't get explicitly closed?
It depends on context. Personally, I'd write it with the 'with' to
make it very clear that the loop will manage its resource. That being
said, it sounds like there might be concern about the nesting.
We're nesting three or four levels deep, at the very least!
I'd agree with that. Because of this, it might be worthwhile to
consider refactoring the processing of the file in a separate
###############################
...
processFile(f)
###############################
The primary reason is to reduce the nesting. But there's also a
potential side benefit: processFile() has a better chance of being
unit-testable, since we can pass in instances of other file-like
objects, such as io.StringIO(), to spot-check the behavior of the
process.
Thanks, Danny. This is particularly germane since the issue has come up
in the midst of my first attempt to do test-driven development. I'll
try refactoring per your suggestion.

Alan Gauld
2015-05-14 07:01:44 UTC
Post by Alex Kleider
The following seems to work-
process(line)
but should I be worried that the file doesn't get explicitly closed?
If you are only ever reading from the file then no, I'd not worry
too much. But in the general case, where you might be making
changes then yes, you should worry.

It will work 99% of the time but if things go wrong there's always
a chance that a file has not been closed yet and your changes have
not been written to the file. But if you are not changing the file
it doesn't matter too much.

The only other consideration is that some OS might put a lock
on the file even if it's only read access, so in those cases
closing will release the lock sooner. But I don't think any of
the popular OS do that any more.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Alex Kleider
2015-05-14 07:57:12 UTC
Post by Alan Gauld
Post by Alex Kleider
The following seems to work-
process(line)
but should I be worried that the file doesn't get explicitly closed?
If you are only ever reading from the file then no, I'd not worry
too much. But in the general case, where you might be making
changes then yes, you should worry.
It will work 99% of the time but if things go wrong there's always
a chance that a file has not been closed yet and your changes have
not been written to the file. But if you are not changing the file
it doesn't matter too much.
The only other consideration is that some OS might put a lock
on the file even if it's only read access, so in those cases
closing will release the lock sooner. But I don't think any of
the popular OS do that any more.
Thank you, Alan; I hadn't appreciated the important difference in risk
with write vs read. It's clear that 'best practice' would be to use
'with' so as to 'cover all bases.'
Alex Kleider
2015-05-14 07:52:16 UTC
Post by Alex Kleider
The following seems to work-
process(line)
but should I be worried that the file doesn't get explicitly closed?
Alex
If you use the with statement you will guarantee that the file closes
as soon as you are done with it. It will also handle exceptions nicely
for you.
See: https://www.python.org/dev/peps/pep-0343/
In practice, CPython's ref-counting semantics mean that running out
of file descriptors doesn't happen (unless you put that code in a
loop that gets called a whole lot). But the gc used by a Python
version is not part of the language specification; it is a
language implementation detail. If you are writing for PyPy or
Jython you will need to use the with statement or close your files
explicitly, so the gc knows you are done with them. Relying on
'the last reference to them went away' to close your file won't
work if the gc isn't counting references.
See: http://pypy.org/compat.html
http://pypy.readthedocs.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies
Laura
Thanks, Laura, for your analysis. I'll happily include the one extra
line and let 'with' do its magic rather than depend on implementation
details (or worry about their shortcomings :-)

Peter Otten
2015-05-12 19:54:01 UTC
It was not that long ago that I found out about the fileinput module, so I
sometimes forget to use it. It is not possible to specify the encoding of
the files, is it? It'd be nice if one could specify a tuple of encodings,
e.g. ('utf-8-sig', 'latin1', 'cp1252').

input(files=None, inplace=0, backup='', bufsize=0, mode='r', openhook=None)
input([files[, inplace[, backup[, mode[, openhook]]]]])
Whatever you plan to do with these encodings, it should be possible with a
custom openhook.
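A hedged sketch of what such a hook could look like: it reads the raw bytes, decodes with the first encoding that works, and hands fileinput a StringIO of the result. The hook name and the encoding tuple are illustrative, not from the thread.

```python
# Custom openhook that tries candidate encodings in turn.  fileinput
# calls openhook(filename, mode) and iterates whatever it returns.
import fileinput
import io
import os
import tempfile

def open_with_fallback(filename, mode, encodings=("utf-8-sig", "latin1")):
    with open(filename, "rb") as f:
        data = f.read()
    for enc in encodings:
        try:
            return io.StringIO(data.decode(enc))
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r can decode %r" % (encodings, filename))

# A file that is valid latin1 but not valid UTF-8:
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("caf\xe9\n".encode("latin1"))

with fileinput.input(files=[path], openhook=open_with_fallback) as fi:
    text = "".join(fi)

os.remove(path)
print(repr(text))  # 'café\n'
```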
