Discussion:
[Tutor] A further question about opening and closing files
richard kappler
2015-09-09 14:24:57 UTC
And this will repeatedly open the file, append one line, then close it
again. Almost certainly not what you want -- it's wasteful and
potentially expensive.
And I get that. It does bring up another question though. When using

with open(somefile, 'r') as f:
    with open(filename, 'a') as f1:
        for line in f:

the file being appended is opened and stays open while the loop iterates,
then the file closes when exiting the loop, yes? Does this not have the
potential to be expensive as well if you are writing a lot of data to the
file?
f1 = open("output/test.log", 'a')
f1.write("this is a test")
f1.write("this is a test")
f1.write('why isn\'t this writing????')
f1.close()
I monitored test.log as I went. Nothing was written to the file until I
closed it, or at least that's the way it appeared to the text editor in
which I had test.log open (gedit). In gedit, when a file changes it tells
you and gives you the option to reload the file. This didn't happen until I
closed the file. So I'm presuming all the writes sat in a buffer in memory
until the file was closed, at which time they were written to the file.

Is that actually how it happens, and if so does that not also have the
potential to cause problems if memory is a concern?

regards, Richard
--
All internal models of the world are approximate. ~ Sebastian Thrun
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Laura Creighton
2015-09-09 15:28:21 UTC
f1 = open("output/test.log", 'a')
f1.write("this is a test")
f1.write("this is a test")
f1.write('why isn\'t this writing????')
f1.close()
If you want the data written out immediately, call f1.flush() whenever
you want to make sure that happens.

If you want completely unbuffered writing, then you can open your file
this way: f1 = open("output/test.log", 'a', 0). I think if you are
on Windows you can only get unbuffered writing if you open your file
in binary mode.
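Both suggestions in one self-contained sketch (note: under Python 3, unlike the Python 2 current when this thread was written, open() only accepts buffering=0 in binary mode on every platform, so the unbuffered variant below uses 'ab'; the temporary path is just for illustration):

```python
import os
import tempfile

# a throwaway path standing in for "output/test.log"
path = os.path.join(tempfile.mkdtemp(), "test.log")

# Option 1: keep the default buffering, but push data out on demand
f1 = open(path, 'a')
f1.write("this is a test\n")
f1.flush()  # the data reaches the file here, before close()
assert os.path.getsize(path) > 0
f1.close()

# Option 2: fully unbuffered writing; Python 3 requires binary mode
# (buffering=0 with a text mode raises ValueError)
f2 = open(path, 'ab', buffering=0)
f2.write(b"written immediately\n")  # no flush needed
f2.close()
```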

Laura

richard kappler
2015-09-09 15:40:07 UTC
Thanks, I tried them both; both work great on Linux. Now I understand better.

regards, Richard
Post by Laura Creighton
f1 = open("output/test.log", 'a')
f1.write("this is a test")
f1.write("this is a test")
f1.write('why isn\'t this writing????')
f1.close()
If you want the thing written out, use f1.flush() whenever you want to
make sure this happens.
If you want completely unbuffered writing, then you can open your file
this way, with f1 = open("output/test.log", 'a', 0) I think if you are
on windows you can only get unbuffered writing if you open your file
in binary mode.
Laura
--
All internal models of the world are approximate. ~ Sebastian Thrun
Alan Gauld
2015-09-09 16:42:05 UTC
Post by richard kappler
f1 = open("output/test.log", 'a')
f1.write("this is a test")
f1.write("this is a test")
f1.write('why isn\'t this writing????')
f1.close()
monitoring test.log as I went. Nothing was written to the file until I
closed it, or at least that's the way it appeared to the text editor
For a short example like this it's true; for a bigger example the
buffer will be flushed periodically, as it fills up.
This is not a Python thing, it's an OS feature; the same is true
for any program. It's a much more efficient use of the IO bus.
(It's also why you should always explicitly close a file opened
for writing - unless you are using with, which does it for you.)

You can force the writes (I see Laura has shown how) but
mostly you should just let the OS do its thing. Otherwise
you risk cluttering up the IO bus and preventing other
programs from writing their files.

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Laura Creighton
2015-09-09 18:20:44 UTC
Post by Alan Gauld
You can force the writes (I see Laura has shown how) but
mostly you should just let the OS do its thing. Otherwise
you risk cluttering up the IO bus and preventing other
programs from writing their files.
Is this something we have to worry about these days? I haven't
worried about it for a long time, and write real-time multiplayer
games which demand unbuffered writes .... Of course, things
would be different if I were sending gigabytes of video down the
pipe, but for the sort of small writes I am doing, I don't think
there is any performance problem at all.

Anybody got some benchmarks so we can find out?

Laura

Steven D'Aprano
2015-09-09 18:58:22 UTC
Post by Laura Creighton
Post by Alan Gauld
You can force the writes (I see Laura has shown how) but
mostly you should just let the OS do its thing. Otherwise
you risk cluttering up the IO bus and preventing other
programs from writing their files.
Is this something we have to worry about these days? I haven't
worried about it for a long time, and write real time multiplayer
games which demand unbuffered writes .... Of course, things
would be different if I were sending gigabytes of video down the
pipe, but for the sort of small writes I am doing, I don't think
there is any performance problem at all.
Anybody got some benchmarks so we can find out?
Good question!

There's definitely a performance hit, but it's not as big as I expected:

py> with Stopwatch():
... with open("/tmp/junk", "w") as f:
... for i in range(100000):
... f.write("a")
...
time taken: 0.129952 seconds

py> with Stopwatch():
... with open("/tmp/junk", "w") as f:
... for i in range(100000):
... f.write("a")
... f.flush()
...
time taken: 0.579273 seconds


What really gets expensive is doing a sync.

py> with Stopwatch():
... with open("/tmp/junk", "w") as f:
... fid = f.fileno()
... for i in range(100000):
... f.write("a")
... f.flush()
... os.fsync(fid)
...
time taken: 123.283973 seconds


Yes, that's right. From half a second to two minutes.
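(Stopwatch is not a standard-library class; presumably it is a small helper of Steve's own. A minimal context manager along these lines, a sketch rather than the actual helper, reproduces the timings and output format:)

```python
import time

class Stopwatch:
    """Minimal timing context manager (a sketch; the helper actually
    used above was not shown in the thread)."""
    def __enter__(self):
        self.start = time.perf_counter()  # monotonic, high-resolution
        return self
    def __exit__(self, *exc_info):
        self.elapsed = time.perf_counter() - self.start
        print("time taken: %f seconds" % self.elapsed)
        return False  # never swallow exceptions from the block
```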
--
Steve
Alan Gauld
2015-09-09 19:25:06 UTC
Post by Laura Creighton
Post by Alan Gauld
You can force the writes (I see Laura has shown how) but
mostly you should just let the OS do its thing. Otherwise
you risk cluttering up the IO bus and preventing other
programs from writing their files.
Is this something we have to worry about these days? I haven't
worried about it for a long time, and write real time multiplayer
games which demand unbuffered writes .... Of course, things
would be different if I were sending gigabytes of video down the
pipe, but for the sort of small writes I am doing, I don't think
there is any performance problem at all.
Anybody got some benchmarks so we can find out?
Laura
If you are working on a small platform - think mobile device - and it has
a single channel bus to the storage area then one of the worst things
you can do is write lots of small chunks of data to it. The overhead
(in hardware) of opening and locking the bus is almost as much as
the data transit time and so can choke the bus for a significant amount
of time (I'm talking milliseconds here but in real-time that's significant).

But even on a major OS platform bus contention does occasionally rear
its head. I've seen multi-processor web servers "lock up" due to too many
threads dumping data at once. Managing the data bus is (part of) what
the OS is there to do; it's best to let it do its job. Second-guessing
it is rarely the right thing.

Remember, the impact is never on your own program; it's on all the
other processes running on the same platform. There are usually tools
to monitor the IO bus performance though, so it's fairly easy to
diagnose/check.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

Laura Creighton
2015-09-09 19:42:20 UTC
Post by Alan Gauld
If you are working on a small platform - think mobile device - and it has
a single channel bus to the storage area then one of the worst things
you can do is write lots of small chunks of data to it. The overhead
(in hardware) of opening and locking the bus is almost as much as
the data transit time and so can choke the bus for a significant amount
of time (I'm talking milliseconds here but in real-time that's significant).
But if I shoot you with my laser cannon, I want you to get the
message that you are dead _now_ and not when some buffer fills up ...

Laura

Alan Gauld
2015-09-09 22:19:43 UTC
Post by Laura Creighton
Post by Alan Gauld
If you are working on a small platform - think mobile device - and it has
a single channel bus to the storage area then one of the worst things
you can do is write lots of small chunks of data to it. The overhead
(in hardware) of opening and locking the bus is almost as much as
the data transit time and so can choke the bus for a significant amount
of time (I'm talking milliseconds here but in real-time that's significant).
But if I shoot you with my laser cannon, I want you to get the
message that you are dead _now_ and not when some buffer fills up ...
There are two things about that:
1) Human reaction time is measured in hundreds of milliseconds, so the
   delay is not likely to be meaningful. If you flush every 10ms
   instead of on every write (assuming you are writing frequently),
   nobody is likely to notice.
2) Gamers tend not to be doing other things while playing, so you can
   pretty much monopolize the bus if you want to.

So if you know that you're the only game in town (sic) then go ahead
and flush everything to disk. It won't do much harm. But...

..., if your game engine is running on a server shared by other users
and some of them are running critical apps (think a business's billing
or accounting suite that must complete its run within a one-hour
window, say) then you become very unpopular quickly. In practice that
means the sysadmin will see who is flattening the bus and nice that
process down till it stops hurting the others. That means your game
now runs at 10% of the CPU power it had a while ago...

As programmers we very rarely have the control over our environment that
we like to think we do.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

Laura Creighton
2015-09-10 15:44:27 UTC
Post by Alan Gauld
..., if your game engine is running on a server shared by other users
and some of them are running critical apps (think a business's billing
or accounting suite that must complete its run within a one-hour
window, say) then you become very unpopular quickly. In practice that
means the sysadmin will see who is flattening the bus and nice that
process down till it stops hurting the others. That means your game
now runs at 10% of the CPU power it had a while ago...
We were talking about mobile devices ...

These days every business billing and accounting suite I know of --
and we _make_ a bookkeeping suite, which we need to interface with
many other things -- runs in its own VM and doesn't share with
anybody.

multiplayer games tend to run in their own VM as well, and too many
users using the game can and does degrade performance. But I don't
think that having too many users flush their stream objects is a
significant part of this problem, compared to the problems of
getting your bits out of your graphics card, for instance.

Laura
Alan Gauld
2015-09-10 16:17:26 UTC
Post by Laura Creighton
Post by Alan Gauld
..., if your game engine is running on a server shared by other users and
some of them are running critical apps (think a businesses billing or
We were talking about mobile devices ...
Ok, in that case I'd guess the client playing the game is only
interested in the game so won't care about impact on other
processes.
Post by Laura Creighton
These days every business billing and accounting suite I know of --
and we _make_ a bookkeeping suite, which we need to interface
with many other things ....
runs in its own VM and doesn't share with anybody.
But it does. The physical file system and its IO bus is still
shared by every VM and therefore by every app on every VM.
And when you do a flush you force the VM to write to the
physical IO bus. Even if the VM uses a virtual file system
(which is no good if you are sharing with other VMs in a
common database environment) it will almost certainly be
manifested as a physical file eventually. If you are lucky
the VM will defer flushes to the physical FS but that is
normally a configurable parameter.

One of the biggest problems facing modern data centres is
how to manage access to physical devices from virtual
environments. It is a very delicate balancing act and
very easy to get wrong. And where in the past a mistake
upset a handful of apps, it now impacts hundreds. We
are still in the early days of VM technology; the monitoring
tools are improving, and hardware architectures are
developing parallel buses etc. to avoid these kinds of
problems. But most of the data centres I work with are
still running at least 50% of capacity on pre-VM-optimised
servers.
Post by Laura Creighton
multiplayer games tend to run in their own VM as well, and too many
users using the game can and does degrade performance. But I don't
think that having too many users flush their stream objects is a
significant part of this problem,
Regular flushes hardly ever affect the program doing the
flushing; it's whatever is sharing the IO bus. Remember, the
issue is the physical hardware bus, not the software
environment. If you are writing to your own file system
over a segregated bus (and there are several technologies
for doing that: SAN, NAS, multiplexed buses, etc.) then
it's not usually an issue.

It's only if you are on a machine where you share a single IO
channel with other apps that problems can occur - for the
other apps.

However, I suspect we might now be getting a wee bit OT
for most readers on this list! So I will put my architect's
hat away again. :-)
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Steven D'Aprano
2015-09-09 18:46:53 UTC
Post by richard kappler
And this will repeatedly open the file, append one line, then close it
again. Almost certainly not what you want -- it's wasteful and
potentially expensive.
And I get that. It does bring up another question though. When using
the file being appended is opened and stays open while the loop iterates,
then the file closes when exiting the loop, yes?
The file closes when exiting the *with block*, not necessarily the loop.
Consider:

with open(blah blah blah) as f:
    for line in f:
        pass
    time.sleep(120)
# file isn't closed until we get here

Even if the file is empty, and there are no lines, it will be held open
for two minutes.
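You can verify the scope of the close with the file object's closed attribute; a quick sketch (a temporary file stands in for the real one):

```python
import tempfile

# make a small file to read back
tmp = tempfile.NamedTemporaryFile('w', delete=False)
tmp.write("one\ntwo\n")
tmp.close()

with open(tmp.name) as f:
    for line in f:
        pass
    # the loop has finished, but we are still inside the with block,
    # so the file is still open (this is where the sleep would sit)
    assert not f.closed
# the with block has exited: only now is the file closed
assert f.closed
```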
Post by richard kappler
Does this not have the
potential to be expensive as well if you are writing a lot of data to the
file?
Er, expensive in what way?

Yes, I suppose it is more expensive to write 1 gigabyte of data to a
file than to write 1 byte. What's your point? If you want to write 1 GB,
then you have to write 1 GB, and it will take as long as it takes.

Look at it this way: suppose you have to hammer 1000 nails into a fence.
You can grab your hammer out of your tool box, hammer one nail, put the
hammer back in the tool box and close the lid, open the lid, take the
hammer out again, hammer one nail, put the hammer back in the tool box,
close the lid, open the lid again, take out the hammer...

Or you take the hammer out, hammer 1000 nails, then put the hammer away.
Sure, while you are hammering those 1000 nails, you're not mowing the
lawn, painting the porch, walking the dog or any of the dozen other jobs
you have to do, but you have to hammer those nails eventually.
Post by richard kappler
f1 = open("output/test.log", 'a')
f1.write("this is a test")
f1.write("this is a test")
f1.write('why isn\'t this writing????')
f1.close()
monitoring test.log as I went. Nothing was written to the file until I
closed it, or at least that's the way it appeared to the text editor in
which I had test.log open (gedit). In gedit, when a file changes it tells
you and gives you the option to reload the file. This didn't happen until I
closed the file. So I'm presuming all the writes sat in a buffer in memory
until the file was closed, at which time they were written to the file.
Correct. All modern operating systems do that. Writing to disk is slow,
*hundreds of thousands of times slower* than writing to memory, so the
operating system will queue up a reasonable amount of data before
actually forcing it to the disk drive.
Post by richard kappler
Is that actually how it happens, and if so does that not also have the
potential to cause problems if memory is a concern?
No. The operating system is not stupid enough to queue up gigabytes of
data. Typically the buffer is something like 128 KB of data (I think),
or maybe a MB or so. Writing a couple of short lines of text won't fill
it, which is why you don't see any change until you actually close the
file. Try writing a million lines, and you'll see something different.
The OS will flush the buffer when it is full, or when you close the
file, whichever happens first.
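The OS-level cache is not directly visible from Python, but Python's own buffering layer, which sits above it, is; its default size is exposed in the io module (the value varies by version and platform):

```python
import io

# Python's default buffer size for binary file I/O, commonly 8 KiB.
# The OS keeps its own, much larger, write-back cache below this
# layer, which is the buffer being described above.
print(io.DEFAULT_BUFFER_SIZE)
```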

If you know that you're going to take a long time to fill the buffer,
say you're performing a really slow calculation, and your data is
trickling in really slowly, then you might do a file.flush() every few
seconds or so. Or if you're writing an ACID database. But for normal
use, don't try to out-smart the OS, because you will fail. This is
really specialised know-how.
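The trickling-data case might look like this (the two-second interval, the producer, and the path are arbitrary illustrations, not recommendations):

```python
import os
import tempfile
import time

FLUSH_INTERVAL = 2.0  # seconds between forced flushes; tune to taste

def slow_producer():
    # stand-in for a slow calculation yielding results bit by bit
    for i in range(5):
        yield "result %d\n" % i

path = os.path.join(tempfile.mkdtemp(), "trickle.log")
with open(path, 'w') as f:
    last_flush = time.monotonic()
    for line in slow_producer():
        f.write(line)
        if time.monotonic() - last_flush >= FLUSH_INTERVAL:
            f.flush()  # make partial results visible to other readers
            last_flush = time.monotonic()
# the implicit close() at the end of the with block flushes the rest
```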

Have you noticed how slow gedit is to save files? That's because the
gedit programmers thought they were smarter than the OS, so every time
they write a file, they call flush() and sync(). Possibly multiple
times. All that happens is that they slow the writing down greatly.
Other text editors let the OS manage this process, and saving is
effectively instantaneous. With gedit, there's a visible pause when it
saves. (At least in all the versions of gedit I've used.)

And the data is no safer than with the other text editors,
because when the OS has written to the hard drive, there is no guarantee
that the data has hit the platter yet. Hard drives themselves contain
buffers, and they won't actually write data to the platter until they
are good and ready.
--
Steve