Discussion:
[Tutor] scratching my head - still
Clayton Kirkwood
2015-08-05 00:52:15 UTC
As seen below (closely), some filenames are not being removed while others
are, such as in the first stanza, some pdfs are removed, some aren't. In the
second stanza, Thumbs.db makes it through, but was caught in the first
stanza. (Thanks for those who have proffered solutions to date!)
I see no logic in the results. What am I missing???
TIA, Clayton


import os
from os.path import join, getsize, splitext

main_dir = "/users/Clayton/Pictures"
directory_file_list = {}
duplicate_files = 0
top_directory_file_list = 0

for dir_path, directories, filenames in os.walk(main_dir):
    print( "filenames = ", filenames, "\n" )
    for filename in filenames:
        prefix, ext = splitext(filename)
        if not (ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4',
                                            'mov', 'bmp') ):
            print( "deleting filename ", filename, " because ",
                   ext[1:].lower(), " doesn't contain .jpg or .png or .avi or .mp4 or .bmp" )
            filenames.remove(filename)

    print("\nfilenames - bad exts:\n", filenames )


produces:

filenames = ['.picasa.ini', '2010-11-02 15.58.30.jpg', '2010-11-02
15.58.45.jpg', '2010-11-25 09.42.59.jpg', '2011-03-19 19.32.09.jpg',
'2011-05-28 17.13.38.jpg', '2011-05-28 17.26.37.jpg', '2012-02-02
20.16.46.jpg', '218.JPG', 'desktop.ini', 'Guide ENG.pdf', 'Guide FRE.pdf',
'Guide GER.pdf', 'Guide Ita.pdf', 'Guide Spa.pdf', 'honda accident 001.jpg',
'honda accident 002.jpg', 'honda accident 003.jpg', 'honda accident
004.jpg', 'honda accident 005.jpg', 'honda accident 006.jpg', 'honda
accident 007.jpg', 'Image (1).jpg', 'Image.jpg', 'IMG.jpg', 'IMG00003.jpg',
'IMG00040.jpg', 'IMG00058.jpg', 'IMG_0003.jpg', 'IMG_0004.jpg',
'IMG_0005.jpg', 'IMG_0007.jpg', 'IMG_0008.jpg', 'IMG_0009.jpg',
'IMG_0010.jpg', 'Mak diploma handshake.jpg', 'New Picture.bmp', 'OneNote
Table Of Contents (2).onetoc2', 'temp 121.jpg', 'temp 122.jpg', 'temp
220.jpg', 'temp 320.jpg', 'temp 321.jpg', 'temp 322.jpg', 'temp 323.jpg',
'temp 324.jpg', 'temp 325.jpg', 'temp 326.jpg', 'temp 327.jpg', 'temp
328.jpg', 'temp 329.jpg', 'temp 330.jpg', 'temp 331.jpg', 'temp 332.jpg',
'temp 333.jpg', 'temp 334.jpg', 'temp 335.jpg', 'temp 336.jpg', 'temp
337.jpg', 'temp 338.jpg', 'temp 339.jpg', 'temp 340.jpg', 'temp 341.jpg',
'temp 342.jpg', 'temp 343.jpg', 'Thumbs.db']

deleting filename .picasa.ini because ini doesn't contain .jpg or .png
or .avi or .mp4 or .bmp
deleting filename desktop.ini because ini doesn't contain .jpg or .png
or .avi or .mp4 or .bmp
deleting filename Guide FRE.pdf because pdf doesn't contain .jpg or
.png or .avi or .mp4 or .bmp
deleting filename Guide Ita.pdf because pdf doesn't contain .jpg or
.png or .avi or .mp4 or .bmp
deleting filename OneNote Table Of Contents (2).onetoc2 because onetoc2
doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename Thumbs.db because db doesn't contain .jpg or .png or
.avi or .mp4 or .bmp

filenames - bad exts:
['2010-11-02 15.58.30.jpg', '2010-11-02 15.58.45.jpg', '2010-11-25
09.42.59.jpg', '2011-03-19 19.32.09.jpg', '2011-05-28 17.13.38.jpg',
'2011-05-28 17.26.37.jpg', '2012-02-02 20.16.46.jpg', '218.JPG', 'Guide
ENG.pdf', 'Guide GER.pdf', 'Guide Spa.pdf', 'honda accident 001.jpg', 'honda
accident 002.jpg', 'honda accident 003.jpg', 'honda accident 004.jpg',
'honda accident 005.jpg', 'honda accident 006.jpg', 'honda accident
007.jpg', 'Image (1).jpg', 'Image.jpg', 'IMG.jpg', 'IMG00003.jpg',
'IMG00040.jpg', 'IMG00058.jpg', 'IMG_0003.jpg', 'IMG_0004.jpg',
'IMG_0005.jpg', 'IMG_0007.jpg', 'IMG_0008.jpg', 'IMG_0009.jpg',
'IMG_0010.jpg', 'Mak diploma handshake.jpg', 'New Picture.bmp', 'temp
121.jpg', 'temp 122.jpg', 'temp 220.jpg', 'temp 320.jpg', 'temp 321.jpg',
'temp 322.jpg', 'temp 323.jpg', 'temp 324.jpg', 'temp 325.jpg', 'temp
326.jpg', 'temp 327.jpg', 'temp 328.jpg', 'temp 329.jpg', 'temp 330.jpg',
'temp 331.jpg', 'temp 332.jpg', 'temp 333.jpg', 'temp 334.jpg', 'temp
335.jpg', 'temp 336.jpg', 'temp 337.jpg', 'temp 338.jpg', 'temp 339.jpg',
'temp 340.jpg', 'temp 341.jpg', 'temp 342.jpg', 'temp 343.jpg']

filenames = ['IMG_0028.JPG', 'IMG_0031.JPG', 'IMG_0032.JPG',
'IMG_0035.JPG', 'IMG_0037.JPG', 'IMG_0039.JPG', 'OneNote Table Of
Contents.onetoc2', 'Thumbs.db', 'ZbThumbnail.info']

deleting filename OneNote Table Of Contents.onetoc2 because onetoc2
doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename ZbThumbnail.info because info doesn't contain .jpg or
.png or .avi or .mp4 or .bmp

filenames - bad exts:
['IMG_0028.JPG', 'IMG_0031.JPG', 'IMG_0032.JPG', 'IMG_0035.JPG',
'IMG_0037.JPG', 'IMG_0039.JPG', 'Thumbs.db']


_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Steven D'Aprano
2015-08-05 02:46:10 UTC
Post by Clayton Kirkwood
As seen below (closely), some filenames are not being removed while others
are, such as in the first stanza, some pdfs are removed, some aren't. In the
second stanza, Thumbs.db makes it through, but was caught in the first
stanza. (Thanks for those who have proffered solutions to date!)
I see no logic in the results. What am I missing???
You are modifying the list of files while iterating over it, which plays
all sorts of hell with the process. Watch this:

py> alist = [1, 2, 3, 4, 5, 6, 7, 8]
py> for n in alist:
...     if n%2 == 0:  # even number
...         alist.remove(n)
...     print(n)
...
1
2
4
6
8
py> print(alist)
[1, 3, 5, 7]



If you pretend to be the Python interpreter, and simulate the process
yourself, you'll see the same thing. Imagine that there is a pointer to
the current item in the list. First time through the loop, it points to
the first item, and you print the value and move on:

[>1, 2, 3, 4, 5, 6, 7, 8]
print 1

The second time through the loop:

[1, >2, 3, 4, 5, 6, 7, 8]
remove the 2, leaves the pointer pointing at three:
[1, >3, 4, 5, 6, 7, 8]
print 2

Third time through the loop, we move on to the next value:

[1, 3, >4, 5, 6, 7, 8]
remove the 4, leaves the pointer pointing at five:
[1, 3, >5, 6, 7, 8]
print 4

and so on.
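
The walk-through above can be reproduced with an explicit index standing in for the "pointer". This is only a sketch of the observable behaviour, not a description of the interpreter's internals; the names `index` and `visited` are made up for illustration.

```python
alist = [1, 2, 3, 4, 5, 6, 7, 8]
visited = []   # values the loop body actually gets to see
index = 0      # stands in for the loop's hidden position in the list

while index < len(alist):
    n = alist[index]
    if n % 2 == 0:
        # Removing n shifts every later item one slot to the left ...
        alist.remove(n)
    visited.append(n)
    # ... but the position still advances, so the item that slid into
    # the current slot is never examined.
    index += 1

print(visited)  # [1, 2, 4, 6, 8]
print(alist)    # [1, 3, 5, 7]
```

In this run the odd numbers are never visited at all, which is the same effect that let some of the .pdf files and Thumbs.db slip through in the original script.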

The lesson here is that you should never modify a list while iterating
over it. Instead, make a copy, and modify the copy.
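
One way to apply this to the original script, as a minimal sketch: loop over a copy (`filenames[:]`) and remove from the real list, so the list being walked never changes underneath the loop. The sample names are taken from the output above; `KEEP_EXTS` is a name introduced here for illustration.

```python
from os.path import splitext

# Extensions to keep, as in the original script (the name is an assumption).
KEEP_EXTS = ('jpg', 'png', 'avi', 'mp4', 'mov', 'bmp')

filenames = ['.picasa.ini', '218.JPG', 'desktop.ini', 'Guide ENG.pdf',
             'Guide FRE.pdf', 'Guide GER.pdf', 'IMG.jpg', 'Thumbs.db']

# filenames[:] is a shallow copy, so removing from filenames itself
# cannot make the loop skip anything.
for filename in filenames[:]:
    ext = splitext(filename)[1]
    if ext[1:].lower() not in KEEP_EXTS:
        filenames.remove(filename)

print(filenames)  # ['218.JPG', 'IMG.jpg']
```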
--
Steve
Cameron Simpson
2015-08-05 05:53:09 UTC
Post by Steven D'Aprano
Post by Clayton Kirkwood
As seen below (closely), some filenames are not being removed while others
are, such as in the first stanza, some pdfs are removed, some aren't. In the
second stanza, Thumbs.db makes it through, but was caught in the first
stanza. (Thanks for those who have proffered solutions to date!)
I see no logic in the results. What am I missing???
You are modifying the list of files while iterating over it, which plays
[... detailed explanation ...]
Post by Steven D'Aprano
The lesson here is that you should never modify a list while iterating
over it. Instead, make a copy, and modify the copy.
What Steven said. Yes indeed.

Untested example suggestion:

all_filenames = set(filenames)
for filename in filenames:
    if .. test here ...:
        all_filenames.remove(filename)
print(all_filenames)

You could use a list instead of a set and for small numbers of files be fine.
With large numbers of files a set is far faster to remove things from.
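
A runnable version of the same idea might look like the sketch below, with the extension check from the original script filled in for the test. `KEEP_EXTS` and the sample names are assumptions for illustration. Note that a set does not preserve the original order and collapses duplicates, which is harmless for the names in a single directory.

```python
from os.path import splitext

# Extensions to keep (name introduced for this example).
KEEP_EXTS = {'jpg', 'png', 'avi', 'mp4', 'mov', 'bmp'}

filenames = ['.picasa.ini', '218.JPG', 'Guide ENG.pdf', 'Guide FRE.pdf',
             'IMG.jpg', 'Thumbs.db']

# Iterate over the untouched list, remove from the separate set.
all_filenames = set(filenames)
for filename in filenames:
    if splitext(filename)[1][1:].lower() not in KEEP_EXTS:
        all_filenames.remove(filename)

# Sort only to get a stable order for printing.
print(sorted(all_filenames))  # ['218.JPG', 'IMG.jpg']
```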

Cheers,
Cameron Simpson <***@zip.com.au>

In the desert, you can remember your name,
'cause there ain't no one for to give you no pain. - America
Peter Otten
2015-08-05 07:35:09 UTC
Post by Cameron Simpson
Post by Steven D'Aprano
Post by Clayton Kirkwood
As seen below (closely), some filenames are not being removed while
others are, such as in the first stanza, some pdfs are removed, some
aren't. In the second stanza, Thumbs.db makes it through, but was caught
in the first stanza. (Thanks for those who have proffered solutions to
date!) I see no logic in the results. What am I missing???
You are modifying the list of files while iterating over it, which plays
[... detailed explanation ...]
Post by Steven D'Aprano
The lesson here is that you should never modify a list while iterating
over it. Instead, make a copy, and modify the copy.
What Steven said. Yes indeed.
all_filenames = set(filenames)
all_filenames.remove(filename)
print(all_filenames)
You could use a list instead of a set and for small numbers of files be
fine. With large numbers of files a set is far faster to remove things
from.
If the list size is manageable, usually the case for the names of files in
one directory, you should not bother about removing items. Just build a new
list:

all_filenames = [...]
matching_filenames = [name for name in all_filenames if test(name)]
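
Filled in for the original question, this could look roughly as follows. `is_media` is a hypothetical stand-in for test(), and `KEEP_EXTS` and the sample names are likewise assumptions for illustration.

```python
from os.path import splitext

# Extensions to keep (name introduced for this example).
KEEP_EXTS = ('jpg', 'png', 'avi', 'mp4', 'mov', 'bmp')


def is_media(name):
    """Hypothetical stand-in for test(): True if the extension is wanted."""
    return splitext(name)[1][1:].lower() in KEEP_EXTS


all_filenames = ['.picasa.ini', '218.JPG', 'Guide ENG.pdf', 'Guide FRE.pdf',
                 'New Picture.bmp', 'Thumbs.db']

# Build a new list; the list being iterated over is never modified.
matching_filenames = [name for name in all_filenames if is_media(name)]
print(matching_filenames)  # ['218.JPG', 'New Picture.bmp']

# Optional: slice assignment replaces the contents of the existing list
# object instead of rebinding the name to a new list.
all_filenames[:] = matching_filenames
```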

If the list is huge and you expect that most items will be kept you might
try reverse iteration:

for i in reversed(range(len(all_filenames))):
    name = all_filenames[i]
    if test(name):
        del all_filenames[i]

This avoids both copying the list and the linear search performed by
list.remove().
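
A small runnable sketch of the reverse iteration, where the test is true for names that should be deleted. `is_unwanted`, `KEEP_EXTS` and the sample names are assumptions for illustration.

```python
from os.path import splitext

# Extensions to keep (name introduced for this example).
KEEP_EXTS = ('jpg', 'png', 'avi', 'mp4', 'mov', 'bmp')


def is_unwanted(name):
    """Hypothetical test: True if the name should be deleted."""
    return splitext(name)[1][1:].lower() not in KEEP_EXTS


all_filenames = ['.picasa.ini', '218.JPG', 'Guide ENG.pdf', 'Guide FRE.pdf',
                 'IMG.jpg', 'Thumbs.db']

# Walking from the end means a deletion only shifts items that have
# already been examined, so no index is skipped.
for i in reversed(range(len(all_filenames))):
    if is_unwanted(all_filenames[i]):
        del all_filenames[i]

print(all_filenames)  # ['218.JPG', 'IMG.jpg']
```

Unlike the forward loop in the original script, the two adjacent .pdf names are both caught here.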
