Discussion:
[Tutor] scratching my head
Clayton Kirkwood
2015-08-02 21:44:15 UTC
Hey, it's been a while, but I ran into os.walk and it fit what I needed to do
for an issue I've had for a long time: I have tons of pictures in my top
directory of pictures which are duplicated into properly named
subdirectories. My questions are in the comments below, each just under the
code it concerns and followed by a large gap.
TIA,
Clayton


#Program to find duplicated pictures in my picture directory tree
#Presumably, if the file exists in a subdirectory I can remove it from the
#parent picture directory
#
#Clayton Kirkwood
#01Aug15

import os
from os.path import join, getsize

main_dir = "/users/Clayton/Pictures"
directory_file_list = {}
duplicate_files = 0
top_directory_file_list = 0

for dir_path, directories, files in os.walk(main_dir):
    for file in files:
#        print( " file = ", file)
#        if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
#        if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower() ):
#
#why don't these work?, especially the last one. How am I to capture all
#camera and video types except by the drudgery below. I should be able to
#just have a list, maybe from a file, that lists all of the types and do
#something like if master_list not in file.lower()





        if( ".jpg" not in file.lower() and
            ".png" not in file.lower() and
            ".avi" not in file.lower() and
            ".mp4" not in file.lower() ):

            print( "file ", file, "doesn't contain .jpg or .png or .avi or .mp4" )
#            del files[file]
#
#I get an error on int expected here. If I'm able to access by string, why
#wouldn't I be able to access in the del?





    directory_file_list[dir_path] = files #this is a list
#    print(dir_path, directory_file_list[dir_path])
#print( main_dir )
for directory_path in directory_file_list.keys():
    if( directory_path == main_dir ):
        top_directory_file_list = directory_file_list[directory_path]
        continue
#    print( directory_path, ":", directory_file_list[directory_path])
    file_list = directory_file_list[directory_path]
#    print(file_list)
    for file in file_list:
#        pass
        print( "looking at file ", file, " in top_directory_file_list ",
               top_directory_file_list )
        if file in top_directory_file_list:
#error: arg of type int not iterable
#yet it works for the for loops





            print( "file ", file, " found in both directory_path ",
                   directory_path, " and ", main_dir)
            duplicate_files =+ 1
            pass
break


_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Alan Gauld
2015-08-02 22:01:05 UTC
Post by Clayton Kirkwood
# if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
Python sees that as a single string. That string is not in your filename.
Post by Clayton Kirkwood
# if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
Python sees that as a boolean expression, so it will try to
work it out as a True/False value. Since a non-empty
string is considered True, and the first True operand
makes an OR operation True overall, it returns ".jpg" and
tests whether that is not in the filename.
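
This is easy to confirm at the interpreter. A minimal sketch, with a made-up filename (`holiday.png` is only an illustration, not a file from the original post):

```python
# "or" returns its first truthy operand, so the parenthesised
# expression collapses to the single string ".jpg".
extensions_expr = (".jpg" or ".png" or ".avi" or ".mp4")
print(extensions_expr)            # .jpg

filename = "holiday.png"          # hypothetical example name
# The commented-out test therefore only ever asks about ".jpg":
looks_unwanted = extensions_expr not in filename.lower()
print(looks_unwanted)             # True, even though the file is a .png
```

So the other three extensions never take part in the test at all.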
Post by Clayton Kirkwood
#except by the drudgery below. I should be able to just have a list, maybe
from a file, that lists all
You might think so but that's not how 'in' works.

But you could use a loop:

found = False
for s in (".jpg",".png",".avi",".mp4"):
    found = test or (s in file.lower())
if not found: ...
Post by Clayton Kirkwood
if( ".jpg" not in file.lower() and
".png" not in file.lower() and
".avi" not in file.lower() and
Whether that's any better than your combined test is a moot point.

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Clayton Kirkwood
2015-08-02 22:28:06 UTC
Post by Alan Gauld
Post by Clayton Kirkwood
# if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
Python sees that as a single string. That string is not in your filename.
Post by Clayton Kirkwood
# if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
Python sees that as a boolean expression so will try to work it out as a
True/False value. Since a non empty string is considered True and the first
True expression makes an OR operation True overall it returns ".jpg" and
tests if it is not in the filename.
Post by Clayton Kirkwood
#except by the drudgery below. I should be able to just have a list,
maybe from a file, that lists all
You might think so but that's not how 'in' works.
found = False
for s in (".jpg",".png",".avi",".mp4"):
    found = test or (s in file.lower())
if not found: ...
The for loop is much better, and it can take its input from a file. I would
think Python more sensible if something like my commented-out line worked.
That would make more sense to me.

Thanks
Post by Clayton Kirkwood
if( ".jpg" not in file.lower() and
".png" not in file.lower() and
".avi" not in file.lower() and
Whether that's any better than your combined test is a moot point.
Cameron Simpson
2015-08-02 22:35:01 UTC
Post by Alan Gauld
Post by Clayton Kirkwood
# if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
Python sees that as a single string. That string is not in your filename.
Post by Clayton Kirkwood
# if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
[...]
Post by Alan Gauld
found = False
for s in (".jpg",".png",".avi",".mp4"):
    found = test or (s in file.lower())
if not found: ...
Post by Clayton Kirkwood
if( ".jpg" not in file.lower() and
".png" not in file.lower() and
".avi" not in file.lower() and
Whether that's any better than your combined test is a moot point.
Alan has commented extensively on the logic/implementation errors. I have a
suggestion.

Personally I'd be reaching for os.path.splitext. Untested example below:

from os.path import splitext
....
for dir_path, directories, files in os.walk(main_dir):
    for file in files:
        prefix, ext = splitext(file)
        if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
            ....

which I think is much easier to read.
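
The same test can be tried without walking a real directory tree. This is only a sketch: the helper name `is_media`, the constant `WANTED` and the sample filenames are made up for illustration.

```python
from os.path import splitext

# Extensions of interest, stored without the leading dot.
WANTED = ('jpg', 'png', 'avi', 'mp4')

def is_media(filename):
    """Return True if filename's extension is in WANTED (case-insensitive)."""
    prefix, ext = splitext(filename)
    return bool(ext) and ext[1:].lower() in WANTED

# Made-up names, purely for illustration.
filenames = ['beach.JPG', 'notes.txt', 'clip.mp4', 'README']
media = [name for name in filenames if is_media(name)]
print(media)    # ['beach.JPG', 'clip.mp4']
```

Since WANTED is just a tuple of strings, it could equally be read from a file, which is what the original post was hoping for.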

BTW, I'd be using the variable names "filename" and "filenames" instead of
"file" and "files": in python 2 "file" is a builtin function (though long
deprecated by "open()") and in any case I'd (personally) expect such a name to
be an _open_ file. As opposed to "filename", which is clearer.

Cheers,
Cameron Simpson <***@zip.com.au>

Rudin's Law:
If there is a wrong way to do something, most people will do it every time.
Rudin's Second Law:
In a crisis that forces a choice to be made among alternative courses of
action, people tend to choose the worst possible course.
Clayton Kirkwood
2015-08-02 23:15:19 UTC
Post by Cameron Simpson
Post by Alan Gauld
Post by Clayton Kirkwood
# if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
Python sees that as a single string. That string is not in your filename.
Post by Clayton Kirkwood
# if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
[...]
Post by Alan Gauld
found = False
for s in (".jpg",".png",".avi",".mp4"):
    found = test or (s in file.lower())
if not found: ...
Post by Clayton Kirkwood
if( ".jpg" not in file.lower() and
".png" not in file.lower() and
".avi" not in file.lower() and
Whether that's any better than your combined test is a moot point.
Alan has commented extensively on the logic/implementation errors. I have
a suggestion.
from os.path import splitext
....
for dir_path, directories, files in os.walk(main_dir):
    for file in files:
        prefix, ext = splitext(file)
        if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
            ....
which I think is much easier to read.
BTW, I'd be using the variable names "filename" and "filenames" instead of
"file" and "files": in python 2 "file" is a builtin function (though long
deprecated by "open()") and in any case I'd (personally) expect such a name
to be an _open_ file. As opposed to "filename", which is clearer.
Thanks, that should also help a lot. Now time to look at splitext, and the
ext and ext[1:] parts. I appreciate your comments also about the variable names.
Any comments on the problems lower in the file?

Clayton
Cameron Simpson
2015-08-03 01:02:42 UTC
Post by Clayton Kirkwood
[...]
Post by Clayton Kirkwood
from os.path import splitext
....
for dir_path, directories, files in os.walk(main_dir):
    for file in files:
        prefix, ext = splitext(file)
        if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
            ....
which I think is much easier to read.
BTW, I'd be using the variable names "filename" and "filenames" instead of
"file" and "files": in python 2 "file" is a builtin function (though long
deprecated by "open()") and in any case I'd (personally) expect such a name
to be an _open_ file. As opposed to "filename", which is clearer.
Thanks, that should also help a lot. Now time to look at splitext, and the
ext and ext[1:].
The "[1:]" is because "ext" will include the dot.
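
A quick check of what splitext hands back, as a sketch with made-up filenames:

```python
from os.path import splitext

prefix, ext = splitext("IMG_0001.JPG")   # hypothetical filename
print(prefix)      # IMG_0001
print(ext)         # .JPG  (the leading dot is part of the extension)
print(ext[1:])     # JPG

# A name with no extension gives an empty string for ext.
no_ext = splitext("README")
print(no_ext)      # ('README', '')
```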
Post by Clayton Kirkwood
I appreciate your comments also about the variable names.
Any comments on the problems lower in the file?
Maybe you'd better raise these problems again explicitly.

Cheers,
Cameron Simpson <***@zip.com.au>
Clayton Kirkwood
2015-08-03 01:33:30 UTC
Post by Cameron Simpson
Post by Clayton Kirkwood
[...]
Post by Clayton Kirkwood
from os.path import splitext
....
for dir_path, directories, files in os.walk(main_dir):
    for file in files:
        prefix, ext = splitext(file)
        if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
            ....
which I think is much easier to read.
BTW, I'd be using the variable names "filename" and "filenames"
instead of "file" and "files": in python 2 "file" is a builtin
function (though long deprecated by "open()") and in any case I'd
(personally) expect such a name to be an _open_ file. As opposed to
"filename", which is clearer.
Thanks, that should also help a lot. Now time to look at splitext, and
the ext and ext[1:].
The "[1:]" is because "ext" will include the dot.
Yeah, after looking it up, it became clear, but thanks!
Post by Clayton Kirkwood
I appreciate your comments also about the variable names.
Any comments on the problems lower in the file?
Maybe you'd better raise these problems again explicitly.
Point taken.
Alan Gauld
2015-08-02 22:46:31 UTC
Post by Alan Gauld
found = False
found = test or (s in file.lower())
Oops, that should be:

found = found or (s in file.lower())

Sorry, 'test' was my first choice of name
but I changed it to found later.
But not everywhere :-(
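
With that correction the loop runs as intended. A self-contained sketch, where the wrapper function `has_media_marker` and the sample names are made up for illustration:

```python
def has_media_marker(filename):
    """Return True if any of the extension strings occurs in filename."""
    found = False
    for s in (".jpg", ".png", ".avi", ".mp4"):
        found = found or (s in filename.lower())
    return found

print(has_media_marker("Party.PNG"))   # True
print(has_media_marker("notes.txt"))   # False
```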
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Steven D'Aprano
2015-08-03 00:56:18 UTC
Post by Alan Gauld
Post by Alan Gauld
found = False
found = test or (s in file.lower())
found = found or (s in file.lower())
extensions = (".jpg",".png",".avi",".mp4")
found = any(s in filename.lower() for s in extensions)

but that's still wrong, because it will find files like

History.of.Avis.PA.pdf

as if it were an AVI file. Instead, use os.path.splitext.
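
The two tests can be compared side by side on that filename. A minimal sketch; the variable names `by_substring` and `by_extension` are only for illustration:

```python
import os.path

extensions = (".jpg", ".png", ".avi", ".mp4")
filename = "History.of.Avis.PA.pdf"

# Substring test: ".avi" occurs inside ".avis.", so this is a false positive.
by_substring = any(s in filename.lower() for s in extensions)

# Extension test: only the final ".pdf" is considered.
by_extension = os.path.splitext(filename)[1].lower() in extensions

print(by_substring, by_extension)   # True False
```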
--
Steve
Steven D'Aprano
2015-08-03 00:49:25 UTC
Post by Clayton Kirkwood
# print( " file = ", file)
# if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
name, ext = os.path.splitext(filename)
if ext.lower() in ('.jpg', '.png', '.avi', '.mp4'):
    ...
Post by Clayton Kirkwood
# del files[file]
#
#I get an error on int expected here. If I'm able to access by string, why
#wouldn't I be able to access in the del?
What are you attempting to do here? files is a list of file names:

files = ['this.jpg', 'that.txt', 'other.pdf']
filename = 'that.txt'

What do you expect files['that.txt'] to do?

The problem has nothing to do with del, the problem is that you are
trying to access the 'that.txt'-th item of a list, and that is
meaningless.
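
A short sketch of the difference, reusing the example list above. The `raised` flag exists only so the outcome can be printed:

```python
files = ['this.jpg', 'that.txt', 'other.pdf']
filename = 'that.txt'

# A list is indexed by position, so a string index raises TypeError.
raised = False
try:
    del files[filename]
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Removing by value works:
files.remove(filename)
print(files)                     # ['this.jpg', 'other.pdf']

# Or delete by position, looking the position up first:
del files[files.index('other.pdf')]
print(files)                     # ['this.jpg']
```

Note that removing items from a list while a for loop is iterating over that same list can skip elements, so building a new, filtered list is usually the safer choice.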
Post by Clayton Kirkwood
print( "looking at file ", file, " in top_directory_file_list ",
top_directory_file_list )
What does this print? In particular, what does the last part,
top_directory_file_list, print? Because the error message
Post by Clayton Kirkwood
#error: arg of type int not iterable
is clear that it is an int.
Post by Clayton Kirkwood
#yet it works for the for loops
I think you are confusing:

top_directory_file_list

with

directory_file_list
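
The error can be reproduced in isolation. In this sketch the dict contents are hypothetical; the initial value 0 is the one from the original script, and the assumption is that the loop reached a subdirectory before that 0 had been replaced by a list:

```python
top_directory_file_list = 0   # the script's initial value, before any list is assigned

try:
    "a.jpg" in top_directory_file_list
    error_raised = False
except TypeError as err:
    error_raised = True
    print("TypeError:", err)

# The dict of per-directory lists, on the other hand, holds real lists.
directory_file_list = {"/pics": ["a.jpg", "b.png"]}   # hypothetical contents
found = "a.jpg" in directory_file_list["/pics"]
print(found)                                          # True
```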


Clayton Kirkwood
2015-08-03 01:54:51 UTC
Post by Steven D'Aprano
Post by Clayton Kirkwood
# print( " file = ", file)
# if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
name, ext = os.path.splitext(filename)
if ext.lower() in ('.jpg', '.png', '.avi', '.mp4'):
    ...
Post by Clayton Kirkwood
# del files[file]
#
#I get an error on int expected here. If I'm able to access by string,
#why wouldn't I be able to access in the del?
files = ['this.jpg', 'that.txt', 'other.pdf']
filename = 'that.txt'
What do you expect files['that.txt'] to do?
The problem has nothing to do with del, the problem is that you are trying
to access the 'that.txt'-th item of a list, and that is meaningless.
Well, I was expecting that the list entry would be deleted. In other parts
of my code I am using filenames as the index of lists: list[filenames] in
for loops and some ifs, where it appears to work. I am able to look at
directories and the files in them by doing this. Check the rest of my
original code. One if at the bottom of my code complained that the index
was supposed to be an int, not the list element value. So I get that the
index is supposed to be an int, and I think what is happening in much of the
code is that the filename somehow becomes an int and the list is then
accessed that way. It's very confusing. Basically, I was using filenames as
indexes into the list.
Post by Clayton Kirkwood
print( "looking at file ", file, " in
top_directory_file_list ", top_directory_file_list )
What does this print? In particular, what does the last part,
top_directory_file_list, print? Because the error message
Post by Clayton Kirkwood
#error: arg of type int not iterable
is clear that it is an int.
Post by Clayton Kirkwood
#yet it works for the for loops
I think you are confusing top_directory_file_list with directory_file_list.
I don't know. If you look at the code that is going through the directory
filename by filename, the prints kick out filenames and directories, and the
list elements are addressed by "strings", the actual filenames.

What is happening in most of the code looks like what one would expect if
the lists could be indexed by words, not ints. As a programmer, I would
expect lists to be addressed via a name or a number. It seems kind of like
dictionaries. Am I mixing up dictionaries and lists?
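
The distinction can be seen in a small sketch, with made-up paths and names: a dict is looked up by key, a list by integer position, and a for loop yields the elements themselves rather than indexes.

```python
# A dict maps keys (here, directory paths) to values (here, lists of names).
directory_file_list = {
    "/pics": ["a.jpg", "b.png"],          # hypothetical paths and names
    "/pics/2015": ["a.jpg"],
}

files = directory_file_list["/pics"]      # dict lookup: a string key is fine
first = files[0]                          # list lookup: needs an integer position
print(first)                              # a.jpg

# "for file in files" hands back each element itself, not an index,
# which can make it look as if the list were indexed by filename.
seen = []
for file in files:
    seen.append(file)
print(seen)                               # ['a.jpg', 'b.png']
```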


Clayton
Laura Creighton
2015-08-03 06:12:05 UTC
I think people are giving you sub-optimal advice.

Python has a module in the standard library for doing exactly what
you want to do -- match files with certain extensions.

See: https://docs.python.org/2/library/fnmatch.html

It's Unix-style file matching, but I am fairly certain this works
on Windows also. I don't have a Windows machine to test and make sure.

Laura

Cameron Simpson
2015-08-03 08:22:32 UTC
Post by Laura Creighton
I think people are giving you sub-optimal advice.
Python has a module in the standard library for doing exactly what
you want to do -- match files with certain extensions.
See: https://docs.python.org/2/library/fnmatch.html
It's unix style file matching, but I am fairly certain this works
on windows also. I don't have a windows machine to test and make sure.
That depends. This is the tutor list; we're helping Clayton debug his code as
an aid to learning. While it's good to know about the facilities in the
standard library, pointing him directly at fnmatch (which I'd entirely
forgotten) is the "give a man a fish" approach to help; a magic black box to do
the job for him.

Besides, I'm not sure fnmatch is much better for his task than the more direct
methods being discussed.

Cheers,
Cameron Simpson <***@zip.com.au>
Laura Creighton
2015-08-03 09:18:10 UTC
Post by Cameron Simpson
That depends. This is the tutor list; we're helping Clayton debug his code as
an aid to learning. While it's good to know about the facilities in the
standard library, pointing him directly at fnmatch (which I'd entirely
forgotten) is the "give a man a fish" approach to help; a magic black box to do
the job for him.
Besides, I'm not sure fnmatch is much better for his task than the more direct
methods being discussed.
And I am certain. It works exactly as he said he wanted -- a less
cumbersome way to solve this problem, which he thought would be done
some way with a for loop, looping over extensions, instead of the
cumbersome way he is doing things.

His design sense was perfectly fine; there is an elegant way to solve
the problem precisely along the lines he imagined -- he just wasn't
aware of this bit of the standard library.

There is no particular virtue in teaching somebody how to build a
pneumatic drill in order to crack walnuts.

Laura

Peter Otten
2015-08-05 06:43:45 UTC
Post by Laura Creighton
Post by Cameron Simpson
That depends. This is the tutor list; we're helping Clayton debug his code
as an aid to learning. While it's good to know about the facilities in the
standard library, pointing him directly at fnmatch (which I'd entirely
forgotten) is the "give a man a fish" approach to help; a magic black box
to do the job for him.
Besides, I'm not sure fnmatch is much better for his task than the more
direct methods being discussed.
And I am certain. It works exactly as he said he wanted -- a less
cumbersome way to solve this problem, which he thought would be done
some way with a for loop, looping over extensions, instead of the
cumbersome way he is doing things.
I suppose you have some way in mind to simplify

# version 1, splitext()
import os

filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
EXTENSIONS = {".jpg", ".png"}
matching_filenames = [
    name for name in filenames
    if os.path.splitext(name)[1].lower() in EXTENSIONS]
print(matching_filenames)

with fnmatch. I can only come up with

# version 2, fnmatch()
import fnmatch
filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
GLOBS = ["*.jpg", "*.png"]
matching_filenames = [
    name for name in filenames
    if any(fnmatch.fnmatch(name.lower(), pat) for pat in GLOBS)]
print(matching_filenames)

but I don't think that's simpler. Can you enlighten me?

Digression: I don't know if str.endswith() was already suggested. I think
that is a (small) improvement over the first version

# version 3, endswith()
filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
EXTENSIONS = (".jpg", ".png")
matching_filenames = [
    name for name in filenames
    if name.lower().endswith(EXTENSIONS)]
print(matching_filenames)


Laura Creighton
2015-08-06 04:31:23 UTC
Post by Peter Otten
but I don't think that's simpler. Can you enlighten me?
When I got here, I landed in the middle of a discussion on how to
use regexps for solving this. Plus a slew of string handling
functions, none of which included endswith, which I think is a
fine idea as well.

The nice thing about fnmatch is that it handles all the 'are your
file names case insensitive' stuff for you, which can be a problem.
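
How much of that is handled depends on the platform, as far as I can tell from the documentation. A small sketch, reusing one of the sample names from earlier in the thread:

```python
import fnmatch
import os

name, pattern = "bar.PNG", "*.png"

# fnmatchcase() always compares case-sensitively.
strict = fnmatch.fnmatchcase(name, pattern)
print(strict)                                   # False

# fnmatch() first runs both arguments through os.path.normcase(),
# which lowercases on Windows and leaves case alone on POSIX systems,
# so this result is platform-dependent.
print(fnmatch.fnmatch(name, pattern))
print(os.path.normcase(name))

# Lowercasing the name yourself gives the same answer everywhere.
portable = fnmatch.fnmatch(name.lower(), pattern)
print(portable)                                 # True
```

That is presumably why the fnmatch version earlier in the thread still calls name.lower() itself.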

Laura


Emile van Sebille
2015-08-04 13:48:46 UTC
Post by Cameron Simpson
That depends. This is the tutor list; we're helping Clayton debug his
code as an aid to learning. While it's good to know about the facilities
in the standard library, pointing him directly at fnmatch (which I'd
entirely forgotten) is the "give a man a fish" approach to help; a magic
black box to do the job for him.
Sometimes a fish of three or four lines that replaces a 20 line effort
might be better considered as a solution to be teased apart and
understood.

Emile



Válas Péter
2015-08-03 06:34:40 UTC
Post by Clayton Kirkwood
# if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
I suppose you want to use regular expressions here and are somewhat
familiar with them, but you forgot to tell Python to handle your string as
a regex. This kind of expression must be matched against filenames instead
of using the "in" operator.

In this case, https://docs.python.org/3/library/re.html and
https://docs.python.org/3/howto/regex.html are your friends.
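
A minimal sketch of that approach, using the pattern from the original post. The name `MEDIA_RE` and the sample filenames are made up for illustration:

```python
import re

# Match a dot, one of the listed extensions, then the end of the string.
MEDIA_RE = re.compile(r"\.(jpg|png|avi|mp4)$", re.IGNORECASE)

names = ["beach.JPG", "History.of.Avis.PA.pdf", "clip.mp4"]
results = {name: bool(MEDIA_RE.search(name)) for name in names}

for name, matched in results.items():
    print(name, matched)
```

With re.IGNORECASE there is no need to lowercase the filename first, and the `$` anchor keeps a name like History.of.Avis.PA.pdf from matching.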