Discussion:
[Tutor] String Attribute
l***@gmail.com
2015-07-28 23:33:53 UTC
Permalink
Hi Everyone:

What is the source of the syntax error to the String Attribute?

Go to the following URL links and view a copy of the raw data file code and sample data:

1.) http://tinyurl.com/p2xxxhl
2.) http://tinyurl.com/nclg6pq

Here is the desired output:

***@uct.ac.za
***@media.berkeley.edu
....

Hal

Sent from Surface
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Alan Gauld
2015-07-29 09:17:31 UTC
Permalink
On 29/07/15 00:33, ***@gmail.com wrote:

> Hi Everyone:
>
>
> What is the source of the syntax error to the String Attribute?

Normally I'd ask you to post the full text of any errors.
They usually contain a lot of useful information. They
also help us identify which syntax error you are asking
about in the case where there are several! :-)

But in your case it seems you are running the code in an online debugger,
so you may not have seen the full error text. Although,
even there, it gives more information than you posted, namely:

AttributeError: 'str' object has no attribute 'startwith'

So it's not a syntax error but an attribute error...

Your error is with the attribute startwith, which doesn't exist.
To check the attributes of a string, type dir(str) at a >>> prompt.
(I assume you have access to one of those somewhere?)
You will see that you mis-spelled startswith.
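A quick way to confirm the spelling without reading the whole dir(str) listing is sketched below. It is written for Python 3 (the code in this thread is Python 2), and the sample line uses a made-up address rather than anything from the data file.

```python
# Check which spelling actually exists on str objects.
print(hasattr(str, "startwith"))    # the misspelled name
print(hasattr(str, "startswith"))   # the real method

# List every str attribute whose name begins with "start".
matches = [name for name in dir(str) if name.startswith("start")]
print(matches)

# Hypothetical sample line in the shape of an mbox "From " line.
line = "From alice@example.com Sat Jan  5 09:14:16 2008"
is_from_line = line.startswith("From ")
print(is_from_line)
```

The same hasattr() check works for any object, which can help when an AttributeError names an attribute that looks right at first glance.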

However, your code has several other problems...

> Go to the following URL links and view a copy of the raw data file code and sample data:
>
> 1.) http://tinyurl.com/p2xxxhl
> 2.) http://tinyurl.com/nclg6pq

If it's short (<100 lines?) just include the code in the message.
Here it is:

count = 0
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
for line in fname:
    line = line.strip()
    if not line.startwith('From '): continue
    line = line.split()
count = count + 1
print len(line)
fh = open(fname)
print "There were", count, "lines in the file with From as the first word"

You set the filename and then iterate over the name.
I suspect you intended to iterate over the file contents?
To do that you need to open the file (which you do near
the end!) So something like:

with open(fname) as in_file:
    for line in in_file:
        # do your stuff here

The next problem is that the last line of the loop holds the individual
elements of the split, but you throw that away when the loop goes back
to the top. You need to save the result somewhere so you can process it
after the loop completes.

For this specific example you could just indent the

count = count + 1
print len(line)

lines inside the loop. But that won't be enough to get you to your
final output of the email addresses.
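The difference between looping over the name and looping over the file can be sketched as follows. This is Python 3 rather than the Python 2 used in the thread, io.StringIO stands in for open(fname) so the sketch runs without the data file, and the two addresses are made up.

```python
import io

fname = "mbox-short.txt"

# Iterating over the name itself yields single characters, not lines.
first_three = [ch for ch in fname][:3]
print(first_three)

# Stand-in for open(fname); the contents are hypothetical sample lines.
sample = io.StringIO(
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "Subject: hello\n"
    "From bob@example.org Fri Jan  4 18:10:48 2008\n"
)

count = 0
senders = []   # results are saved here so they survive each pass of the loop
with sample as in_file:
    for line in in_file:
        line = line.strip()
        if not line.startswith("From "):
            continue
        words = line.split()
        senders.append(words[1])   # keep the piece of the split we care about
        count = count + 1

print(count, senders)
```

Saving each result in a list created before the loop is one way to keep what the split produced; the counting line sits inside the loop so it runs once per matching line.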

> Here is the desired output:
> ***@uct.ac.za
> ***@media.berkeley.edu
> ....

HTH

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Steven D'Aprano
2015-07-29 14:42:11 UTC
Permalink
On Tue, Jul 28, 2015 at 11:33:53PM +0000, ***@gmail.com wrote:
>
> Hi Everyone:
>
> What is the source of the syntax error to the String Attribute?
>
> Go to the following URL links and view a copy of the raw data file code and sample data:

Please don't send people to URLs to view your code. Copy and paste it
into the body of your email.


> 1.) http://tinyurl.com/p2xxxhl

Running the code in the simulator, I get the following error on line 6:

AttributeError: 'str' object has no attribute 'startwith'

You misspelled "startswith" as "startwith" (missing the second "s").


--
Steve
l***@gmail.com
2015-07-30 19:07:33 UTC
Permalink
From: ***@gmail.com
Sent: Thursday, July 30, 2015 11:47 AM
To: Steven D'Aprano

Hi Steve:

New revision code:

count = 0
fn = raw_input("Enter file name: ")
if len(fn) < 1 : fname = "mbox-short.txt"
for line in fn:
    if 'From' in line.split()[0]: count += 1
print "There are %d lines starting with From" % count
print len(line)
fn = open(fname)
print "There were", count, "lines in the file with From as the first word"

Syntax message produced by the IPython interpreter:

NameError                                 Traceback (most recent call last)
C:\Users\vm\Desktop\apps\docs\Python\assinment_8_5_v_2.py in <module>()
      6 print "There are %d lines starting with From" % count
      7 print len(line)
----> 8 fn = open(fname)
      9 print "There were", count, "lines in the file with From as the first word"

NameError: name 'fname' is not defined

Question:

Why is fname = "mbox-short.txt" not loading the sample data?

Sample data file is located at http://www.pythonlearn.com/code/mbox-short.txt

Regards,

Hal
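The traceback above is consistent with the two different names in the code: the typed name is stored in fn, while fname is only assigned when nothing was typed, so open(fname) fails whenever a name was entered. A minimal Python 3 sketch that uses one name throughout follows; choose_name is a hypothetical helper introduced here so the example runs without prompting, and "other.txt" is a made-up file name.

```python
def choose_name(typed, default="mbox-short.txt"):
    """Return the typed file name, or the default when nothing was typed."""
    if len(typed) < 1:
        return default
    return typed

# The same variable holds the name in both cases, so it is always defined
# by the time it is passed to open().
fname = choose_name("")            # user just pressed Enter
print(fname)
fname = choose_name("other.txt")   # user typed a name
print(fname)
```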






Mark Lawrence
2015-07-30 22:25:06 UTC
Permalink
On 30/07/2015 20:07, ***@gmail.com wrote:
>

When you post here can you please find a mechanism that gives us more
text than whitespace, thank you.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

l***@gmail.com
2015-07-30 22:34:05 UTC
Permalink
sure
Mark Lawrence
2015-07-31 00:30:57 UTC
Permalink
On 30/07/2015 23:34, ***@gmail.com wrote:
> sure

Could have fooled me :(

l***@gmail.com
2015-07-30 22:51:54 UTC
Permalink
Hi Mark,

I’m still confused because line 4 reads: fh=open(fname,'r') # Open a new
file handle, not fn = open(fname)



Therefore, can you write down line by line from error to correction?


Here is the revised code:


fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt" # assign fname

fh=open(fname,'r') # Open a new file handle
for line in fh:
    print line
    if 'From' in line.split()[0] and '@' in line: sender = line.split()[2]
print sender


Regards,

Hal



Alan Gauld
2015-07-31 00:13:35 UTC
Permalink
On 30/07/15 23:51, ***@gmail.com wrote:

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt" # assign fname
> fh=open(fname,'r') # Open a new file handle
> for line in fh:
>     print line
>     if 'From' in line.split()[0] and '@' in line: sender = line.split()[2]

Note that you are overwriting sender each time through the loop.
Also, [2] is the third element; I think you want the second, [1].

BTW, it's probably clearer to write that last line as:

if line.startswith('From') and '@' in line:
    sender = line.split()[1]

Better still may be to split the line first:

sections = line.split()
if 'FROM' in sections[0].upper() and '@' in line:
    sender = sections[1]

> print sender

And this is outside the loop so will only print the last item.
To print all of them you need to move the print inside the loop.
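The effect of where the print sits can be sketched as below, in Python 3 rather than the thread's Python 2. The lines are made-up stand-ins for the sample data, and the extra check that sections is non-empty is an addition here to keep blank lines from breaking the indexing.

```python
# Hypothetical lines in the shape of the sample data.
lines = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "Subject: first",
    "From bob@example.org Fri Jan  4 18:10:48 2008",
]

sender = None
printed = []
for line in lines:
    sections = line.split()
    # Guard against blank lines before indexing into sections.
    if sections and sections[0] == "From" and "@" in line:
        sender = sections[1]     # index 1 is the second word on the line
        print(sender)            # inside the loop: runs once per match
        printed.append(sender)

print(sender)   # after the loop only the last match is still in sender
```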

hth
--
Alan G
l***@gmail.com
2015-07-30 21:17:49 UTC
Permalink
Hi everyone,

Revised code:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt" # assign fname

fh=open(fname,'r') # Open a new file handle
for line in fh:
    print line
    if 'From' in line.split()[0] and '@' in line: sender = line.split()[1]
    fn.seek(0)

print sender

Questions: Why is the loop not repeating, and where should I insert a split to remove 'Sat Jan 5:09:14:16 2008'

From ***@uct.ac.za Sat Jan 5 09:14:16 2008 ← Mismatch






Alan Gauld
2015-07-31 00:04:56 UTC
Permalink
On 30/07/15 22:17, ***@gmail.com wrote:

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt" # assign fname
> fh=open(fname,'r') # Open a new file handle
> for line in fh:
>     print line
>     if 'From' in line.split()[0] and '@' in line: sender = line.split()[1]
>     fn.seek(0)
> print sender
>
> Questions: Why is the loop not repeating,

What makes you think so?
If you get an error (as I suspect) please post the entire error message.

I would expect a name error on the last line of the loop since there is
no variable fn defined.

I don't know what you think the seek() is doing, but (assuming
you meant fh) it will reset the file to the first line each time
so you never finish the loop.

> and where should I insert a split to remove 'Sat Jan 5:09:14:16 2008'
>
> From ***@uct.ac.za Sat Jan 5 09:14:16 2008 ← Mismatch

Splitting on whitespace will ensure the bit you want is
in the second element
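Both points can be tried out in a few lines of Python 3 (the thread itself uses Python 2). The address is made up, and io.StringIO is used here as a stand-in for a real file handle so the sketch runs on its own.

```python
import io

# Splitting on whitespace: the address is the second element, index 1.
line = "From alice@example.com Sat Jan  5 09:14:16 2008"
parts = line.split()
print(parts)
address = parts[1]
print(address)

# seek(0) moves the read position back to the start of the file,
# so the next read returns the first line again.
fh = io.StringIO("first\nsecond\n")
a = fh.readline()
fh.seek(0)
b = fh.readline()
print(a == b)
```

Because split() with no argument treats any run of whitespace as one separator, the double space before the day number does not produce an empty element.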


--
Alan G
l***@gmail.com
2015-07-31 00:25:26 UTC
Permalink
Hi Alan,

I rewrote the code as follows:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    print line4
    count = count + 1
print "There were", count, "lines in the file with From as the first word"

Question: How do I remove the duplicates:

***@uct.ac.za
***@uct.ac.za← Mismatch
***@media.berkeley.edu
***@media.berkeley.edu
***@umich.edu
***@umich.edu
***@iupui.edu
***@iupui.edu

Regards,

Hal


On Thursday, July 30, 2015 5:04 PM, Alan Gauld wrote:

> > Questions: Why is the loop not repeating,
>
> What makes you think so?

No count = count + 1

> If you get an error (as I suspect) please post the entire error message.

OK

> I would expect a name error on the last line of the loop since there is
> no variable fn defined.
>
> I don't know what you think the seek() is doing, but (assuming
> you meant fh) it will reset the file to the first line each time
> so you never finish the loop.

OK

> > and where should I insert a split to remove 'Sat Jan 5:09:14:16 2008'
> >
> > From ***@uct.ac.za Sat Jan 5 09:14:16 2008 ← Mismatch
>
> Splitting on whitespace will ensure the bit you want is
> in the second element

Check the revised code, above
Alan Gauld
2015-07-31 09:00:50 UTC
Permalink
On 31/07/15 01:25, ***@gmail.com wrote:

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
>     if not line.startswith('From'): continue
>     line2 = line.strip()
>     line3 = line2.split()
>     line4 = line3[1]
>     print line4
>     count = count + 1
> print "There were", count, "lines in the file with From as the first word"
>
> Question: How do I remove the duplicates:

OK, you now have the original code working, well done.
To remove the duplicates you need to collect the addresses
rather than printing them. Since you want the addresses
to be unique you can use a set.

You do that by first creating an empty set above
the loop, let's call it addresses:

addresses = set()

Then replace your print statement with the set add()
method:

addresses.add(line4)

This means that at the end of your loop you will have
a set containing all of the unique addresses you found.
You now print the set. You can do that directly or for
more control over layout you can write another for
loop that prints each address individually.

print addresses

or

for address in addresses:
    print address # plus any formatting you want

You can also sort the addresses by calling the
sorted() function before printing:

print sorted(addresses)
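Put together, the steps above might look like the Python 3 sketch below (the thread uses Python 2). The lines and addresses are made up. One assumption goes beyond the advice above: the paired output in the question suggests that startswith('From') is probably also matching the "From:" header lines, so the sketch tests for 'From ' with a trailing space, as the very first version of the code did.

```python
# Hypothetical lines: each message has a "From " line and a "From:" header.
lines = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "From: alice@example.com",
    "From bob@example.org Fri Jan  4 18:10:48 2008",
    "From: bob@example.org",
    "From alice@example.com Fri Jan  4 16:10:39 2008",
    "From: alice@example.com",
]

count = 0
addresses = set()   # created once, above the loop
for line in lines:
    if not line.startswith("From "):   # trailing space skips "From:" headers
        continue
    addresses.add(line.split()[1])     # a repeated address is stored only once
    count = count + 1

unique_sorted = sorted(addresses)
print(count)
for address in unique_sorted:
    print(address)
```

sorted() returns a new list, which is why the result is looped over directly instead of being sorted in place.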


HTH
--
Alan G
l***@gmail.com
2015-07-31 14:39:46 UTC
Permalink
Hi Alan,

Here is the revised code below:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses = set()
    addresses.add(line4)
    count = count + 1
    print addresses
print "There were", count, "lines in the file with From as the first word"

The code produces the following out put:

In [15]: %run _8_5_v_13.py
Enter file name: mbox-short.txt
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@media.berkeley.edu'])
set(['***@media.berkeley.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@umich.edu'])
set(['***@umich.edu'])
set(['***@caret.cam.ac.uk'])
set(['***@caret.cam.ac.uk'])
set(['***@gmail.com'])
set(['***@gmail.com'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@uct.ac.za'])
set(['***@media.berkeley.edu'])
set(['***@media.berkeley.edu'])
set(['***@media.berkeley.edu'])
set(['***@media.berkeley.edu'])
set(['***@media.berkeley.edu'])
set(['***@media.berkeley.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
set(['***@iupui.edu'])
There were 54 lines in the file with From as the first word

Question no. 1: is there a build in function for set that parses the data for duplicates.

In [18]: dir (set)
Out[18]:
['__and__',
'__class__',
'__cmp__',
'__contains__',
'__delattr__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__iand__',
'__init__',
'__ior__',
'__isub__',
'__iter__',
'__ixor__',
'__le__',
'__len__',
'__lt__',
'__ne__',
'__new__',
'__or__',
'__rand__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__ror__',
'__rsub__',
'__rxor__',
'__setattr__',
'__sizeof__',
'__str__',
'__sub__',
'__subclasshook__',
'__xor__',
'add',
'clear',
'copy',
'difference',
'difference_update',
'discard',
'intersection',
'intersection_update',
'isdisjoint',
'issubset',
'issuperset',
'pop',
'remove',
'symmetric_difference',
'symmetric_difference_update',
'union',
'update']

Question no. 2: Why is there not a building function for append?

Question no. 3: If all else fails, i.e., append & set, my only option is the slice the data set?

Regards,

Hal






Alan Gauld
2015-07-31 16:08:49 UTC
Permalink
On 31/07/15 15:39, ***@gmail.com wrote:

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
>     if not line.startswith('From'): continue
>     line2 = line.strip()
>     line3 = line2.split()
>     line4 = line3[1]
>     addresses = set()

Notice I said you had to create and initialize the set
*above* the loop.
Here you are creating a new set every time round the
loop and throwing away the old one.

>     addresses.add(line4)
>     count = count + 1
>     print addresses

And notice I said to move the print statement
to *after* the loop so as to print the complete set,
not just the current status.

> print "There were", count, "lines in the file with From as the first word"
>
> The code produces the following out put:
>
> In [15]: %run _8_5_v_13.py
> Enter file name: mbox-short.txt
> set(['***@uct.ac.za'])
> set(['***@uct.ac.za'])
> set(['***@media.berkeley.edu'])

That's correct because you create a new set each time
and add precisely one element to it before throwing
it away and starting over next time round.

> Question no. 1: is there a build in function for set that parses the data for duplicates.

No, because that's what a set does. It is a collection of
unique items. It will not allow duplicates.

Your problem is you create a new set of one item for
every line. So you have multiple sets with the same
data in them.

> Question no. 2: Why is there not a building function for append?

add() is the equivalent of append for a set.
If you try to add() a value that already exists it
will be ignored.
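A short Python 3 sketch of that behaviour, with made-up addresses, set beside a list for comparison:

```python
addresses = set()
addresses.add("alice@example.com")
addresses.add("alice@example.com")   # already present, so nothing changes
addresses.add("bob@example.org")
print(len(addresses))

# Sets have add() but no append(); append() belongs to lists.
print(hasattr(addresses, "append"))

as_list = []
as_list.append("alice@example.com")
as_list.append("alice@example.com")  # a list keeps both copies
print(len(as_list))
```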

> Question no. 3: If all else fails, i.e., append & set, my only option is the slice the data set?

No, there are lots of other options but none of them are necessary
because a set is a collection of unique values. You just need to
use it properly. Read my instructions again, carefully:

> You do that by first creating an empty set above
> the loop, let's call it addresses:
>
> addresses = set()
>
> Then replace your print statement with the set add()
> method:
>
> addresses.add(line4)
>
> This means that at the end of your loop you will have
> a set containing all of the unique addresses you found.
> You now print the set.



--
Alan G
Martin A. Brown
2015-07-31 16:18:26 UTC
Permalink
Greetings again Hal,

Thank you for posting your small amounts of code and results inline.
Thanks for also including clear questions. Your "surface" still
seems to add extra space, so, if you could trim that, you may get
even more responses from others who are on the Tutor mailing list.

Now, on to your question.

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
>     if not line.startswith('From'): continue
>     line2 = line.strip()
>     line3 = line2.split()
>     line4 = line3[1]
>     addresses = set()
>     addresses.add(line4)
>     count = count + 1
>     print addresses
> print "There were", count, "lines in the file with From as the first word"

> The code produces the following out put:
>
> In [15]: %run _8_5_v_13.py
> Enter file name: mbox-short.txt
> set(['***@uct.ac.za'])

[ ... snip ... ]

> set(['***@iupui.edu'])
>
> Question no. 1: is there a build in function for set that parses
> the data for duplicates.

The problem is not with the data structure called set().

Your program is not bad at all.

I would suggest making two small changes to it.

I think I have seen a pattern in the samples of code you have been
sending--this pattern is that you reuse the same variable inside a
loop, and do not understand why you are not collecting (or
accumulating) all of the results.

Here's your program. I have moved two lines. The idea here is to initialize
the 'addresses' variable before the loop begins (exactly like you do with the
'count' variable). Then, after the loop completes (and, you have processed
all of your input and accumulated all of the desired data), you can also print
out the contents of the set variable called 'addresses'.

Try this out:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses.add(line4)
    count = count + 1
print "There were", count, "lines in the file with From as the first word"
print addresses


> Question no. 2: Why is there not a building function for append?

> Question no. 3: If all else fails, i.e., append & set, my only
> option is the slice the data set?

I do not understand these two questions.

Good luck.

-Martin

P.S. By the way, Alan Gauld has also responded to your message, with
a differently-phrased answer, but, fundamentally, he and I are
saying the same thing. Think about where you are initializing
your variables, and know that 'addresses = set()' in the middle
of the code is re-initializing the variable and throwing away
anything that was there before.

--
Martin A. Brown
http://linux-ip.net/
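The pattern described in this message, resetting a variable inside the loop versus initializing it once before the loop, can be seen side by side in a small Python 3 sketch. The values are arbitrary placeholders, not data from the thread.

```python
values = ["a", "b", "a", "c"]

# Re-initialised on every pass: only the last item survives.
for v in values:
    seen_reset = set()
    seen_reset.add(v)

# Initialised once before the loop: every item accumulates.
seen = set()
for v in values:
    seen.add(v)

print(seen_reset)
print(sorted(seen))
```

The same reasoning applies to a counter such as count: it is set to 0 once before the loop and only updated inside it.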
l***@gmail.com
2015-07-31 18:57:56 UTC
Permalink
Hi Martin,

Hal is not having a great day today.

Here is the raw data entered:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[0]
    addresses.add(line4)
    count = count + 1
print "There were", count, "lines in the file with From as the first word"
print addresses

→Question: Why is the list index out of range on line # 9:

IndexError                                Traceback (most recent call last)
C:\Users\vm\Desktop\apps\docs\Python\assinment_8_5_v_20.py in <module>()
      7 line2 = line.strip()
      8 line3 = line2.split()
----> 9 line4 = line3[1]
     10 addresses.add(line4)
     11 count = count + 1

IndexError: list index out of range

→I entered different index ranges from [] to [5] that, later, produced the same Index Error message:

IndexError: list index out of range

In [34]: print line3[]
File "<ipython-input-34-7bf39294000a>", line 1
print line3[]
^
SyntaxError: invalid syntax



In [35]: print line[1]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-35-3ba0fe1b7bd4> in <module>()
----> 1 print line[1]


IndexError: string index out of range


In [36]: print line[2]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-36-6088e93feeeb> in <module>()
----> 1 print line[2]


IndexError: string index out of range


In [37]: print line[3]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-37-127d944ba1b7> in <module>()
----> 1 print line[3]


IndexError: string index out of range


In [38]: print line[4]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-38-5c497e1246ea> in <module>()
----> 1 print line[4]


IndexError: string index out of range


In [39]: print line[5]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-39-3a91a0cf6bd2> in <module>()
----> 1 print line[5]


IndexError: string index out of range


→Question: I think the problem is in the placement of the address set: The addresses = set()?


Regards,

Hal
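The posted code no longer has the startswith() test, so every line of the file is split, including lines with fewer than two words. Assuming the file contains blank lines, as mbox files normally do, split() returns an empty list for them and any index raises IndexError; the position of addresses = set() is not involved. (The posted code indexes line3[0] while the traceback shows line3[1]; both fail on an empty list.) A Python 3 sketch with made-up lines:

```python
# Hypothetical lines, including a blank one like those found between
# the headers and body of a message.
lines = ["From alice@example.com Sat Jan  5 09:14:16 2008", "", "Subject: hi"]

addresses = set()
count = 0
for line in lines:
    words = line.strip().split()
    print(len(words), words)
    # "".split() is [], so words[0] or words[1] would raise IndexError here.
    if len(words) < 2 or words[0] != "From":
        continue
    addresses.add(words[1])
    count = count + 1

print(count, addresses)
```

Restoring a test that skips non-matching lines before indexing, whether with startswith() as in the earlier version or a length check as here, keeps the index in range.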








Sent from Surface





From: Martin A. Brown
Sent: ‎Friday‎, ‎July‎ ‎31‎, ‎2015 ‎9‎:‎18‎ ‎AM
To: ***@gmail.com
Cc: ***@python.org






Greetings again Hal,

Thank you for posting your small amounts of code and results inline.
Thanks for also including clear questions. Your "surface" still
seems to add extra space, so, if you could trim that, you may get
even more responses from others who are on the Tutor mailing list.

Now, on to your question.

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
> if not line.startswith('From'): continue
> line2 = line.strip()
> line3 = line2.split()
> line4 = line3[1]
> addresses = set()
> addresses.add(line4)
> count = count + 1
> print addresses
> print "There were", count, "lines in the file with From as the first word"

> The code produces the following out put:
>
> In [15]: %run _8_5_v_13.py
> Enter file name: mbox-short.txt
> set(['***@uct.ac.za'])

[ ... snip ... ]

> set(['***@iupui.edu'])
>
> Question no. 1: Is there a built-in function for set that parses
> the data for duplicates?

The problem is not with the data structure called set().

Your program is not bad at all.

I would suggest making two small changes to it.

I think I have seen a pattern in the samples of code you have been
sending--this pattern is that you reuse the same variable inside a
loop, and do not understand why you are not collecting (or
accumulating) all of the results.

Here's your program. I have moved two lines. The idea here is to initialize
the 'addresses' variable before the loop begins (exactly like you do with the
'count' variable). Then, after the loop completes (and, you have processed
all of your input and accumulated all of the desired data), you can also print
out the contents of the set variable called 'addresses'.

Try this out:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses.add(line4)
    count = count + 1
print "There were", count, "lines in the file with From as the first word"
print addresses
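
To make the difference concrete, here is a minimal sketch that uses a few made-up addresses in place of the file. The names `sample`, `inside` and `outside` are illustrative only and do not come from the original script, and print() is used so the sketch runs under Python 2 or 3:

```python
# Hypothetical sample data standing in for the addresses read from the file.
sample = ["a@example.com", "b@example.com", "a@example.com"]

# Set created inside the loop: every pass starts again from an empty set,
# so only the last address survives.
for addr in sample:
    inside = set()
    inside.add(addr)

# Set created once before the loop: the same set is reused on every pass,
# so it accumulates each distinct address.
outside = set()
for addr in sample:
    outside.add(addr)

print(sorted(inside))
print(sorted(outside))
```

The first print shows a single address and the second shows both distinct addresses, which is the behaviour the two moved lines are meant to produce.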


> Question no. 2: Why is there not a built-in function for append?


>> Alan answered the question, thanks

> Question no. 3: If all else fails, i.e., append & set, is my only
> option to slice the data set?

I do not understand these two questions.


>> Alan answered the question thanks

Good luck.

-Martin

P.S. By the way, Alan Gauld has also responded to your message, with
a differently-phrased answer, but, fundamentally, he and I are
saying the same thing. Think about where you are initializing
your variables, and know that 'addresses = set()' in the middle
of the code is re-initializing the variable and throwing away
anything that was there before.

--
Martin A. Brown
http://linux-ip.net/
Emile van Sebille
2015-07-31 23:16:36 UTC
Permalink
On 7/31/2015 11:57 AM, ***@gmail.com wrote:

> →Question: Why is the list index out of range on line # 9:
>
> IndexError
>
> Traceback (most recent call last)
> C:\Users\vm\Desktop\apps\docs\Python\assinment_8_5_v_20.py in <module>()
> 7 line2 = line.strip()
> 8 line3 = line2.split()
> ----> 9 line4 = line3[1]
> 10 addresses.add(line4)
> 11 count = count + 1
> IndexError: list index out of range
>

Because line3 has no element at index 1.

Have you examined what line3 holds when the error occurs?
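
A small sketch of what that examination is likely to show, assuming the failing input is a blank line (the name `tokens` is illustrative; it plays the role of line3):

```python
line = "\n"                     # a blank line, as read from a text file
tokens = line.strip().split()   # strip() leaves "", and "".split() gives []
print(tokens)

try:
    tokens[1]                   # there is no item at index 1 in an empty list
except IndexError as err:
    print("IndexError: %s" % err)
```

The list is perfectly valid; it simply has no second item to hand back.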

Emile



l***@gmail.com
2015-07-31 23:25:11 UTC
Permalink
Emile,


--> Captured is a printout from line3 to addresses, below:



In [46]: print line3
[]


In [47]: print line2.split()
[]


In [48]: print line2


In [49]: print line.strip()


In [50]: print fh
<open file 'mbox-short.txt', mode 'r' at 0x00000000035CB0C0>


In [51]: print addresses
set(['1.0', '***@collab.sakaiproject.org;', 'Jan', 'mail.umich.edu', 'Innocen
t', '0.0000', 'CMU', 'frankenstein.mail.umich.edu', '0.8475', 'from', '***@co
llab.sakaiproject.org', '05', '<***@nakamura.uits.iupui.
edu>', 'flawless.mail.umich.edu', '5', 'nakamura.uits.iupui.edu:', 'shmi.uhi.ac.
uk', '7bit', 'text/plain;', '<***@collab.sakaiproject.org>;', 'Sat,', 'nakamu
ra.uits.iupui.edu', 'paploo.uhi.ac.uk', 'FROM', 'holes.mr.itd.umich.edu', '(from
', '<***@collab.sakaiproject.org>', '[sakai]', '***@uct.ac.z
a', 'Sat'])


In [52]:



→Original Traceback error message:



IndexError

Traceback (most recent call last)


C:\Users\vm\Desktop\apps\docs\Python\_8_5_v_21.py in <module>()
7 line2 = line.strip()
8 line3 = line2.split()
----> 9 line4 = line3[1]
10 addresses.add(line4)
11 count = count + 1




IndexError: list index out of range



→ Latest code printout:


fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses.add(line4)
    count = count + 1
print "There were", count, "lines in the file with From as the first word"
print addresses



→ I swapped line statements between lines 4 & 5. No change on the Traceback error message:


IndexError

Traceback (most recent call last)
C:\Users\vm\Desktop\apps\docs\Python\_8_5_v_21.py in <module>()
8 line2 = line.strip()
9 line3 = line2.split()
---> 10 line4 = line3[1]
11 addresses.add(line4)
12 count = count + 1


IndexError: list index out of range


In [54]: print line3


→Question: The file data content is lost on execution of the sort function?


Regards,

Hal






Alan Gauld
2015-07-31 23:54:51 UTC
Permalink
On 31/07/15 19:57, ***@gmail.com wrote:

> for line in fh:
>     line2 = line.strip()
>     line3 = line2.split()
>     line4 = line3[0]

You need to check that there actually is something
in the list to access. If you get a line with only
one word in it, or even a blank line this will fail.
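
One way to make that check, sketched with a hypothetical helper called `second_word` (not part of the original script) and made-up input lines:

```python
def second_word(line):
    """Return the second whitespace-separated word of line, or None."""
    tokens = line.strip().split()
    if len(tokens) < 2:          # blank line or a single word: nothing at [1]
        return None
    return tokens[1]

print(second_word("From someone@example.com Sat Jan  5 09:14:16 2008\n"))
print(second_word("\n"))
print(second_word("oneword\n"))
```

Returning None is only one possible design; inside a loop a plain `continue` when the list is too short would do the same job.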


>     addresses.add(line4)
>     count = count + 1
> print "There were", count, "lines in the file with From as the first word"

Despite what you print, you don't know that it's true anymore.
You have removed the code that tested for the first
word being "From". You should put that check back in your code.

> →I entered different index ranges from [] to [5]

I'm not sure what [] means in this case? It should be a syntax error
as you show below.

> In [34]: print line3[]
> File "<ipython-input-34-7bf39294000a>", line 1
> print line3[]
> ^
> SyntaxError: invalid syntax

See, that's not an IndexError. They are different and have different
causes. A syntax error means your code is not valid Python. An
IndexError means the code is valid but it's trying to access
something that doesn't exist.
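
The two kinds of error can be seen side by side in this small sketch. It uses the built-in compile() to parse the bad expression as text, so that the example file itself stays valid Python; the list `seen` is just an illustrative way to record what happened:

```python
seen = []

# A SyntaxError means the text could not be parsed, so nothing was executed.
try:
    compile("line3[]", "<example>", "eval")
except SyntaxError:
    seen.append("SyntaxError")

# An IndexError comes from valid code that asks for an item that is not there.
line3 = []
try:
    line3[1]
except IndexError:
    seen.append("IndexError")

print(seen)
```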

> →Question: I think the problem is in the placement of the address set: The addresses = set()?

No, it has nothing to do with that. The set is not
involved in this operation at this point.

To debug these kinds of errors insert a print statement
above the error line. In this case:

print line3

That will show you what the data looks like and you can tell
whether line3[1] makes any kind of sense.
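
A sketch of that debugging approach, using three made-up lines instead of the real file (the names `lines`, `tokens` and `failed_on` are illustrative). Printing the data just before the risky statement shows which input triggers the error:

```python
lines = ["From a@example.com Sat\n", "\n", "Subject: hello\n"]

failed_on = None
for number, line in enumerate(lines, 1):
    tokens = line.strip().split()
    print("%d %r" % (number, tokens))   # look at the data before indexing it
    try:
        tokens[1]
    except IndexError:
        failed_on = number              # remember which line could not be indexed
        break

print("failed on line %s" % failed_on)
```

The last list printed before the failure is the empty one, which points straight at the blank line.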


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


l***@gmail.com
2015-07-31 23:59:04 UTC
Permalink
From: Alan Gauld
Sent: ‎Friday‎, ‎July‎ ‎31‎, ‎2015 ‎4‎:‎54‎ ‎PM
To: ***@python.org





On 31/07/15 19:57, ***@gmail.com wrote:

> for line in fh:
>     line2 = line.strip()
>     line3 = line2.split()
>     line4 = line3[0]

You need to check that there actually is something
in the list to access. If you get a line with only
one word in it, or even a blank line this will fail.


→→Apparently, the data content in the file is lost from the address sort function to line2? :






In [46]: print line3
[]




In [47]: print line2.split()
[]




In [48]: print line2


In [49]: print line.strip()


In [50]: print fh
<open file 'mbox-short.txt', mode 'r' at 0x00000000035CB0C0>




In [51]: print addresses
set(['1.0', '***@collab.sakaiproject.org;', 'Jan', 'mail.umich.edu', 'Innocen
t', '0.0000', 'CMU', 'frankenstein.mail.umich.edu', '0.8475', 'from', '***@co
llab.sakaiproject.org', '05', '<***@nakamura.uits.iupui.
edu>', 'flawless.mail.umich.edu', '5', 'nakamura.uits.iupui.edu:', 'shmi.uhi.ac.
uk', '7bit', 'text/plain;', '<***@collab.sakaiproject.org>;', 'Sat,', 'nakamu
ra.uits.iupui.edu', 'paploo.uhi.ac.uk', 'FROM', 'holes.mr.itd.umich.edu', '(from
', '<***@collab.sakaiproject.org>', '[sakai]', '***@uct.ac.z
a', 'Sat'])




In [52]:




→ Latest code printout:




fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses.add(line4)
    count = count + 1
print "There were", count, "lines in the file with From as the first word"
print addresses



> addresses.add(line4)
> count = count + 1
> print "There were", count, "lines in the file with From as the first word"

Despite what you print, you don't know that it's true anymore.
You have removed the code that tested for the first
word being "From". You should put that check back in your code.

> →I entered different index ranges from [] to [5]

I'm not sure what [] means in this case? It should be a syntax error
as you show below.

> In [34]: print line3[]
> File "<ipython-input-34-7bf39294000a>", line 1
> print line3[]
> ^
> SyntaxError: invalid syntax


→→ OK

See, that's not an IndexError. They are different and have different
causes. A syntax error means your code is not valid Python. An
IndexError means the code is valid but it's trying to access
something that doesn't exist.


→→ OK



→Question: I think the problem is in the placement of the address set: The addresses = set()?

No it has nothing to do with that. The set is not
involved in this operation at this point.

To debug these kinds of errors insert a print statement
above the error line. In this case:

print line3



→→ Read printout above


That will show you what the data looks like and you can tell
whether line3[1] makes any kind of sense.




→→id.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Alan Gauld
2015-08-01 08:14:27 UTC
Permalink
On 01/08/15 00:59, ***@gmail.com wrote:

>> for line in fh:
>>     line2 = line.strip()
>>     line3 = line2.split()
>>     line4 = line3[0]
>
> →→Apparently, the data content in the file is lost from the address sort function to line2? :

It is not lost, it is an empty line.

> In [47]: print line2.split()
> []

split has returned no content.
The line must have been empty (or full of whitespace
which strip() removed).

> In [48]: print line2
> In [49]: print line.strip()

Again it shows an empty line.

> In [51]: print addresses
> set(['1.0', '***@collab.sakaiproject.org;', 'Jan', 'mail.umich.edu', 'Innocen
> t', '0.0000', 'CMU', 'frankenstein.mail.umich.edu', '0.8475', 'from', '***@co
> llab.sakaiproject.org', '05', '<***@nakamura.uits.iupui.
> edu>', 'flawless.mail.umich.edu', '5', 'nakamura.uits.iupui.edu:', 'shmi.uhi.ac.
> uk', '7bit', 'text/plain;', '<***@collab.sakaiproject.org>;', 'Sat,', 'nakamu
> ra.uits.iupui.edu', 'paploo.uhi.ac.uk', 'FROM', 'holes.mr.itd.umich.edu', '(from
> ', '<***@collab.sakaiproject.org>', '[sakai]', '***@uct.ac.z
> a', 'Sat'])

But this is odd since it shows the set containing the full line which
suggests you maybe did an add(line) instead of add(line4) at some point?

> You have removed the code that tested for the first
> word being "From". You should put that check back in your code.

If you do this it should fix the IndexError problem too,
since empty lines will not start with From.

I.e., your loop should look like this:

for line in fh:
    if line.startswith('From'):
        # the loop body as it currently is
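
Filled in, the loop might look like the sketch below. It runs over a short list of made-up lines rather than the real file, and the names `sample_lines` and `tokens` are illustrative:

```python
# Hypothetical stand-in for the lines of the mail file.
sample_lines = [
    "From a@example.com Sat Jan  5 09:14:16 2008\n",
    "\n",
    "Subject: test\n",
    "From b@example.com Fri Jan  4 18:10:48 2008\n",
]

addresses = set()
count = 0
for line in sample_lines:
    if line.startswith('From'):
        # Only lines that pass the check reach the indexing step,
        # so blank lines can no longer raise an IndexError here.
        tokens = line.split()
        addresses.add(tokens[1])
        count = count + 1

print(sorted(addresses))
print(count)
```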


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Ltc Hotspot
2015-08-01 18:48:02 UTC
Permalink
Hi Alan,

There is an indent message in the revised code.
Question: Where should I indent the code line for the loop?

View the revised codes with loop indents, below:

--->Revised Code v.2 wo/indent from lines 8-12:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
if line.startswith('From'):
line2 = line.strip()
line3 = line2.split()
line4 = line3[1]
addresses.add(line)
count = count + 1
print "There were", count, "lines in the file with From as the first word"
print addresses


---> Message output reads:
In [62]: %run _8_5_v_25.py
File "C:\Users\vm\Desktop\apps\docs\Python\_8_5_v_25.py", line 8
line2 = line.strip()
^
IndentationError: expected an indented block


--->Revised Code v.3 w/indent from lines 8-12:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
if line.startswith('From'):
line2 = line.strip()
line3 = line2.split()
line4 = line3[1]
addresses.add(line)
count = count + 1
print "There were", count, "lines in the file with From as the first word"
print addresses

---> Message output reads:

...pi/component/src/java/org/sakaiproject/component/util/RecordWriter.java\n',
'Dat
e: 2008-01-04 11:09:12 -0500 (Fri, 04 Jan 2008)\n', '\t 4 Jan 2008 11:12:30
-050
0\n', '\tby nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11/Submit) id
m03M5Ea
7005273\n', 'New Revision: 39755\n', 'X-DSPAM-Processed: Thu Jan 3
16:23:48 200
8\n', 'Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39754\n',
'
\t Fri, 04 Jan 2008 11:35:08 -0500\n', '\tfor <
***@collab.sakaiproject.org>;
Fri, 4 Jan 2008 04:05:54 -0500\n', 'Received: from carrie.mr.itd.umich.edu
(carr
ie.mr.itd.umich.edu [141.211.93.152])\n', 'Message-ID:
<200801042044.m04Kiem3007
***@nakamura.uits.iupui.edu>\n', '\tfor <***@collab.sakaiproject.org>;
Fri,
4 Jan 2008 16:36:37 +0000 (GMT)\n', '\t Fri, 04 Jan 2008 15:03:18 -0500\n',
'\tF
ri, 4 Jan 2008 16:11:31 +0000 (GMT)\n', ' by paploo.uhi.ac.uk
(JAMES S
MTP Server 2.1.3) with SMTP ID 960\n', 'From ***@media.berkeley.edu Fri
Jan 4
18:10:48 2008\n', ' Thu, 3 Jan 2008 22:06:34 +0000 (GMT)\n',
'\tfor so
***@collab.sakaiproject.org; Fri, 4 Jan 2008 10:15:57 -0500\n', 'Received:
from
eyewitness.mr.itd.umich.edu (eyewitness.mr.itd.umich.edu
[141.211.93.142])\n',
'Subject: [sakai] svn commit: r39743 -
gradebook/branches/oncourse_2-4-2/app/ui/
src/java/org/sakaiproject/tool/gradebook/ui\n', 'Date: 2008-01-04 10:15:54
-0500
(Fri, 04 Jan 2008)\n', 'New Revision: 39761\n', '\tBY
salemslot.mr.itd.umich.ed
u ID 477DF74E.49493.30415 ; \n', 'X-DSPAM-Processed: Sat Jan 5 09:14:16
2008\n'
, '\tfor <***@collab.sakaiproject.org>; Fri, 4 Jan 2008 21:10:14 +0000
(GMT)
\n', '\tby paploo.uhi.ac.uk (Postfix) with ESMTP id 88598BA5B6;\n',
'X-DSPAM-Pro
cessed: Fri Jan 4 04:07:34 2008\n', 'r39558 | ***@iupui.edu | 2007-12-20
15:25:
38 -0500 (Thu, 20 Dec 2007) | 3 lines\n', 'From ***@umich.edu Fri Jan
4 11:
10:22 2008\n', '\tby nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11)
with ESM
TP id m04N8vHG008127\n', '\tSat, 5 Jan 2008 14:10:05 +0000 (GMT)\n', '\tby
naka
mura.uits.iupui.edu (8.12.11.20060308/8.12.11/Submit) id m049W2i5006493\n',
'\tT
hu, 3 Jan 2008 22:06:57 +0000 (GMT)\n', ' Fri, 4 Jan 2008
19:46:50 +00
00 (GMT)\n', 'Message-ID: <
***@nakamura.uits.iupui.edu>\
n', 'Subject: [sakai] svn commit: r39756 - in
component/branches/SAK-12166/compo
nent-api/component/src/java/org/sakaiproject/component: impl
impl/spring/support
impl/spring/support/dynamic impl/support util\n',
'site/trunk/site-tool/tool/sr
c/bundle/admin.properties\n', 'Author: ***@gmail.com\n',
'From d
***@uct.ac.za Fri Jan 4 04:33:44 2008\n', '\tby
nakamura.uits.iupui.ed
u (8.12.11.20060308/8.12.11) with ESMTP id m04E3pQS006928\n', '\tfor
***@coll
ab.sakaiproject.org; Fri, 4 Jan 2008 16:09:02 -0500\n', 'X-DSPAM-Processed:
Fri
Jan 4 09:05:31 2008\n', '\t 4 Jan 2008 16:10:33 -0500\n', '\tfor
***@collab.
sakaiproject.org; Fri, 4 Jan 2008 11:09:14 -0500\n', 'merge fix to SAK-9996
into
2-5-x branch: svn merge -r 39687:39688
https://source.sakaiproject.org/svn/site
-manage/trunk/\n', 'Subject: [sakai] svn commit: r39751 - in
podcasts/branches/s
akai_2-5-x/podcasts-app/src/webapp: css images podcasts\n', 'Subject:
[sakai] sv
n commit: r39757 - in assignment/trunk:
assignment-impl/impl/src/java/org/sakaip
roject/assignment/impl assignment-tool/tool/src/webapp/vm/assignment\n',
'From w
***@iupui.edu Fri Jan 4 10:38:42 2008\n', 'Date: 2008-01-03 17:16:39
-0500
(Thu, 03 Jan 2008)\n', ' by paploo.uhi.ac.uk (JAMES SMTP Server
2.1.3)
with SMTP ID 906\n', 'U
podcasts/podcasts-app/src/webapp/podcasts/podOptions.
jsp\n', 'svn merge -c 35014
https://source.sakaiproject.org/svn/gradebook/trunk\
n', 'Received: from galaxyquest.mr.itd.umich.edu (
galaxyquest.mr.itd.umich.edu [
141.211.93.145])\n', '\tBY salemslot.mr.itd.umich.edu ID
477D5F23.797F6.16348 ;
\n', 'Date: Fri, 4 Jan 2008 18:08:57 -0500\n', 'X-DSPAM-Processed: Fri Jan
4 04
:33:44 2008\n',
'polls/trunk/tool/src/java/org/sakaiproject/poll/tool/evolvers/\
n', 'Date: 2008-01-04 10:01:40 -0500 (Fri, 04 Jan 2008)\n',
'X-DSPAM-Confidence:
0.8475\n', '\tFri, 4 Jan 2008 09:48:55 +0000 (GMT)\n',
'X-DSPAM-Processed: Fri
Jan 4 06:08:27 2008\n', '\tBY anniehall.mr.itd.umich.edu ID
477D5C7A.4FE1F.222
11 ; \n', '\t Fri, 04 Jan 2008 11:10:22 -0500\n', '\tBY
workinggirl.mr.itd.umich
.edu ID 477DFD6C.75DBE.26054 ; \n', 'svn log -r 39403
https://source.sakaiprojec
t.org/svn/gradebook/trunk\n', '\t Thu, 03 Jan 2008 19:51:21 -0500\n',
'Date: Fri
, 4 Jan 2008 11:08:39 -0500\n', '\tby shmi.uhi.ac.uk (Postfix) with ESMTP
id C59
6A3DFA2\n', '\tby shmi.uhi.ac.uk (Postfix) with ESMTP id 8889842C49\n',
'X-DSPAM
-Processed: Fri Jan 4 11:12:37 2008\n', 'Details:
http://source.sakaiproject.or
g/viewsvn/?view=rev&rev=39772\n', '\tby shmi.uhi.ac.uk (Postfix) with ESMTP
id 8
C13342C92\n', 'Date: Fri, 4 Jan 2008 11:09:14 -0500\n', '\tFri, 4 Jan 2008
10:17
:42 -0500\n', 'New Revision: 39754\n', 'New Revision: 39749\n', 'Details:
http:/
/source.sakaiproject.org/viewsvn/?view=rev&rev=39755\n', 'svn merge -c
39403 htt
ps://source.sakaiproject.org/svn/gradebook/trunk\n', ' by
paploo.uhi.ac
.uk (JAMES SMTP Server 2.1.3) with SMTP ID 385\n',
'site-manage/branches/sakai_2
-4-x/site-manage-tool/tool/src/java/org/sakaiproject/site/tool/SiteAction.java\n
', 'X-DSPAM-Confidence: 0.6178\n', '\t Fri, 04 Jan 2008 11:12:37 -0500\n',
'Deta
ils: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39765\n',
'Content-Typ
e: text/plain; charset=UTF-8\n', ' Fri, 4 Jan 2008 09:07:04 +0000
(GMT)
\n', 'Date: Fri, 4 Jan 2008 09:03:51 -0500\n', 'From ***@iupui.edu Fri
Jan 4 1
1:35:08 2008\n', '\tby nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11)
with E
SMTP id m049lU3P006519\n', 'r35014 | ***@iupui.edu | 2007-09-12
16:17:59 -0
400 (Wed, 12 Sep 2007) | 3 lines\n',
'component/branches/SAK-12166/component-api
/component/src/java/org/sakaiproject/component/impl/support/DynamicComponentReco
rd.java\n',
'sam/branches/SAK-12065/samigo-app/src/java/org/sakaiproject/tool/as
sessment/ui/bean/evaluation/QuestionScoresBean.java\n', '\tby
paploo.uhi.ac.uk (
Postfix) with ESMTP id DEC65ADC79;\n', '\tfor <
***@collab.sakaiproject.org>;
Fri, 4 Jan 2008 00:25:00 +0000 (GMT)\n', "Sakai Source Repository
\t#38024 \t
Wed Nov 07 14:54:46 MST 2007 \***@umich.edu \t Fix to SAK-10788: If a
provi
ded id in a couse site is fake or doesn't provide any user information,
Site Inf
o appears to be like project site with empty participant list\n", 'Watch
for enr
ollments object being null and concatenate provider ids when there are more
than
one.\n',
'gradebook/trunk/app/ui/src/java/org/sakaiproject/tool/gradebook/ui/he
lpers/params/GradeGradebookItemViewParams.java\n', 'Date: Fri, 4 Jan 2008
16:09:
02 -0500\n', '\tfor <***@collab.sakaiproject.org>; Thu, 3 Jan 2008
21:28:38
+0000 (GMT)\n', 'Details:
http://source.sakaiproject.org/viewsvn/?view=rev&rev=3
9742\n', '\tFri, 4 Jan 2008 14:50:17 -0500\n', 'Date: Fri, 4 Jan 2008
04:05:53 -
0500\n', '\tby paploo.uhi.ac.uk (Postfix) with ESMTP id 6A39594CD2;\n',
'From da
***@uct.ac.za Fri Jan 4 06:08:27 2008\n', 'Subject: [sakai] svn
commit:
r39750 -
event/branches/SAK-6216/event-util/util/src/java/org/sakaiproject/util
\n', 'SAK-9882: refactored the other pages as well to take advantage of
proper j
sp components as well as validation cleanup.\n', '\tfor
<***@collab.sakaiproj
ect.org>; Fri, 4 Jan 2008 11:33:06 -0500\n', 'Received: from
shining.mr.itd.umic
h.edu (shining.mr.itd.umich.edu [141.211.93.153])\n', 'Message-ID:
<200801042109
***@nakamura.uits.iupui.edu>\n', 'Return-Path: <
***@collab.sa
kaiproject.org>\n', 'polls/branches/sakai_2-5-x/.classpath\n',
'X-DSPAM-Processe
d: Fri Jan 4 11:37:30 2008\n', '\tFri, 4 Jan 2008 06:08:26 -0500\n', '\tby
shmi
.uhi.ac.uk (Postfix) with ESMTP id 7D13042F71\n', ' Fri, 4 Jan
2008 14:
05:04 +0000 (GMT)\n', 'X-Authentication-Warning: nakamura.uits.iupui.edu:
apache
set sender to ***@caret.cam.ac.uk using -f\n',
'gradebook/trunk/service/ap
i/src/java/org/sakaiproject/service/gradebook/shared/GradebookService.java\n',
'
Subject: [sakai] svn commit: r39753 - in polls/trunk: . tool
tool/src/java/org/s
akaiproject/poll/tool tool/src/java/org/sakaiproject/poll/tool/evolvers
tool/src
/webapp/WEB-INF\n', '\tby nakamura.uits.iupui.edu
(8.12.11.20060308/8.12.11) wit
h ESMTP id m04F21hn007033\n', '\tBY galaxyquest.mr.itd.umich.edu ID
477D5397.E16
1D.20326 ; \n', ' ...

Regards,
Hal

Alan Gauld
2015-08-01 20:40:36 UTC
Permalink
On 01/08/15 19:48, Ltc Hotspot wrote:
> There is an indent message in the revised code.
> Question: Where should I indent the code line for the loop?

Do you understand the role of indentation in Python?
Everything in the indented block is part of the structure,
so you need to indent everything that should be executed
as part of the logical block.

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> addresses = set()
> for line in fh:
> if line.startswith('From'):
> line2 = line.strip()
> line3 = line2.split()
> line4 = line3[1]
> addresses.add(line)
> count = count + 1

Everything after the if line should be indented an extra level
because you only want to do those things if the line
startswith From.

And note that, as I suspected, you are adding the whole line
to the set when you should only be adding the address.
(ie line4). This would be more obvious if you had
used meaningful variable names such as:

strippedLine = line.strip()
tokens = strippedLine.split()
addr = tokens[1]
addresses.add(addr)
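
A tiny sketch of how the indentation level decides what belongs to the if, using made-up lines and two illustrative counters (`inside_if` and `after_if`):

```python
lines = ["From a@example.com\n", "Subject: x\n", "From b@example.com\n"]

inside_if = 0
after_if = 0
for line in lines:
    if line.startswith('From'):
        inside_if = inside_if + 1   # indented under the if: matching lines only
    after_if = after_if + 1         # level with the if: every line in the loop

print(inside_if)
print(after_if)
```

The first counter ends at 2 and the second at 3, even though the two statements differ only in how far they are indented.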

PS.
Could you please delete the extra lines from your messages.
Some people pay by the byte and don't want to receive kilobytes
of stuff they have already seen multiple times.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

Ltc Hotspot
2015-08-01 23:07:56 UTC
Permalink
Hi Alan,

Question1: The output result is an address or line?
Question2: Why are there 54 lines as compared to 27 lines in the desired
output?

Here is the latest revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if line.startswith('From'):
        line2 = line.strip()
        line3 = line2.split()
        line4 = line3[1]
        addresses.add(line4)
        count = count + 1
print addresses
print "There were", count, "lines in the file with From as the first word"

The output result:
set(['***@uct.ac.za', '***@media.berkeley.edu', '
***@umich.edu', '***@iupui.edu', '***@iupui.edu', '***@umich.edu',
'***@iupui.edu', '***@caret.cam.ac.uk', '
***@gmail.com', '***@uct.ac.za', '
***@media.berkeley.edu']) ← Mismatch
There were 54 lines in the file with From as the first word


The desired output result:
***@uct.ac.za
***@media.berkeley.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@caret.cam.ac.uk
***@gmail.com
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@media.berkeley.edu
***@media.berkeley.edu
***@media.berkeley.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
There were 27 lines in the file with From as the first word

Regards,
Hal









Alan Gauld
2015-08-02 00:44:57 UTC
Permalink
On 02/08/15 00:07, Ltc Hotspot wrote:
> Question1: The output result is an address or line?

It's your assignment, you tell me.
But from your previous mails I'm assuming you want addresses?

> Question2: Why are there 54 lines as compared to 27 lines in the desired
> output?

Because the set removes duplicates? So presumably there were 27
duplicates? (Which is a suspicious coincidence!)
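
Another possible explanation for the exact doubling, offered only as a guess since the file itself is not shown here: mail files of this kind usually carry both an envelope line beginning "From " and a header line beginning "From:" for each message, and startswith('From') matches both. A sketch with two made-up lines (the names `loose` and `strict` are illustrative):

```python
lines = [
    "From a@example.com Sat Jan  5 09:14:16 2008\n",   # envelope line
    "From: a@example.com\n",                            # header line
]

loose = [l for l in lines if l.startswith('From')]     # no trailing space
strict = [l for l in lines if l.startswith('From ')]   # trailing space

print(len(loose))
print(len(strict))
```

If that assumption holds for the real file, testing for 'From ' with the trailing space would halve the count.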

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> addresses = set()
> for line in fh:
>     if line.startswith('From'):
>         line2 = line.strip()
>         line3 = line2.split()
>         line4 = line3[1]
>         addresses.add(line4)
>         count = count + 1
> print addresses
> print "There were", count, "lines in the file with From as the first word"

That looks right in that it does what I think you want it to do.

> The output result:
> set(['***@uct.ac.za', '***@media.berkeley.edu', '
> ***@umich.edu', '***@iupui.edu', '***@iupui.edu', '***@umich.edu',
> '***@iupui.edu', '***@caret.cam.ac.uk','
> ***@gmail.com', '***@uct.ac.za', '
> ***@media.berkeley.edu']) ← Mismatch

That is the set of unique addresses, correct?

> There were 54 lines in the file with From as the first word

And that seems to be the number of lines in the original file
starting with From. Can you check manually if that is correct?

> The desired output result:
> ***@uct.ac.za
> ***@media.berkeley.edu
> ***@umich.edu
> ***@iupui.edu
> ***@umich.edu
> ***@iupui.edu
...

Now I'm confused again. This has duplicates but you said you
did not want duplicates? Which is it?

...
> ***@iupui.edu
> ***@iupui.edu
> There were 27 lines in the file with From as the first word

And this is reporting the number of lines in the output
rather than the file (I think). Which do you want?

It's easy enough to change the code to give the output
you demonstrate, but that's not what you originally asked
for. So just make up your mind exactly what it is you want
out and we can make it work for you.
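
If the demonstrated output really is the goal (one address per matching line, duplicates kept, then a count), a sketch along these lines would produce it. It assumes a list is wanted rather than a set and that only lines starting with 'From ' (with the space) should count; the function name `from_addresses` and the sample lines are hypothetical:

```python
def from_addresses(lines):
    """Return the address from each line starting with 'From ', duplicates kept."""
    found = []
    for line in lines:
        if not line.startswith('From '):
            continue
        tokens = line.split()
        if len(tokens) > 1:             # guard against a bare 'From ' line
            found.append(tokens[1])
    return found

sample = [
    "From a@example.com Sat Jan  5 09:14:16 2008\n",
    "From: a@example.com\n",
    "\n",
    "From b@example.com Fri Jan  4 18:10:48 2008\n",
    "From a@example.com Fri Jan  4 16:10:39 2008\n",
]

result = from_addresses(sample)
for addr in result:
    print(addr)
print("There were %d lines in the file with From as the first word" % len(result))
```

A list keeps the order and the repeats; switching back to a set would give the unique addresses instead.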

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Ltc Hotspot
2015-08-02 01:20:04 UTC
Permalink
Hi Alan,

I made a mistake and incorrectly assumed that the difference between 54 lines
of output and 27 lines of output was the result of removing duplicate email
addresses, e.g., ***@umich.edu,
***@umich.edu, ***@iupui.edu, ***@iupui.edu


Apparently, this is not the case and I was wrong :(
The solution to the problem is in the desired line output:

***@uct.ac.za
***@media.berkeley.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@caret.cam.ac.uk
***@gmail.com
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@media.berkeley.edu
***@media.berkeley.edu
***@media.berkeley.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
There were 27 lines in the file with From as the first word
Not in the output of a subset.

Latest output:
set(['***@uct.ac.za', '***@media.berkeley.edu', '
***@umich.edu', '***@iupui.edu', '***@iupui.edu', '***@umich.edu',
'***@iupui.edu', '***@caret.cam.ac.uk', '
***@gmail.com', '***@uct.ac.za', '
***@media.berkeley.edu']) ← Mismatch
There were 54 lines in the file with From as the first word

Latest revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if line.startswith('From'):
        line2 = line.strip()
        line3 = line2.split()
        line4 = line3[1]
        addresses.add(line4)
        count = count + 1
print addresses
print "There were", count, "lines in the file with From as the first word"

Regards,
Hal

Alan Gauld
2015-08-02 08:18:50 UTC
Permalink
On 02/08/15 02:20, Ltc Hotspot wrote:
> Hi Alan,
>
> I made a mistake and incorrectly assumed that differences between 54 lines
> of output and 27 lines of output is the result of removing duplicate email
> addresses,
>
> Apparently, this is not the case and I was wrong :(
> The solution to the problem is in the desired line output:
>
> ***@uct.ac.za
> ***@media.berkeley.edu
> ***@umich.edu
> ***@iupui.edu
> ***@umich.edu
> ***@iupui.edu
...

OK, Only a couple of changes should see to that.

> Latest revised code:
> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> addresses = set()

change this to use a list

addresses = []

> for line in fh:
>     if line.startswith('From'):
>         line2 = line.strip()
>         line3 = line2.split()
>         line4 = line3[1]
>         addresses.add(line4)

and change this to use the list append() method

addresses.append(line4)

>         count = count + 1
> print addresses
> print "There were", count, "lines in the file with From as the first word"

I'm not quite sure where the 54/27 divergence comes from except that
I noticed Emile mention that there were lines beginning 'From:'
too. If that's the case then follow his advice and change the if
test to only check for 'From ' (with the space).

That should be all you need.
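
Putting those changes together, a minimal sketch might look like the following. It is written for Python 3 (print as a function) rather than the Python 2 used in this thread, and SAMPLE and the helper name sender_addresses are made up for illustration; SAMPLE only imitates the shape of mbox-short.txt and is not the real file:

```python
# Sketch only: SAMPLE is invented data shaped like an mbox file.
SAMPLE = """\
From alice@example.com Sat Jan  5 09:14:16 2008
Return-Path: <postmaster@example.com>
From: alice@example.com
Subject: first message

From bob@example.org Fri Jan  4 18:10:48 2008
From: bob@example.org
Subject: second message

From alice@example.com Fri Jan  4 16:10:39 2008
From: alice@example.com
Subject: third message
"""


def sender_addresses(lines):
    """Return the second word of every line that starts with 'From '."""
    addresses = []                      # a list keeps order and duplicates
    for line in lines:
        if line.startswith('From '):    # trailing space skips 'From:' headers
            words = line.split()
            addresses.append(words[1])
    return addresses


addresses = sender_addresses(SAMPLE.splitlines())
for addr in addresses:
    print(addr)
print("There were", len(addresses), "lines in the file with From as the first word")
```

With a real file you would pass the open file object instead of SAMPLE.splitlines(), since iterating over a file also yields one line at a time.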

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Ltc Hotspot
2015-08-02 09:15:26 UTC
Permalink
Hi Alan,

Question1: Why did the following strip function fail: line2 =
line.strip(',')
See the instructions for str.strip([chars]), which are available at
https://docs.python.org/2.7/library/stdtypes.html?highlight=strip#str.strip

Question2: How do I code a vertical column output?

Revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = []
for line in fh:
    if line.startswith('From'):
        line2 = line.strip()
        line3 = line2.split()
        line4 = line3[1]
        addresses.append(line4)
        count = count + 1
print addresses
print "There were", count, "lines in the file with From as the first word"



Produced output:
['***@uct.ac.za', '***@uct.ac.za', '
***@media.berkeley.edu', '***@media.berkeley.edu', '***@umich.edu', '
***@umich.edu', '***@iupui.edu', '***@iupui.edu', '***@umich.edu',
'***@umich.edu', '***@iupui.edu', '***@iupui.edu', '***@iupui.edu',
'***@iupui.edu', '***@iupui.edu', '***@iupui.edu', '***@umich.edu', '
***@umich.edu', '***@umich.edu', '***@umich.edu', '
***@umich.edu', '***@umich.edu', '***@umich.edu', '***@umich.edu',
'***@iupui.edu', '***@iupui.edu', '***@umich.edu', '
***@umich.edu', '***@caret.cam.ac.uk', '***@caret.cam.ac.uk', '
***@gmail.com', '***@gmail.com', '
***@uct.ac.za', '***@uct.ac.za', '
***@uct.ac.za', '***@uct.ac.za', '
***@uct.ac.za', '***@uct.ac.za', '
***@uct.ac.za', '***@uct.ac.za', '
***@uct.ac.za', '***@uct.ac.za', '
***@media.berkeley.edu', '***@media.berkeley.edu', '
***@media.berkeley.edu', '***@media.berkeley.edu', '
***@media.berkeley.edu', '***@media.berkeley.edu', '***@iupui.edu', '
***@iupui.edu', '***@iupui.edu', '***@iupui.edu', '***@iupui.edu', '
***@iupui.edu'] ← Mismatch
There were 54 lines in the file with From as the first word


Desired output:
***@uct.ac.za
***@media.berkeley.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@caret.cam.ac.uk
***@gmail.com
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@media.berkeley.edu
***@media.berkeley.edu
***@media.berkeley.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
There were 27 lines in the file with From as the first word

Regards,
Hal







Alan Gauld
2015-08-02 14:06:56 UTC
Permalink
On 02/08/15 10:15, Ltc Hotspot wrote:
> Question1: Why did the following strip function fail: line2 =
> line.strip (',')

What makes you think it failed?
I see no error messages below.
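
For what it's worth, strip(',') is not an error; it simply does something different from strip() with no argument. A short sketch (Python 3 syntax, with a made-up sample line) of the difference:

```python
# Made-up line in the shape of an mbox 'From ' line, including the newline.
line = 'From alice@example.com Sat Jan  5 09:14:16 2008\n'

no_arg = line.strip()        # removes leading/trailing whitespace, so the '\n' goes
comma_arg = line.strip(',')  # removes only leading/trailing commas; none here

print(repr(no_arg))
print(repr(comma_arg))       # unchanged, the newline is still there

# strip(',') only has a visible effect when commas sit at the ends:
print(',a,b,'.strip(','))    # the inner comma is left alone
```

Since split() with no argument discards surrounding whitespace anyway, either call gives the same address after line3[1].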

> Question2: How do I code a vertical column output

See below.

> Revised code:
> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> addresses =[]
> for line in fh:
>     if line.startswith('From'):

You are still not checking for 'From ' - with a space. That's why you still
get 54 instead of 27.

>         line2 = line.strip()
>         line3 = line2.split()
>         line4 = line3[1]
>         addresses.append(line4)
>         count = count + 1
> print addresses

To get a vertical printout try this:

print '\n'.join(addresses)

Which converts the list into a string with a newline between each element.

Alternatively do it the manual way:

for addr in addresses: print addr
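
As a quick illustration of both approaches, here is a sketch in Python 3 syntax (print as a function, unlike the Python 2 code above), using made-up addresses:

```python
# Made-up addresses standing in for the list built by the loop.
addresses = ['alice@example.com', 'bob@example.org', 'alice@example.com']

# join() puts a newline between elements, not after the last one.
column = '\n'.join(addresses)
print(column)

# The manual way prints the same three lines.
for addr in addresses:
    print(addr)
```

Both forms print one address per line; join() is handy when you want the whole column as a single string.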


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

Emile van Sebille
2015-08-02 00:45:48 UTC
Permalink
On 8/1/2015 4:07 PM, Ltc Hotspot wrote:
> Hi Alan,
>
> Question1: The output result is an address or line?

It's a set actually. Ready to be further processed I imagine. Or to
print out line by line if desired.

> Question2: Why are there 54 lines as compared to 27 line in the desired
> output?

Because there are 54 lines that start with 'From'.

As I noted in looking at your source data, for each email there's a
'From ' and a 'From:' -- you'd get the right answer checking only for
startswith('From ')
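
A small sketch of why the trailing space matters (Python 3 syntax; the four lines are made up to mimic two messages, each with a 'From ' line and a 'From:' header):

```python
lines = [
    'From alice@example.com Sat Jan  5 09:14:16 2008',
    'From: alice@example.com',
    'From bob@example.org Fri Jan  4 18:10:48 2008',
    'From: bob@example.org',
]

# 'From' matches both kinds of line, so every message is counted twice.
loose = [l for l in lines if l.startswith('From')]

# 'From ' (with the space) matches only the envelope line of each message.
strict = [l for l in lines if l.startswith('From ')]

print(len(loose), len(strict))
```

That doubling is the same pattern as 54 versus 27.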

Emile



>
> Here is the latest revised code:
> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> addresses = set()
> for line in fh:
>     if line.startswith('From'):
>         line2 = line.strip()
>         line3 = line2.split()
>         line4 = line3[1]
>         addresses.add(line4)
>         count = count + 1
> print addresses
> print "There were", count, "lines in the file with From as the first word"
>
> The output result:
> set(['***@uct.ac.za', '***@media.berkeley.edu', '
> ***@umich.edu', '***@iupui.edu', '***@iupui.edu', '***@umich.edu',
> '***@iupui.edu', '***@caret.cam.ac.uk','
> ***@gmail.com', '***@uct.ac.za', '
> ***@media.berkeley.edu']) ← Mismatch
> There were 54 lines in the file with From as the first word
>
>
> The desired output result:
> ***@uct.ac.za
> ***@media.berkeley.edu
> ***@umich.edu
> ***@iupui.edu
> ***@umich.edu
> ***@iupui.edu
> ***@iupui.edu
> ***@iupui.edu
> ***@umich.edu
> ***@umich.edu
> ***@umich.edu
> ***@umich.edu
> ***@iupui.edu
> ***@umich.edu
> ***@caret.cam.ac.uk
> ***@gmail.com
> ***@uct.ac.za
> ***@uct.ac.za
> ***@uct.ac.za
> ***@uct.ac.za
> ***@uct.ac.za
> ***@media.berkeley.edu
> ***@media.berkeley.edu
> ***@media.berkeley.edu
> ***@iupui.edu
> ***@iupui.edu
> ***@iupui.edu
> There were 27 lines in the file with From as the first word
>
> Regards,
> Hal
>
>
>
>
>
>
>
>
>
> On Sat, Aug 1, 2015 at 1:40 PM, Alan Gauld <***@btinternet.com>
> wrote:
>
>> On 01/08/15 19:48, Ltc Hotspot wrote:
>>
>>> There is an indent message in the revised code.
>>> Question: Where should I indent the code line for the loop?
>>>
>>
>> Do you understand the role of indentation in Python?
>> Everything in the indented block is part of the structure,
>> so you need to indent everything that should be executed
>> as part of the logical block.
>>
>> fname = raw_input("Enter file name: ")
>>> if len(fname) < 1 : fname = "mbox-short.txt"
>>> fh = open(fname)
>>> count = 0
>>> addresses = set()
>>> for line in fh:
>>> if line.startswith('From'):
>>> line2 = line.strip()
>>> line3 = line2.split()
>>> line4 = line3[1]
>>> addresses.add(line)
>>> count = count + 1
>>>
>>
>> Everything after the if line should be indented an extra level
>> because you only want to do those things if the line
>> startswith From.
>>
>> And note that, as I suspected, you are adding the whole line
>> to the set when you should only be adding the address.
>> (ie line4). This would be more obvious if you had
>> used meaningful variable names such as:
>>
>> strippedLine = line.strip()
>> tokens = strippedLine.split()
>> addr = tokens[1]
>> addresses.add(addr)
>>
>> PS.
>> Could you please delete the extra lines from your messages.
>> Some people pay by the byte and don't want to receive kilobytes
>> of stuff they have already seen multiple times.
>>
>>
>> --
>> Alan G
>> Author of the Learn to Program web site
>> http://www.alan-g.me.uk/
>> http://www.amazon.com/author/alan_gauld
>> Follow my photo-blog on Flickr at:
>> http://www.flickr.com/photos/alangauldphotos
>>
>>
> _______________________________________________
> Tutor maillist - ***@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>


Ltc Hotspot
2015-08-02 01:20:50 UTC
Permalink
Hi Emile,

I made a mistake and incorrectly assumed that the difference between 54 lines
of output and 27 lines of output was the result of removing duplicate email
addresses, e.g., ***@umich.edu,
***@umich.edu, ***@iupui.edu, ***@iupui.edu


Apparently, this is not the case and I was wrong :(
The solution to the problem is in the desired line output:

***@uct.ac.za
***@media.berkeley.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@caret.cam.ac.uk
***@gmail.com
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@media.berkeley.edu
***@media.berkeley.edu
***@media.berkeley.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
There were 27 lines in the file with From as the first word
Not in the output of a subset.

Latest output:
set(['***@uct.ac.za', '***@media.berkeley.edu', '
***@umich.edu', '***@iupui.edu', '***@iupui.edu', '***@umich.edu',
'***@iupui.edu', '***@caret.cam.ac.uk', '
***@gmail.com', '***@uct.ac.za', '
***@media.berkeley.edu']) ← Mismatch
There were 54 lines in the file with From as the first word

Latest revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if line.startswith('From'):
        line2 = line.strip()
        line3 = line2.split()
        line4 = line3[1]
        addresses.add(line4)
        count = count + 1
print addresses
print "There were", count, "lines in the file with From as the first word"

Regards,
Hal

Mark Lawrence
2015-08-01 00:02:04 UTC
Permalink
On 31/07/2015 19:57, ***@gmail.com wrote:

I believe that this is the third time that you've been asked to do
something about the amount of whitespace that you're sending to this list.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Ltc Hotspot
2015-08-01 00:26:54 UTC
Permalink
Hi Mark,

Desired output on execution of the script:

***@uct.ac.za
***@media.berkeley.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@iupui.edu



[...]

Regards,
Hal

On Fri, Jul 31, 2015 at 5:21 PM, Ltc Hotspot <***@gmail.com> wrote:

> Mark:
> Is this any better, message sent from GMail?
> Regards,
> Hal
>
> On Fri, Jul 31, 2015 at 5:02 PM, Mark Lawrence <***@yahoo.co.uk>
> wrote:
>
>> On 31/07/2015 19:57, ***@gmail.com wrote:
>>
>> I believe that this is the third time that you've been asked to do
>> something about the amount of whitespace that you're sending to this list.
>>
>> --
>> My fellow Pythonistas, ask not what our language can do for you, ask
>> what you can do for our language.
>>
>> Mark Lawrence
>>
>> _______________________________________________
>> Tutor maillist - ***@python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>
>
Emile van Sebille
2015-08-01 16:18:40 UTC
Permalink
Hi Hal,

Seeing now that the output is only extracted from six address blocks,
can you paste in the full contents of the file mbox-short.txt? (or the
first 5-10 address sets if this is only representative) I think if we
have a better understanding of the structure of the content you're
parsing it'll help us identify what the program will need to be prepared
to handle.

Emile


Ltc Hotspot
2015-08-01 18:54:04 UTC
Permalink
Hi Emile,


I just noticed there are duplicates

Here is the complete line output as requested, below:

***@uct.ac.za
***@media.berkeley.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@caret.cam.ac.uk
***@gmail.com
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@media.berkeley.edu
***@media.berkeley.edu
***@media.berkeley.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
There were 27 lines in the file with From as the first word


Hal

Ltc Hotspot
2015-08-01 19:00:08 UTC
Permalink
Hi Everyone:


Let me repost the question:

You will parse the From line using split() and print out the second word in
the line (i.e. the entire address of the person who sent the message). Then
print out a count at the end.

*Hint:* make sure not to include the lines that start with 'From:'.

You can download the sample data at
http://www.pythonlearn.com/code/mbox-short.txt



Regards,

Hal

Emile van Sebille
2015-08-01 21:18:29 UTC
Permalink
On 8/1/2015 12:00 PM, Ltc Hotspot wrote:
> Hi Everyone:
>
>
> Let me repost the question:
>
> You will parse the From line using split() and print out the second word in
> the line (i.e. the entire address of the person who sent the message). Then
> print out a count at the end.
>
> *Hint:* make sure not to include the lines that start with 'From:'.
>
> You can download the sample data at
> http://www.pythonlearn.com/code/mbox-short.txt

Cool - thanks. That's an mbox file.

Can you explain the apparent dichotomy of the question directing you to
'parse the from line' and the hint? I'm going to guess they mean that
you're not to print that line in the output? Aah, I see -- there're two
different lines that start From -- both with and without a trailing
colon. So then, we can split on 'From ' and recognizing the split eats
the split-on portion

>>> '1234567'.split('4')
['123', '567']

... and leaves an empty entry when splitting on the first characters of
the line

>>> '1234567'.split('1')
['', '234567']

... we get to:

for addr in [ fromline.split()[0]
              for fromline in mbox.split('From ')
              if fromline ]:
    print addr

***@uct.ac.za
***@media.berkeley.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@iupui.edu
***@iupui.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@umich.edu
***@iupui.edu
***@umich.edu
***@caret.cam.ac.uk
***@gmail.com
***@uct.ac.za
***@uct.ac.za
***@uct.ac.za
***@media.berkeley.edu
***@media.berkeley.edu
***@media.berkeley.edu
***@iupui.edu
***@iupui.edu
***@iupui.edu
>>>



Emile
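
A minimal sketch of the idea in the message above, written so it also runs on Python 3 (the thread itself uses Python 2's print statement). The sample text and the names `sample`, `from_addresses` and `split_based` are made up for illustration and are not taken from the course data. Because splitting on 'From ' would also split at any 'From ' that happens to appear in the middle of a line, the sketch shows a line-by-line variant using startswith next to the split-based one.

```python
# Hypothetical two-message sample; the real mbox-short.txt is much longer.
sample = (
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "From: alice@example.com\n"
    "Subject: first\n"
    "\n"
    "From bob@example.org Fri Jan  4 18:10:48 2008\n"
    "From: bob@example.org\n"
    "Subject: second\n"
)


def from_addresses(text):
    """Return the second word of every line that starts with 'From '."""
    addresses = []
    for line in text.splitlines():
        # 'From ' (with a trailing space) skips the 'From:' header lines.
        if line.startswith("From "):
            addresses.append(line.split()[1])
    return addresses


# The split-based approach on the same sample: the empty first chunk
# (the text before the first 'From ') is filtered out by "if chunk".
split_based = [chunk.split()[0] for chunk in sample.split("From ") if chunk]

addresses = from_addresses(sample)
for addr in addresses:
    print(addr)
print("There were", len(addresses), "lines in the file with From as the first word")
```

On this sample both approaches give the same two addresses; they can differ on text where 'From ' occurs somewhere other than the start of a line.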

Ltc Hotspot
2015-08-01 23:21:34 UTC
Permalink
Hi Emile,
Question: What is the source of the line 7 syntax: mbox.split?

Here is a copy of the Traceback message:
NameError                                 Traceback (most recent call last)
C:\Users\vm\Desktop\apps\docs\Python\8_5_v_26.py in <module>()
      5 addresses = set()
      6 for addr in [ fromline.split()[0]
----> 7               for fromline in mbox.split('From ')
      8               if fromline ]:
      9     count = count + 1
NameError: name 'mbox' is not defined


Revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for addr in [ fromline.split()[0]
              for fromline in mbox.split('From ')
              if fromline ]:
    count = count + 1
    print addr
print "There were", count, "lines in the file with From as the first word"

Regards,
Hal

Emile van Sebille
2015-08-02 00:53:15 UTC
Permalink
On 8/1/2015 4:21 PM, Ltc Hotspot wrote:
> Hi Emile,
> Question: What is the source of the line 7 syntax: mbox.split?


I read mbox from the file, e.g.:

mbox = open("mbox-short.txt",'r').read()

and it looks to me that if you insert the above in front of the for loop
below you'll get further.

Emile
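
A minimal sketch of how the suggestion above might fit into the whole script, written so it also runs on Python 3. The function names `sender_addresses` and `report` are hypothetical, and the sketch assumes the file is small enough to read into a single string. The point is only that `mbox` has to be bound to the file's text before the loop uses it.

```python
def sender_addresses(mbox):
    """Return the first word after each 'From ' separator in the text."""
    # chunk.strip() drops the empty (or whitespace-only) piece that split
    # leaves in front of the first 'From '.
    return [chunk.split()[0] for chunk in mbox.split("From ") if chunk.strip()]


def report(fname="mbox-short.txt"):
    """Print each sender address and the final count for the given file."""
    # Opening the file is not enough: read() is the step that defines mbox.
    with open(fname) as fh:
        mbox = fh.read()
    addresses = sender_addresses(mbox)
    for addr in addresses:
        print(addr)
    print("There were", len(addresses), "lines in the file with From as the first word")
    return len(addresses)
```

Calling `report()` with no argument assumes mbox-short.txt is in the current directory, matching the default file name used in the thread.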


>
> Here is a copy of the Traceback message:
> NameError                                 Traceback (most recent call last)
> C:\Users\vm\Desktop\apps\docs\Python\8_5_v_26.py in <module>()
>       5 addresses = set()
>       6 for addr in [ fromline.split()[0]
> ----> 7               for fromline in mbox.split('From ')
>       8               if fromline ]:
>       9     count = count + 1
> NameError: name 'mbox' is not defined
>
>
> Revised code:
> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> addresses = set()
> for addr in [ fromline.split()[0]
>               for fromline in mbox.split('From ')
>               if fromline ]:
>     count = count + 1
>     print addr
> print "There were", count, "lines in the file with From as the first word"


Cameron Simpson
2015-08-01 09:13:58 UTC
Permalink
On 31Jul2015 17:21, Ltc Hotspot <***@gmail.com> wrote:
>Mark:
>Is this any better, message sent from GMail?
>Regards,
>Hal

Looks better to me.

Cheers,
Cameron Simpson <***@zip.com.au>

>On Fri, Jul 31, 2015 at 5:02 PM, Mark Lawrence <***@yahoo.co.uk>
>wrote:
>
>> On 31/07/2015 19:57, ***@gmail.com wrote:
>>
>> I believe that this is the third time that you've been asked to do
>> something about the amount of whitespace that you're sending to this list.
>>
>> --
>> My fellow Pythonistas, ask not what our language can do for you, ask
>> what you can do for our language.
>>
>> Mark Lawrence