Discussion:
[Tutor] Parsing/Crawling test College Class Site.
bruce
2015-06-01 23:06:53 UTC
Hi. I'm creating a test py app to do a quick crawl of a couple of
pages of a psoft class schedule site. Before I start asking
questions/pasting/posting code... I wanted to know if this is the kind
of thing that can/should be here..

The real issues I'm facing aren't so much pythonic as much as probably
dealing with getting the cookies/post attributes correct. There's
ongoing jscript on the site, but I'm hopeful/confident :) that if the
cookies/post is correct, then the target page can be fetched..

If this isn't the right list, let me know! And if it is, I'll start posting..

Thanks

-bd
_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Alan Gauld
2015-06-01 23:48:26 UTC
Post by bruce
Hi. I'm creating a test py app to do a quick crawl of a couple of
pages of a psoft class schedule site. Before I start asking
questions/pasting/posting code... I wanted to know if this is the kind
of thing that can/should be here..
Probably. We are targeted at beginners to Python and focus
on the core language and standard library. If you are using
the standard library modules to build your app then certainly.

If you are using a third party module then we may/may not
be able to help depending on who, if anyone, within the
group is familiar with it. In that case you may be better
on the <whichever toolset you are using> forum.
Post by bruce
The real issues I'm facing aren't so much pythonic as much as probably
dealing with getting the cookies/post attributes correct. There's
ongoing jscript on the site, but I'm hopeful/confident :) that if the
cookies/post is correct, then the target page can be fetched..
Post sample code, any errors you get and as specific a
description of the issue as you can.
Include OS and Python versions.
Use plain text not HTML to preserve code formatting.


If it turns out to be way off topic we'll tell you (politely)
where you should go for help.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Mark Lawrence
2015-06-02 05:42:37 UTC
Post by bruce
Hi. I'm creating a test py app to do a quick crawl of a couple of
pages of a psoft class schedule site. Before I start asking
questions/pasting/posting code... I wanted to know if this is the kind
of thing that can/should be here..
The real issues I'm facing aren't so much pythonic as much as probably
dealing with getting the cookies/post attributes correct. There's
ongoing jscript on the site, but I'm hopeful/confident :) that if the
cookies/post is correct, then the target page can be fetched..
If this isn't the right list, let me know! And if it is, I'll start posting..
Thanks
-bd
You'll almost certainly need the main list at
https://mail.python.org/mailman/listinfo/python-list, alternatively
available as gmane.comp.python.general.

However just to get you going take a look at these.

https://pypi.python.org/pypi/requests
https://pypi.python.org/pypi/beautifulsoup4
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Alan Gauld
2015-06-02 07:27:28 UTC
Forwarding to list.
Always use ReplyAll (or reply List if you have that option) to include
the list.


-------- Forwarded Message --------
Subject: Re: [Tutor] Parsing/Crawling test College Class Site.
Date: Mon, 1 Jun 2015 20:42:48 -0400
From: bruce <***@gmail.com>
To: Alan Gauld <***@btinternet.com>



Seriously embarrassed!!

The issue that's happening is the process doesn't generate the page
with the classlist!!

forgot to mention why I was posting this...
Hi Alan.
Thanks. So, here goes!
https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL
The following is a sample of the test code, as well as the url/posts
of the pages as produced by the Firefox/Firebug process.
Basically, a user accesses the initial url, and then selects a couple
of items on the page, followed by the "Search" btn at the bottom of
the page.
-subject (insert ACC) for accounting
-uncheck "Show Open Classes Only"
-select the "Additional Search Criteria" expansion (bottom of the page)
--In the "Days of Week" dropdown, select the "include any of these days"
--select all days except Sat/Sun
finally, select the "Search" btn, which generates the actual class
list for the ACC dept.
During each action, the app might generate ajax which
updates/interfaces with the backend. All of this can be seen/tracked
(I think) if you have the Firebug plugin for firefox running, where
you can then track the cookies/post actions. The same data can be
generated running LiveHttpHeaders (or some other network app).
The process is running on CentOS, using Python 2.6.6.
The test app is a mix of standard py, and liberal use of the system
curl cmd. In order to generate one of the post vars, XPath is used to
extract the value from the initial generated file/content.
#!/usr/bin/python
#-------------------------------------------------------------
#
# unlvClassTest.py
#
# jun/1/15
#
#
#
# test generating of the psoft dept data
#
# cmdline unlvClassTest.py
#
#
#
#
#
#
#
#-------------------------------------------------------------
#test python script
import subprocess
import re
import libxml2dom
import urllib
import urllib2
import sys, string
import time
import os
import os.path
from hashlib import sha1
from libxml2dom import Node
from libxml2dom import NodeList
import hashlib
import pycurl
import StringIO
import uuid
import simplejson
from string import ascii_uppercase
#=======================================
execfile('/apps/parseapp2/ascii_strip.py')
execfile('dir_defs_inc.py')
appDir="/apps/parseapp2/"
# data output filename
datafile="unlvDept.dat"
# global var for the parent/child list json
plist={}
cname="unlv.lwp"
#----------------------------------------
# main app
#
# get the input struct, parse it, determine the level
#
cmd="echo '' > "+datafile
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res=proc.communicate()[0].strip()
cmd="echo '' > "+cname
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res=proc.communicate()[0].strip()
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-L "http://www.lonestar.edu/class-search.htm"'
#proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
#res=proc.communicate()[0].strip()
#print res
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-L "https://campus.lonestar.edu/classsearch.htm"'
#proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
#res1=proc.communicate()[0].strip()
#print res1
#initial page
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-L "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL"'
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res2=proc.communicate()[0].strip()
#print cmd+"\n\n"
print res2
sys.exit()
# s contains HTML not XML text
d = libxml2dom.parseString(res2, html=1)
#-----------Form------------
sel_ = d.xpath(selpath)
#--print svpath
#--print "llllll"
#--print " select error"
sys.exit()
val=""
ndx=0
val=a.textContent.strip()
print val
#sys.exit()
sys.exit()
#build the 1st post
ddd=1
post=""
post="ICAJAX=1"
post=post+"&ICAPPCLSDATA="
post=post+"&ICAction=DERIVED_CLSRCH_SSR_EXPAND_COLLAPS%24149%24%241"
post=post+"&ICActionPrompt=false"
post=post+"&ICAddCount="
post=post+"&ICAutoSave=0"
post=post+"&ICBcDomData=undefined"
post=post+"&ICChanged=-1"
post=post+"&ICElementNum=0"
post=post+"&ICFind="
post=post+"&ICFocus="
post=post+"&ICNAVTYPEDROPDOWN=0"
post=post+"&ICResubmit=0"
post=post+"&ICSID="+urllib.quote(val)
post=post+"&ICSaveWarningFilter=0"
post=post+"&ICStateNum="+str(ddd)
post=post+"&ICType=Panel"
post=post+"&ICXPos=0"
post=post+"&ICYPos=114"
post=post+"&ResponsetoDiffFrame=-1"
post=post+"&SSR_CLSRCH_WRK_SSR_OPEN_ONLY$chk$3=N"
post=post+"&SSR_CLSRCH_WRK_SUBJECT$0=ACC"
post=post+"&TargetFrameName=None"
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-e "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?&" '
cmd=cmd+'-d "'+post+'" '
cmd=cmd+'-L "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL"'
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res3=proc.communicate()[0].strip()
print cmd+"\n"
print res3
##2nd post
ddd=ddd+1
post=""
post="ICAJAX=1"
post=post+"&ICAPPCLSDATA="
post=post+"&ICNAVTYPEDROPDOWN=0"
post=post+"&ICType=Panel"
post=post+"&ICElementNum=0"
post=post+"&ICStateNum="+str(ddd)
post=post+"&ICAction=SSR_CLSRCH_WRK_SUBJECT%240"
post=post+"&ICXPos=0"
post=post+"&ICYPos=501"
post=post+"&ResponsetoDiffFrame=-1"
post=post+"&TargetFrameName=None"
post=post+"&FacetPath=None"
post=post+"&ICSaveWarningFilter=0"
post=post+"&ICChanged=-1"
post=post+"&ICAutoSave=0"
post=post+"&ICResubmit=0"
post=post+"&ICSID="+urllib.quote(val)
post=post+"&ICActionPrompt=false"
post=post+"&ICBcDomData=undefined"
post=post+"&ICFind="
post=post+"&ICAddCount="
post=post+"&ICFocus=SSR_CLSRCH_WRK_INCLUDE_CLASS_DAYS%246"
post=post+"&SSR_CLSRCH_WRK_SUBJECT$0=ACC"
post=post+"&SSR_CLSRCH_WRK_SSR_OPEN_ONLY$chk$3=N"
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-e "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?&" '
cmd=cmd+'-d "'+post+'" '
cmd=cmd+'-L "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL"'
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res3=proc.communicate()[0].strip()
print cmd+"\n"
print res3+"\n\n\n\n\n"
print post
##sys.exit()
##3rd post
ddd=ddd+1
post=""
post="ICAJAX=1"
post=post+"&ICNAVTYPEDROPDOWN=0"
post=post+"&ICType=Panel"
post=post+"&ICElementNum=0"
post=post+"&ICStateNum="+str(ddd)
post=post+"&ICAction=CLASS_SRCH_WRK2_SSR_PB_CLASS_SRCH"
post=post+"&ICXPos=0"
post=post+"&ICYPos=501"
post=post+"&ResponsetoDiffFrame=-1"
post=post+"&TargetFrameName=None"
post=post+"&FacetPath=None"
post=post+"&ICFocus="
post=post+"&ICSaveWarningFilter=0"
post=post+"&ICChanged=-1"
post=post+"&ICAutoSave=0"
post=post+"&ICResubmit=0"
post=post+"&ICSID="+urllib.quote(val)
post=post+"&ICActionPrompt=false"
post=post+"&ICBcDomData=undefined"
post=post+"&ICFind="
post=post+"&ICAddCount="
post=post+"&ICAPPCLSDATA="
post=post+"&SSR_CLSRCH_WRK_INCLUDE_CLASS_DAYS$6=J"
post=post+"&SSR_CLSRCH_WRK_MON$chk$6=Y"
post=post+"&SSR_CLSRCH_WRK_MON$6=Y"
post=post+"&SSR_CLSRCH_WRK_TUES$chk$6=Y"
post=post+"&SSR_CLSRCH_WRK_TUES$6=Y"
post=post+"&SSR_CLSRCH_WRK_WED$chk$6=Y"
post=post+"&SSR_CLSRCH_WRK_WED$6=Y"
post=post+"&SSR_CLSRCH_WRK_THURS$chk$6=Y"
post=post+"&SSR_CLSRCH_WRK_THURS$6=Y"
post=post+"&SSR_CLSRCH_WRK_FRI$chk$6=Y"
post=post+"&SSR_CLSRCH_WRK_FRI$6=Y"
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-e "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?&" '
cmd=cmd+'-d "'+post+'" '
cmd=cmd+'-L "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL"'
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res3=proc.communicate()[0].strip()
print cmd+"\n"
print res3+"\n\n\n\n\n"
print post
sys.exit()
-------------------------------------------------------------------------------------
-The initial url
https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL
(performs a get)
--Select the "Additional Search Criteria"
---generates the backend ajax, -- seen by the post action
------https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL
post [ICAJAX=1&ICNAVTYPEDROPDOWN=0&ICType=Panel&ICElementNum=0&ICStateNum=1&ICAction=DERIVED_CLSRCH_SSR_EXPAND_COLLAPS%24149%24%241&ICXPos=0&ICYPos=191&ResponsetoDiffFrame=-1&TargetFrameName=None&FacetPath=None&ICFocus=&ICSaveWarningFilter=0&ICChanged=-1&ICAutoSave=0&ICResubmit=0&ICSID=NwBLGklapJeRFylfen15jatQIwoGcJoQa%2BaO5AyhcwU%3D&ICActionPrompt=false&ICBcDomData=undefined&ICFind=&ICAddCount=&ICAPPCLSDATA=&SSR_CLSRCH_WRK_SSR_OPEN_ONLY$chk$3=N]
--selecting ACC as the dept
--post url --- https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL
post[ICAJAX=1&ICNAVTYPEDROPDOWN=0&ICType=Panel&ICElementNum=0&ICStateNum=2&ICAction=SSR_CLSRCH_WRK_SUBJECT%240&ICXPos=0&ICYPos=362&ResponsetoDiffFrame=-1&TargetFrameName=None&FacetPath=None&ICFocus=SSR_CLSRCH_WRK_INCLUDE_CLASS_DAYS%246&ICSaveWarningFilter=0&ICChanged=-1&ICAutoSave=0&ICResubmit=0&ICSID=NwBLGklapJeRFylfen15jatQIwoGcJoQa%2BaO5AyhcwU%3D&ICActionPrompt=false&ICBcDomData=undefined&ICFind=&ICAddCount=&ICAPPCLSDATA=&SSR_CLSRCH_WRK_SUBJECT$0=ACC]
-selecting the "SearchBTN"
--post URL
https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL
post [ICAJAX=1&ICNAVTYPEDROPDOWN=0&ICType=Panel&ICElementNum=0&ICStateNum=3&ICAction=CLASS_SRCH_WRK2_SSR_PB_CLASS_SRCH&ICXPos=0&ICYPos=633&ResponsetoDiffFrame=-1&TargetFrameName=None&FacetPath=None&ICFocus=&ICSaveWarningFilter=0&ICChanged=-1&ICAutoSave=0&ICResubmit=0&ICSID=NwBLGklapJeRFylfen15jatQIwoGcJoQa%2BaO5AyhcwU%3D&ICActionPrompt=false&ICBcDomData=undefined&ICFind=&ICAddCount=&ICAPPCLSDATA=&SSR_CLSRCH_WRK_INCLUDE_CLASS_DAYS$6=J&SSR_CLSRCH_WRK_MON$chk$6=Y&SSR_CLSRCH_WRK_MON$6=Y&SSR_CLSRCH_WRK_TUES$chk$6=Y&SSR_CLSRCH_WRK_TUES$6=Y&SSR_CLSRCH_WRK_WED$chk$6=Y&SSR_CLSRCH_WRK_WED$6=Y&SSR_CLSRCH_WRK_THURS$chk$6=Y&SSR_CLSRCH_WRK_THURS$6=Y&SSR_CLSRCH_WRK_FRI$chk$6=Y&SSR_CLSRCH_WRK_FRI$6=Y]
Alan Gauld
2015-06-02 10:07:46 UTC
The following is a sample of the test code, as well as the url/posts
of the pages as produced by the Firefox/Firebug process.
I'm not really answering your question but addressing some
issues in your code...
execfile('/apps/parseapp2/ascii_strip.py')
execfile('dir_defs_inc.py')
I'm not sure what these do, but usually it's better to
import the files as modules and then call their
functions directly.
appDir="/apps/parseapp2/"
# data output filename
datafile="unlvDept.dat"
# global var for the parent/child list json
plist={}
cname="unlv.lwp"
#----------------------------------------
# main app
It makes testing (and reuse) easier if you put the main code
in a function called main() and then just call that here.

Also your code could be broken up into smaller functions
which again will make testing and debugging easier.
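
A minimal sketch of that layout, assuming Python 3 (the original script targets 2.6). The helper name fetch_initial_page() and its placeholder return value are hypothetical and only stand in for the real fetching code:

```python
def fetch_initial_page():
    """Hypothetical stand-in for the code that requests the search page."""
    return "<html><body>class search form</body></html>"


def main():
    """Run the whole crawl and return an exit status.

    Returning a status (rather than calling sys.exit() in the middle of
    the script) keeps every step reachable and makes the function easy
    to call from a test.
    """
    page = fetch_initial_page()
    print("fetched %d characters" % len(page))
    return 0


if __name__ == "__main__":
    main()
```

With this shape, importing the file from a test or an interactive session does not start a crawl; only running it as a script does.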
#
# get the input struct, parse it, determine the level
#
cmd="echo '' > "+datafile
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res=proc.communicate()[0].strip()
It's easier and more efficient/reliable to create the
file directly from Python. Calling the subprocess module
each time starts up extra processes.

Also you store the result but never use it...
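
As a rough illustration of creating the files from Python, assuming Python 3: the function name reset_file() is made up for this sketch, and a temporary directory is used here only so the example does not touch real data files. Note that echo '' writes a single newline, whereas this leaves the file completely empty.

```python
import os
import tempfile


def reset_file(path):
    """Create path if it is missing, or empty it if it already exists."""
    with open(path, "w"):
        pass  # opening in "w" mode truncates; nothing needs to be written


# Assumption: a scratch directory, so the sketch is safe to run anywhere.
workdir = tempfile.mkdtemp()
datafile = os.path.join(workdir, "unlvDept.dat")
cname = os.path.join(workdir, "unlv.lwp")

for name in (datafile, cname):
    reset_file(name)
```

No shell and no extra process are involved, and there is no unused result to store.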
cmd="echo '' > "+cname
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res=proc.communicate()[0].strip()
See above
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-L "http://www.lonestar.edu/class-search.htm"'
You build up strings like this many times, but it's very inefficient.
There are several better options:
1) create a list of substrings then use join() to convert
the list to a string.
2) use a triple quoted string to create the string once only.

And since you are mostly passing them to Popen, look at the
docs to see how to pass a list of args instead of one large
string; it's more secure and generally better practice.
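
Both ideas can be sketched roughly as follows, assuming Python 3 and reusing the curl options from the script above. The names USER_AGENT, parts, args and run() are illustrative, not part of the original code:

```python
import subprocess

USER_AGENT = ("Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) "
              "Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11")
cname = "unlv.lwp"
url = "http://www.lonestar.edu/class-search.htm"

# Option 1: collect the pieces in a list and join them once at the end.
# The embedded double quotes are only needed because a shell would parse this.
parts = ["curl", "-vvv", "-A", '"%s"' % USER_AGENT,
         "--cookie-jar", cname, "--cookie", cname, "-L", '"%s"' % url]
cmd_string = " ".join(parts)

# Passing a list to Popen: each element is one argument, no shell is
# involved, so no manual quoting or escaping is required.
args = ["curl", "-vvv", "-A", USER_AGENT,
        "--cookie-jar", cname, "--cookie", cname, "-L", url]


def run(arg_list):
    """Run a command given as an argument list and return its stdout."""
    proc = subprocess.Popen(arg_list, stdout=subprocess.PIPE)
    return proc.communicate()[0].strip()
```

Calling run(args) would perform the request; it is left uncalled here so the sketch has no network side effects.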
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-L "https://campus.lonestar.edu/classsearch.htm"'
#initial page
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-L "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL"'
proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
res2=proc.communicate()[0].strip()
print res2
sys.exit()
Since this is unconditional, you always exit here, so nothing
else ever gets executed. This may be the cause of your problem?
# s contains HTML not XML text
d = libxml2dom.parseString(res2, html=1)
#-----------Form------------
sel_ = d.xpath(selpath)
sys.exit()
val=""
ndx=0
val=a.textContent.strip()
print val
#sys.exit()
sys.exit()
#build the 1st post
ddd=1
post=""
This does nothing since you immediately replace it with the next line.
post="ICAJAX=1"
post=post+"&ICAPPCLSDATA="
post=post+"&ICAction=DERIVED_CLSRCH_SSR_EXPAND_COLLAPS%24149%24%241"
post=post+"&ICActionPrompt=false"
post=post+"&ICAddCount="
post=post+"&ICAutoSave=0"
post=post+"&ICBcDomData=undefined"
post=post+"&ICChanged=-1"
post=post+"&ICElementNum=0"
post=post+"&ICFind="
post=post+"&ICFocus="
post=post+"&ICNAVTYPEDROPDOWN=0"
post=post+"&ICResubmit=0"
post=post+"&ICSID="+urllib.quote(val)
post=post+"&ICSaveWarningFilter=0"
post=post+"&ICStateNum="+str(ddd)
post=post+"&ICType=Panel"
post=post+"&ICXPos=0"
post=post+"&ICYPos=114"
post=post+"&ResponsetoDiffFrame=-1"
post=post+"&SSR_CLSRCH_WRK_SSR_OPEN_ONLY$chk$3=N"
post=post+"&SSR_CLSRCH_WRK_SUBJECT$0=ACC"
post=post+"&TargetFrameName=None"
Since these are all hard coded strings you might as well
have just hard coded the final string and saved a lot
of processing. (and code space)
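
A different option from hard coding the string, shown here only as a sketch: keep the fields as data and let the standard library do the encoding. This assumes Python 3 (in Python 2 the function lives at urllib.urlencode), the function name build_post() is made up, and only a subset of the fields from the script is listed. The ICSID value is the one visible in the Firebug capture earlier in the thread; in practice it is read from the page and changes per session.

```python
from urllib.parse import urlencode  # Python 2: from urllib import urlencode


def build_post(icsid, state_num):
    """Return the url-encoded body for the expand/collapse request."""
    fields = [
        ("ICAJAX", "1"),
        # Written unencoded here; urlencode turns each "$" into "%24".
        ("ICAction", "DERIVED_CLSRCH_SSR_EXPAND_COLLAPS$149$$1"),
        ("ICStateNum", str(state_num)),
        ("ICSID", icsid),
        ("ICType", "Panel"),
        ("SSR_CLSRCH_WRK_SSR_OPEN_ONLY$chk$3", "N"),
        ("SSR_CLSRCH_WRK_SUBJECT$0", "ACC"),
    ]
    return urlencode(fields)


post = build_post("NwBLGklapJeRFylfen15jatQIwoGcJoQa+aO5AyhcwU=", 1)
print(post)
```

The two values that do vary between requests (ICSID and ICStateNum) become parameters, and the encoding of "$", "+" and "=" is handled in one place instead of by hand.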
cmd='curl -vvv '
cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+'-e "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?&" '
This looks awfully similar to the code up above. Could you have reused
the command? Maybe with some parameters - check out string formatting
operations, e.g.: 'This string takes %s as a parameter' % 'a string'

I'll stop here, it's all getting a bit repetitive.
Which is, in itself, a sign that you need to create some functions.
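
One possible shape for such a function, as a sketch under assumptions: Python 3, a made-up name curl_args(), and the same curl options the script already uses. The post body passed in below is a shortened placeholder, not the full set of fields.

```python
URL = ("https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/"
       "COMMUNITY_ACCESS.CLASS_SEARCH.GBL")
USER_AGENT = ("Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) "
              "Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11")


def curl_args(url, cookie_file, referer=None, post_data=None):
    """Build the curl argument list shared by the GET and the three POSTs."""
    args = ["curl", "-vvv", "-A", USER_AGENT,
            "--cookie-jar", cookie_file, "--cookie", cookie_file]
    if referer is not None:
        args += ["-e", referer]      # Referer header
    if post_data is not None:
        args += ["-d", post_data]    # url-encoded POST body
    args += ["-L", url]
    return args


# The initial GET and a POST differ only in the optional parameters.
get_args = curl_args(URL, "unlv.lwp")
post_args = curl_args(URL, "unlv.lwp", referer=URL + "?&",
                      post_data="ICAJAX=1&ICStateNum=%d" % 1)
```

Each of the repeated blocks in the script then collapses to one call, and the resulting list can be handed straight to subprocess.Popen without shell=True.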
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

