Discussion:
[Tutor] urllib.urlencode and unicode strings
Jon Crump
2007-05-17 22:06:11 UTC
Dear all,

I've got a Python list of data, pulled via ElementTree from an XML file
(<?xml version="1.0" encoding="utf-8"?>), that contains mixed str and unicode
strings, like this:

[u'Jumi\xe9ge, Normandie', 'Farringdon, Hampshire', 'Ravensworth,
Durham', 'La Suse, Anjou', 'Lions, Normandie', 'Lincoln, Lincolnshire',
'Chelmsford, Essex', u'Ch\xe2telerault, Poitou', 'Bellencombre,
Normandie'] etc.

Trying to use geopy to geocode these placenames, I get the following
traceback:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "build/bdist.macosx-10.3-fat/egg/geopy/geocoders.py", line 327, in geocode
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 1242, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 2: ordinal not in range(128)

It appears that urlencode is choking on the unicode strings that contain
non-ASCII characters. Can anybody tell me how I can translate these strings
into something like this: Ch%C3%A2tellerault?

No doubt this is obvious, but for a hopeless tyro like me, it is proving
to be un-intuitive.

Thanks
Kent Johnson
2007-05-17 23:29:54 UTC
>>> c = u'\xe2'
>>> c
u'\xe2'
>>> c.encode('utf-8')
'\xc3\xa2'
>>> import urllib
>>> urllib.quote(c.encode('utf-8'))
'%C3%A2'
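
For anyone reading this on Python 3, the same two steps (encode the text to UTF-8, then percent-escape the bytes) might look roughly like the sketch below. This is only an illustration: in Python 3 the function lives at urllib.parse.quote, and percent_encode and the sample list are hypothetical names, not anything from geopy or from the original post.

```python
from urllib.parse import quote


def percent_encode(text, encoding="utf-8"):
    """Encode text to bytes first, then percent-escape those bytes.

    Hypothetical helper for illustration only.
    """
    return quote(text.encode(encoding))


# Sample values modelled on the list in the question (assumed data).
places = ["Ch\xe2telerault, Poitou", "Lincoln, Lincolnshire"]
encoded = [percent_encode(p) for p in places]
```

With the default arguments, quote() also escapes the comma and the space, so the first entry comes out as Ch%C3%A2telerault%2C%20Poitou.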

Kent
Jon Crump
2007-05-18 19:13:43 UTC
Kent,

Thanks so much. It's easy when you know how. Now that I know, I only need
the encode('utf-8') step since geopy does the urlencode step.
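
A rough sketch of that encode-only step for a mixed list, written for Python 3 where the two types are str and bytes (on Python 2.5 the corresponding check would be against unicode rather than bytes). The to_utf8 helper and the sample data are hypothetical, and the geopy call itself is left out because its signature is not shown in this thread.

```python
def to_utf8(value):
    """Return UTF-8 bytes: text is encoded, byte strings pass through.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


# Mixed text and byte strings, mirroring the list in the question.
places = ["Jumi\xe9ge, Normandie", b"Farringdon, Hampshire"]
utf8_places = [to_utf8(p) for p in places]
```

Each item of utf8_places is then a plain byte string that a later URL-encoding step can escape without an implicit ASCII conversion.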