[Tutor] urllib.urlencode and unicode strings

Discussion:

Jon Crump

2007-05-17 22:06:11 UTC

Dear all,

I've got a Python list of data, pulled via ElementTree from an XML file
(declared as <?xml version="1.0" encoding="utf-8"?>), that contains a mix
of str and unicode strings, like this:

[u'Jumi\xe9ge, Normandie', 'Farringdon, Hampshire',
 'Ravensworth, Durham', 'La Suse, Anjou', 'Lions, Normandie',
 'Lincoln, Lincolnshire', 'Chelmsford, Essex',
 u'Ch\xe2telerault, Poitou', 'Bellencombre, Normandie'] etc.

Trying to use geopy to geocode these place names, I get the following
traceback:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "build/bdist.macosx-10.3-fat/egg/geopy/geocoders.py", line 327, in geocode
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 1242, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 2: ordinal not in range(128)

It appears that urlencode is choking on the non-ASCII characters in the
unicode strings. Can anybody tell me how I can translate these strings
into something like this: Ch%C3%A2tellerault?
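
For anyone following along: on Python 2, the str(v) call in the last frame of the traceback converts a unicode object with the default ASCII codec, and that is the step that fails. The sketch below makes the same conversion explicit with .encode('ascii') so that it behaves the same on Python 2.6+ and Python 3. The variable names are only illustrative.

```python
# One of the place names from the list, as a unicode string.
name = u'Ch\xe2telerault, Poitou'

try:
    # The same ASCII conversion that str(name) attempts on Python 2.
    name.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

# The exception records which codec failed and where in the string.
print(failure)
```

The failing position is the index of the first non-ASCII character, which is why the traceback reports position 2 for this name.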

No doubt this is obvious, but for a hopeless tyro like me, it is proving
to be un-intuitive.

Thanks

Kent Johnson

2007-05-17 23:29:54 UTC

>>> c = u'\xe2'
>>> c
u'\xe2'
>>> c.encode('utf-8')
'\xc3\xa2'
>>> import urllib
>>> urllib.quote(c.encode('utf-8'))
'%C3%A2'
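
The two steps in the session above (encode to UTF-8, then percent-escape the bytes) can be wrapped in a small helper. This is only a sketch: percent_encode and its plus_for_space flag are hypothetical names, not part of urllib or geopy, and the import fallback is there so the same code also runs on Python 3, where quote and quote_plus live in urllib.parse.

```python
try:
    from urllib import quote, quote_plus        # Python 2, as used in this thread
except ImportError:
    from urllib.parse import quote, quote_plus  # Python 3 home of the same functions


def percent_encode(text, plus_for_space=False):
    """Encode unicode text as UTF-8, then percent-escape the resulting bytes.

    Hypothetical helper for illustration only.
    """
    raw = text.encode('utf-8')
    if plus_for_space:
        # quote_plus also turns spaces into '+', as urlencode does for values.
        return quote_plus(raw)
    return quote(raw)


single = percent_encode(u'\xe2')                # '%C3%A2'
place = percent_encode(u'Ch\xe2telerault, Poitou', plus_for_space=True)
print(single)
print(place)                                    # 'Ch%C3%A2telerault%2C+Poitou'
```

Note that quote_plus also escapes the comma as %2C, which is what urlencode would do with the same value.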

Kent

Jon Crump

2007-05-18 19:13:43 UTC

Kent,

Thanks so much. It's easy when you know how. Now that I know, I only need
the encode('utf-8') step since geopy does the urlencode step.
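
A minimal sketch of that encode-only step applied to a mixed list, leaving the URL encoding to whatever consumes the result. The to_utf8 helper is a hypothetical name, and no geopy call is shown because its exact interface is not quoted in this thread.

```python
# unicode on Python 2, str on Python 3; avoids naming either type directly.
text_type = type(u'')


def to_utf8(value):
    """Return UTF-8 bytes for text; pass byte strings through unchanged."""
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value


# A few entries from the list in the original post.
places = [u'Jumi\xe9ge, Normandie', 'Farringdon, Hampshire',
          u'Ch\xe2telerault, Poitou']

encoded = [to_utf8(p) for p in places]
print(encoded)
```

On Python 2 the plain 'Farringdon, Hampshire' entry is already a byte string and passes through untouched; on Python 3 every literal here is text, so all three entries get encoded.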
