Another GIS Blog: Removing Illegal Characters and Preventing Unicode Errors

Thursday, July 14, 2011

Removing Illegal Characters and Preventing Unicode Errors

I hate Unicode errors like the one below, and I kept getting them intermittently on a table I was writing some scripts against. Here is an example of the error I received:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128)
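For context, Python raises this error when text containing characters outside the ASCII range (code points 0-127) is encoded with the strict ASCII codec. Python 2 does that implicitly in places such as calling str() on a unicode value or writing one to a plain file. The sketch below reproduces it explicitly. It assumes Python 3 rather than the Python 2 used in this post, and find_ascii_error is a hypothetical helper name:

```python
def find_ascii_error(text):
    """Return the UnicodeEncodeError raised by a strict ASCII encode, or None."""
    try:
        # strict mode (the default): fails on the first character >= 128
        text.encode("ascii")
    except UnicodeEncodeError as err:
        return err
    return None


err = find_ascii_error("abcdéf")
print(err)
# start/end give the index range of the offending characters
print(err.start, err.end)
```

In CPython the error's start and end attributes span the whole run of consecutive unencodable characters. That is why a message can read "characters in position 1-4" rather than naming a single character.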

After much internal debate, I decided to simply strip the offending non-ASCII characters (anything with a code point of 128 or higher) when exporting or reading values in that table. First I check whether the returned value is a unicode type, then I apply the filter.

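userValue = row.getValue(field)

if isinstance(userValue, unicode):
    # keep only ASCII characters (code points 0-127)
    val = ''.join([x for x in userValue if ord(x) < 128])
else:
    # numbers, dates, None and plain str values pass through unchanged
    val = userValue

# do something with val

Now, the illegal characters are gone!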
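The same filter can be wrapped in a small reusable helper. This is a sketch that assumes Python 3, where the separate unicode type is gone and all text is str. The name strip_non_ascii and the sample row values are hypothetical and not part of arcpy:

```python
def strip_non_ascii(value):
    """Return value with every character whose code point is >= 128 removed.

    Non-string values (numbers, None, dates) are returned unchanged,
    mirroring the type check in the cursor snippet.
    """
    if isinstance(value, str):
        return "".join(ch for ch in value if ord(ch) < 128)
    return value


# hypothetical values, standing in for what a cursor row might return
row_values = ["Café Olé", 42, None, "plain"]
cleaned = [strip_non_ascii(v) for v in row_values]
print(cleaned)  # ['Caf Ol', 42, None, 'plain']
```

Checking the type first keeps the helper safe to call on every field, so you do not have to know in advance which columns hold text.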

Here is a simpler example using IDLE:

>>> userValue = u"abcdéf"
>>> val = ''.join([x for x in userValue if ord(x) < 128])
>>> print val
abcdf

Notice that the list comprehension simply dropped the é and produced 'abcdf'.
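If you would rather not write the loop yourself, Python's codecs can do the same filtering. A Unicode normalization step can also keep the base letter instead of dropping it. Both functions below are hypothetical helpers written for Python 3, and the second one is a different technique (accent folding) from the one used in this post:

```python
import unicodedata


def strip_via_codec(text):
    # the 'ignore' error handler silently drops every character
    # the ASCII codec cannot encode, same result as the ord() filter
    return text.encode("ascii", "ignore").decode("ascii")


def fold_to_ascii(text):
    # NFKD splits 'é' into 'e' + U+0301 (combining acute accent);
    # the accent is non-ASCII and gets dropped, so the 'e' survives
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_via_codec("abcdéf"))  # abcdf
print(fold_to_ascii("abcdéf"))    # abcdef
```

Folding only helps for characters that decompose into an ASCII base letter plus combining marks; characters such as 'ß' or CJK text have no such decomposition and are still removed.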

Hope this helps!