Sunday, March 1, 2015

Is there any way to force IPython to interpret UTF-8 symbols?

I'm using IPython Notebook.


What I want to do is search a string literal for any Spanish accented letters (ñ, á, é, í, ó, ú, Ñ, Á, É, Í, Ó, Ú) and replace each one with its closest representation in the English alphabet.


I decided to write a simple function and give it a go:



def remove_accent(n):
    listn = list(n)
    for i in range(len(listn)):
        # replace the accented letter with its plain counterpart
        if listn[i] == 'ó':
            listn[i] = 'o'
    return listn


It seemed simple, right? Just check whether the accented character is there and change it to its closest representation. So I went ahead and tested it, and got the following output:



in []: remove_accent('whatever !@# ó')
out[]: ['w',
'h',
'a',
't',
'e',
'v',
'e',
'r',
' ',
'!',
'@',
'#',
' ',
'\xc3',
'\xb3']


I've tried to change the default encoding from ASCII (I presume, since I'm getting two positions for the accented character, '\xc3' and '\xb3', instead of one) to UTF-8, but this didn't work. What I would like to get is:



in []: remove_accent('whatever !@# ó')
out[]: ['w',
'h',
'a',
't',
'e',
'v',
'e',
'r',
' ',
'!',
'@',
'#',
' ',
'o']
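One possible way to get that output is sketched below, under a few assumptions: the input is either text or UTF-8 encoded bytes, and "closest representation" means dropping the accent mark from the base letter. The function body is a hypothetical rewrite, not the original code; it relies on the standard library's `unicodedata` module to decompose each accented letter into a base letter plus a combining mark.

```python
import unicodedata

def remove_accent(text):
    """Return a list of characters with accent marks removed (illustrative sketch)."""
    # Assumption: byte strings are UTF-8 encoded, so decode them to text first.
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    # NFD normalization splits a letter such as 'ó' into 'o' plus a combining acute accent.
    decomposed = unicodedata.normalize('NFD', text)
    # Keep the base letters and drop the combining marks.
    return [ch for ch in decomposed if not unicodedata.combining(ch)]

print(remove_accent(u'whatever !@# \u00f3'))
```

Because the decomposition is generic, this handles ñ, á, é, í, ó, ú and their uppercase forms without a separate comparison for each letter.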


P.S.: This wouldn't be so bad if the accented character yielded just one position instead of two; I would just need to change the if condition. But I haven't found a way to do that either.
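The two positions are most likely the two bytes that UTF-8 uses to encode 'ó'. As a minimal sketch, assuming the string really is UTF-8 encoded bytes, decoding it to text first gives one position per character, so a plain comparison works. The name `remove_accent_simple` is hypothetical and only used for illustration.

```python
raw = b'\xc3\xb3'            # the two bytes seen in the output: UTF-8 for 'ó'
text = raw.decode('utf-8')   # a single character, U+00F3

print(len(raw))    # 2
print(len(text))   # 1

def remove_accent_simple(n):
    # Assumption: byte input is UTF-8; decode so each letter occupies one position.
    if isinstance(n, bytes):
        n = n.decode('utf-8')
    listn = list(n)
    for i in range(len(listn)):
        if listn[i] == u'\u00f3':   # 'ó' as a single text character
            listn[i] = u'o'
    return listn
```

With this variant, only the if condition needs extending to cover the other accented letters.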

