Question

I am having a problem passing a URI with Unicode characters to rdflib to add to a Graph().

So for instance, I want to run:

from rdflib import Graph
g = Graph()
uri = 'http://dbpedia.org/resource/René_Auberjonois'
g.parse(uri)

But I get an ascii codec encoding error that is so common in Python.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 17: ordinal not in range(128)

If the URI were passed as 'http://dbpedia.org/resource/Ren%C3%A9_Auberjonois', it would be fine; the parser works when the string is in that form (sorry, I don't know what to call it, whether it's 'raw', 'escaped', 'unescaped' or otherwise).

Is there anything I can do to uri (it is not being set in this manner; it's being set through a function looping over a list of names) so that print uri would give http://dbpedia.org/resource/Ren%C3%A9_Auberjonois?

Background: I originally asked this question, but adding u in front of a string is either not feasible with how the script is set up (I am not setting each string like s = 'René_Auberjonois') or doesn't actually work in the end when I pass it to rdflib (i.e. I still get the encoding error because it is being passed http://dbpedia.org/resource/René_Auberjonois).

Also if there are good resources for understanding the problem I am having here, that would be cool. I am confused by character encoding at the moment.


Solution

If the percent encoded form is what you need, then you could use urllib:

>>> import urllib
>>> s='http://dbpedia.org/René_Auberjonois'
>>> urllib.quote(s)
'http%3A//dbpedia.org/Ren%C3%A9_Auberjonois'
>>> urllib.quote(s, safe=':')
'http:%2F%2Fdbpedia.org%2FRen%C3%A9_Auberjonois'
>>> urllib.quote(s, safe=':/')
'http://dbpedia.org/Ren%C3%A9_Auberjonois'

Use the safe parameter to specify characters that shouldn't be quoted. It defaults to /.
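Since the question builds its URIs from a list of names, one option is to quote only the name part and then prepend the prefix, so the scheme's : and // never need to be listed in safe. The following is a minimal sketch, not a definitive implementation: it assumes Python 3, where quote lives in urllib.parse rather than urllib, and the helper resource_uri and the names list are hypothetical, made up for illustration.

```python
from urllib.parse import quote  # Python 3 location; on Python 2 this is urllib.quote

BASE = 'http://dbpedia.org/resource/'  # prefix taken from the question


def resource_uri(name):
    """Hypothetical helper: percent-encode a resource name and append it to BASE."""
    # quote() encodes str input as UTF-8 before escaping, so 'é' becomes %C3%A9.
    # safe='' means a '/' inside the name would be escaped as well.
    return BASE + quote(name, safe='')


names = ['René_Auberjonois', 'Zoë_Wanamaker']  # illustrative list, an assumption
uris = [resource_uri(n) for n in names]
for u in uris:
    print(u)
```

Each resulting string contains only ASCII characters, so it could then be handed to g.parse(...) as in the question.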

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow