Question

I'm trying to parse an API response, but the response data contains characters that are giving Python some trouble.

API Response:
electricity price | 19.52¢/kW·h (January 1, 2014)
natural gas price | $11.05 per thousand cubic feet (January 15, 2014)
heating oil price | $4.338/gal (March 17, 2014)
propane price | $3.968/gal (March 17, 2014)

The error is raised at the "cents per kilowatt hour" characters.

Full Error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa2' in position 25: ordinal not in range(128)

Response as it appears in terminal: electricity price | 19.52\xa2/kW\xb7h (January 1, 2014)\nnatural gas price | $11.05 per thousand cubic feet (January 15, 2014)\nheating oil price | $4.338/gal (March 17, 2014)\npropane price | $3.968/gal (March 17, 2014)

How would I go about parsing the data around these problem characters? I don't need the full text, just the numerical values within it. Thanks for your help.

EDIT:

The code causing the error:

search('electricity price | {:d}', energy)

I also tried:

search('electricity price | {:f}', energy)

Which had a similar result. energy is a variable storing the full string listed above.

EDIT 2:

Full code including API call:

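import wolframalpha
from parse import search

# city and state_abbr are defined earlier (not shown)
client = wolframalpha.Client('apikey')
energy_query = 'utilities prices in ' + city + ' ' + state_abbr
res = client.query(energy_query)

# Text of the first result
energy = next(res.results).text

search('electricity price | {:d}', energy)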

Full traceback:

File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/nulife.py", line 120, in index
search('electricity price | {:d}', energy)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/parse.py", line 1041, in search
return Parser(format, extra_types=extra_types).search(string, pos, endpos)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/parse.py", line 678, in search
return self._generate_result(m)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/parse.py", line 699, in _generate_result
fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/parse.py", line 375, in f
if string[0] == '-':
TypeError: 'NoneType' object has no attribute '__getitem__'

Solution

energy is already a Unicode object; trying to call .decode() on it triggers an implicit encode first (using ASCII, the default codec):

>>> energy = u'19.52¢/kW·h'
>>> energy.decode('windows-1252')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa2' in position 5: ordinal not in range(128)

Notice the exception; decoding a Unicode string triggered a UnicodeEncodeError.

This is as designed; the wolframalpha library uses ElementTree to parse the XML response, which always gives you Unicode objects.
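
If some code path might receive either bytes or Unicode text, one way to avoid the implicit ASCII encode is to decode only when the value really is a byte string. The helper below is a minimal sketch; the name to_text and the UTF-8 default are assumptions for illustration, not part of wolframalpha or parse.

```python
def to_text(value, encoding='utf-8'):
    """Return Unicode text; decode only if value is a byte string (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already Unicode: leave it alone so no implicit encode is triggered.
    return value


already_text = to_text(u'19.52\xa2/kW\xb7h')
from_bytes = to_text(b'19.52\xc2\xa2/kW\xc2\xb7h')
print(already_text == from_bytes)
```

Both calls end up with the same Unicode string, and the already-decoded value is never passed through .decode().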

After your update, I took a look at the parse library source code, and I fear you found a bug: it doesn't escape regular expression metacharacters in the literal strings you hand in. If you escape the | character, it works:

>>> search('electricity price \\| {:f}', u'electricity price | 19.52¢/kW·h')
<Result (19.52,) {}>

I've opened a bug report with the parse project about this.
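
To see why the unescaped pipe leads to the TypeError in the traceback, here is a sketch using only the standard re module. The patterns are simplified stand-ins for what parse builds internally, so treat them as an assumption rather than the library's actual regular expression.

```python
import re

text = u'electricity price | 19.52\xa2/kW\xb7h (January 1, 2014)'

# Unescaped: '|' means alternation, so the left branch 'electricity price '
# matches on its own and the capture group in the right branch stays unset.
unescaped = re.search(r'electricity price | ([-+]?\d+\.\d+)', text)
print(unescaped.group(1))

# Escaped: the pipe is matched literally and the group captures the number.
escaped = re.search(re.escape(u'electricity price | ') + r'([-+]?\d+\.\d+)', text)
print(escaped.group(1))
```

The first search succeeds but its group is None, and indexing None is what raises 'NoneType' object has no attribute '__getitem__' inside the type conversion.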

Do note that the library is probably limited to parsing ASCII text only; don't try to match the ¢/kW·h as word characters, at least.
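
Since only the numerical values are needed, another option is to skip the non-ASCII units entirely and pull the first number after each separator with the standard re module. This is a minimal sketch, assuming every line has the form "label | value ..." as in the sample response; extract_prices is a hypothetical helper name.

```python
import re

energy = (u'electricity price | 19.52\xa2/kW\xb7h (January 1, 2014)\n'
          u'natural gas price | $11.05 per thousand cubic feet (January 15, 2014)\n'
          u'heating oil price | $4.338/gal (March 17, 2014)\n'
          u'propane price | $3.968/gal (March 17, 2014)')


def extract_prices(text):
    """Map each label to the first number found after the ' | ' separator."""
    prices = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(u' | ')
        if not sep:
            continue  # not a "label | value" line
        match = re.search(r'\d+(?:\.\d+)?', rest)
        if match:
            prices[label] = float(match.group())
    return prices


prices = extract_prices(energy)
print(prices[u'electricity price'])
```

The units and dates are never matched, so the ¢ and · characters do not matter here.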

Update: parse version 1.6.4 has been released fixing this specific bug.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow