training file language_id.txt for Google Prediction API unusable

https://stackoverflow.com//questions/25068341

26-12-2019

Question

I'm following the Hello Prediction example of the Google Prediction API.

Unfortunately, the training file language_id.txt seems to be corrupted. I tried downloading it with both Google Chrome and Firefox and got the same result (see screenshot):

[Screenshot: the file as displayed in the browser]

I think this is why my tests do not work: I always get back English with a score of 1.0 for the "Muy Bueno" example string.

  ...
  {
   "label": "English",
   "score": "1.000000"
  },
  ...

Where can I get a usable language_id.txt file, or is there anything else I can do?

EDIT: My guess is that the file has not been stored in UTF-8 format on the Google server.

Solution

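The file is in UTF-8, but it is served without a declared encoding, so a browser displaying it assumes the default HTTP charset, ISO-8859-1. Every non-ASCII character, which UTF-8 stores as two or more bytes, then shows up as several unrelated characters.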
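
To illustrate the effect, here is a minimal Python sketch. The sample string is made up for illustration and is not taken from language_id.txt; it only shows how correct UTF-8 bytes look when they are read as ISO-8859-1.

```python
# Hypothetical sample text; not taken from language_id.txt.
sample = "¿Cómo estás?"

# The bytes as they would be stored in a UTF-8 file.
utf8_bytes = sample.encode("utf-8")

# What a viewer shows if it wrongly assumes ISO-8859-1:
# each byte of a multi-byte character becomes its own character.
misread = utf8_bytes.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 recovers the original text.
recovered = utf8_bytes.decode("utf-8")

print(misread)
print(recovered)
```

The bytes are identical in both cases; only the interpretation differs, which is why the file can look corrupt in the browser and still be fine on disk.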

I'm not sure why you're actually getting a corrupted copy: when I view the file in Chrome it appears corrupt, but saving it results in a correct UTF-8-encoded file. Perhaps you could try another mechanism to download it?
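
One possible alternative mechanism, sketched in Python under some assumptions: the function names `fetch_bytes`, `is_valid_utf8`, and `decode_training_file` are made up for this example, and the URL of the file is not given here, so it has to be supplied by the caller. The idea is to download the raw bytes without letting anything guess a charset, then decode them explicitly as UTF-8.

```python
import urllib.request


def fetch_bytes(url: str) -> bytes:
    """Download a resource and return its raw, undecoded bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes form well-formed UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_training_file(raw: bytes) -> str:
    """Decode the downloaded bytes as UTF-8, ignoring any HTTP charset."""
    return raw.decode("utf-8")
```

A call such as `decode_training_file(fetch_bytes(url))` would then give the text, with `url` being wherever the file is hosted. If `is_valid_utf8` returns False for the downloaded bytes, the copy really is damaged rather than merely displayed with the wrong charset.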

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow