Defining text encoding in a file containing JSON

https://stackoverflow.com/questions/23298065

09-07-2023

Question

My application stores configuration data (including strings for the UI) in a text file containing JSON. For example, config.json might contain the following:

{
   "CustomerName": "Omni Consumer Products",
   "SubmitButtonText": "Click here to submit"
}

This file goes to our translation vendor, who produces a copy of it for each supported language. They might be building their own app to do this, or they might be editing the file in a text editor; I don't know.

Since we're going to be using all manner of non-ASCII characters in some of our languages, I'd like to ensure everybody is clear on what character encoding we're using.

So if this were an XML file, I would stick the following declaration at the top of the file:

<?xml version="1.0" encoding="UTF-8"?>

Any reasonable text editor or XML parser will see this and know that the file is encoded in UTF-8.

Is there any similar standard I can put at the top of a JSON file, and be reasonably assured that consumers will play nicely with it?

Solution

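JSON's default encoding is UTF-8, per RFC 4627 (since obsoleted by RFC 8259, which requires UTF-8 for JSON text exchanged between systems that are not part of a closed ecosystem):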

http://www.ietf.org/rfc/rfc4627.txt

From section 3:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

Because this determination is unambiguous, the format itself has no dedicated place to declare an encoding.
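To illustrate the null-pattern check described in the quoted section, here is a minimal Python sketch. The function names `detect_json_encoding` and `load_json_bytes` are hypothetical, not part of any library. The sketch assumes the input has no byte order mark and that the first two characters of the JSON text are ASCII, as RFC 4627 states; input shorter than four octets simply falls back to UTF-8.

```python
import json

# Null-byte patterns in the first four octets, following RFC 4627 section 3.
# True marks a null octet (0x00), False marks any non-null octet.
_NULL_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
}


def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a BOM-less JSON octet stream."""
    head = data[:4]
    if len(head) < 4:
        # Too short for the four-octet check; assume the default encoding.
        return "utf-8"
    pattern = tuple(octet == 0 for octet in head)
    # Any pattern not listed above (typically no nulls) is treated as UTF-8.
    return _NULL_PATTERNS.get(pattern, "utf-8")


def load_json_bytes(data: bytes):
    """Decode the raw bytes with the detected encoding, then parse them."""
    return json.loads(data.decode(detect_json_encoding(data)))


if __name__ == "__main__":
    text = '{"CustomerName": "Omni Consumer Products"}'
    for encoding in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        raw = text.encode(encoding)
        print(encoding, "->", detect_json_encoding(raw))
```

In practice a hand-rolled check like this is rarely needed: since Python 3.6, `json.loads` accepts `bytes` input encoded as UTF-8, UTF-16, or UTF-32. For files sent to a vendor, the simplest policy is probably to write and read them as UTF-8 explicitly and state that in the hand-off instructions.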

Licensed under: CC-BY-SA with attribution
