JSON strings can't contain literal newlines, e.g.:
import json
not_a_json_string = '"\n"' # in Python source: quote, literal newline, quote
json.loads(not_a_json_string) # raises ValueError (json.JSONDecodeError on Python 3.5+)
but they can contain escaped newlines:
json_string = r'"\n"' # raw-string literal (== '"\\n"')
s = json.loads(json_string)
i.e., the original text (json_string) has no newlines in it (it has a backslash followed by the n character, two characters in total), but the parsed result does contain a newline: '\n' in s.
That is why the example:
for line in file:
    d = json.loads(line)
    print(d['key'])
may print more lines than the file contains.
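A minimal sketch of that effect, assuming a hypothetical two-record input held in memory (io.StringIO stands in for the real file, and the sample data is made up for illustration). The first record's value contains an escaped newline, so printing it produces two output lines:

```python
import io
import json

# Hypothetical input: two physical lines; the first value holds an escaped newline.
text = '{"key": "a\\nb"}\n{"key": "c"}\n'

physical_lines = 0
output = io.StringIO()  # capture what print() would write
for line in io.StringIO(text):
    physical_lines += 1
    d = json.loads(line)
    print(d['key'], file=output)

printed_lines = output.getvalue().count('\n')
print(physical_lines, printed_lines)
```

Here the loop reads 2 physical lines but writes 3 lines of output, because the parsed string 'a\nb' contains a real newline.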
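This is unrelated to UTF-8: the extra lines come from escaped newlines inside the JSON strings, not from the file's encoding.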
In general, there could also be an issue with non-native newlines, e.g., b'\r\r\n\n', or with Unicode newlines such as u'"\u2028"' (U+2028 LINE SEPARATOR).
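To illustrate the Unicode case, here is a small sketch (variable names are made up for the example). A literal U+2028 is allowed inside a JSON string, but str.splitlines() treats it as a line boundary, so splitting the text that way cuts the document in two. json.dumps() with its default ensure_ascii=True sidesteps this by escaping the character:

```python
import json

doc = '"\u2028"'  # valid JSON: a string holding a literal U+2028
parsed = json.loads(doc)  # parses fine as a whole

pieces = doc.splitlines()  # U+2028 counts as a line boundary here
broken = []
for piece in pieces:
    try:
        json.loads(piece)
    except ValueError:
        broken.append(piece)  # each fragment is just a lone quote

escaped = json.dumps(parsed)  # default ensure_ascii=True escapes U+2028
print(len(pieces), len(broken), escaped)
```

So a reader that splits input with splitlines() may see broken fragments, while output written with the default json.dumps() settings stays on one line.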