This is an interesting question. Let's focus on the first method you proposed, as that should be a totally fine way to approach this problem. When I print out the lines one by one, here is what I get:
>>> core_value
'\\311 is a fancy kind of E'
What happened for me is that the backslash was read as a literal '\' character, which the interpreter displays in escaped form as '\\'. If we change the escaped sequence ('\\311') to the non-escaped character ('\311'), we get the following:
>>> cv = core_value.replace('\\311','\311')
>>> cv
'\xc9 is a fancy kind of E'
>>> print cv
É is a fancy kind of E
The weird piece here is that you don't know when \311 in the original file is "supposed to be" one character or four. If you know for a fact that those will all be one character, you can write some vile code based on this answer:
Python Unicode, have unicode number in normal string, want to print unicode
to transform everything that comes after a \ into the correct Unicode character and delete the \.
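As a rough illustration of that idea, here is a minimal sketch. It assumes that every backslash followed by one to three octal digits is meant to be a single character, and it is written for Python 3 (the session above is Python 2). The names OCTAL_ESCAPE and decode_octal_escapes are made up for this example and are not taken from the linked answer.

```python
import re

# Assumption: a backslash followed by 1-3 octal digits (e.g. \311)
# always stands for one character.
OCTAL_ESCAPE = re.compile(r'\\([0-7]{1,3})')

def decode_octal_escapes(text):
    """Replace each literal backslash-octal escape with the character it names."""
    # int(digits, 8) parses the digits as octal; chr() maps the code point to a character.
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)

core_value = '\\311 is a fancy kind of E'   # literal backslash, then "311"
cv = decode_octal_escapes(core_value)
assert cv == '\xc9 is a fancy kind of E'    # '\xc9' is the single character É
```

This only handles octal escapes. If the file also contains other escape styles, or backslashes that really are meant to stay literal, the pattern would need to be adjusted.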