python opens text file with a space between every character

https://stackoverflow.com/questions/603115

03-07-2019
|

Question

Whenever I try to open a .csv file with the python command fread = open('input.csv', 'r') it always opens the file with spaces between every single character. I'm guessing it's something wrong with the text file because I can open other text files with the same command and they are loaded correctly. Does anyone know why a text file would load like this in python?

Thanks.

Update

Ok, I got it with the help of Jarret Hardie's post

this is the code that I used to convert the file to ascii

fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
mytext = mytext.encode('ascii', 'ignore')
fwrite = open('input-ascii.csv', 'wb')
fwrite.write(mytext)

Thanks!

Solution

The post by recursive is probably right... the contents of the file are likely encoded with a multi-byte charset. If this is, in fact, the case you can likely read the file in python itself without having to convert it first outside of python.

Try something like:

fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')

The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding... in this example, I've used utf-16, but YMMV. This will convert the file to unicode. If you truly have a file with multi-byte chars, I don't recommend converting it to ascii as you may end up losing a lot of the characters in the process.

EDIT: Thanks for uploading the file. There are two bytes at the front of the file which indicates that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested... you'll see something in the text version like 'I.D.|.' (etc). The dot is the extra byte for each char.

The code snippet above seems to work on my machine with that file.

OTHER TIPS

The file is encoded in some unicode encoding, but you are reading it as ascii. Try to convert the file to ascii before using it in python.

Isn't csv a simple txt file with values separated with comma. Just try to open it with a text editor to see if the file is correctly formed.

To read an encoded file, you can simply replace open with codecs.open.

fread = codecs.open('input.csv', 'r', 'utf-16')

It did never ocurred to me, but as truppo said, it must be something wrong with the file.

Try to open the file in Excel/BrOffice Calc and Save As the file as Csv again.

If the problem persists, try a subset of the data: fist 10/last 10/intermediate 10 lines of the file.

Ok, I got it with the help of Jarret Hardie's post

this is the code that I used to convert the file to ascii

fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
mytext = mytext.encode('ascii', 'ignore')
fwrite = open('input-ascii.csv', 'wb')
fwrite.write(mytext)

Thanks!

Open the file in binary mode, 'rb'. Check it in a HEX Editor and check for null padding '00'. Open the file in something like Scintilla Text Editor to check the characters present in the file.

Here's the quick and easy way, esp if python won't parse the input correctly

sed 's/ \(.\)/\1/g'

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow