Question

Fellows,

I am unable to parse a Unicode text file submitted through a Django form. Here are the steps I performed:

  1. Uploaded a text file (encoding: UTF-16; file contents: Hello World 13)

  2. On server side, received the file using filename = request.FILES['file_field']

  3. Going line by line: for line in filename: yield line

  4. type(filename) gives me <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>

  5. type(line) is <type 'str'>

  6. print line : '\xff\xfeH\x00e\x00l\x00l\x00o\x00 \x00W\x00o\x00r\x00l\x00d\x00 \x001\x003\x00'

  7. codecs.BOM_UTF16_LE == line[:2] returns True

  8. Now I want to reconstruct the original text ("Hello World 13") as a unicode or ASCII string so that I can parse the integer from the line.

One ugly way of doing this is to take line[-5:] (= '\x001\x003\x00') and pick out the digits with line[-5:][1] and line[-5:][3].

I am sure there must be a better way of doing this. Please help.

Thanks in advance!


Solution

Use codecs.iterdecode() to decode the object on the fly:

from codecs import iterdecode

for line in iterdecode(filename, 'utf16'):
    yield line
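
As a rough end-to-end illustration of the same idea, here is a minimal sketch written for Python 3. It makes several assumptions that are not in the original post: an io.BytesIO object stands in for request.FILES['file_field'], the helper name first_integer is hypothetical, and the number is taken to be the first run of digits in the file. The decoded pieces are joined before searching because the byte chunks of a UTF-16 stream do not always end on a clean line boundary.

```python
import codecs
import io
import re


def first_integer(uploaded_file, encoding="utf-16"):
    """Decode a bytes-yielding file object and return the first integer found.

    Hypothetical helper: `uploaded_file` is anything that yields bytes when
    iterated, such as an uploaded file object or io.BytesIO.
    """
    # iterdecode() uses an incremental decoder, so a character split across
    # two chunks is still decoded correctly; the "utf-16" codec consumes the BOM.
    text = "".join(codecs.iterdecode(uploaded_file, encoding))
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Stand-in for the upload: the same bytes shown in step 6 of the question.
raw = codecs.BOM_UTF16_LE + "Hello World 13".encode("utf-16-le")
number = first_integer(io.BytesIO(raw))
print(number)
```

For a small in-memory upload, reading all the bytes and decoding them in one call would probably work just as well; iterdecode() is mainly useful when the file should not be held in memory at once.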
Licensed under: CC-BY-SA with attribution