Question

I'm aware that strftime and strptime don't accept byte strings as parameters. However, I'm in a pickle here: I need to read a file whose content is saved in several different character encodings, handle them all, and send the time portion of each line in this text file to strptime().

A quick fix would be to split the string, making sure the time portion contains only numbers and dashes, but is it possible to pass the bytes object to strptime() without first trying to figure out the encoding?

import time

with open('file.txt', 'rb') as fh:
    for line in fh:
        time.strptime(line, '%Y-%m-%d ...')

This would obviously fail. I thought of doing repr(line), but that makes the string look like b'2014-01-07 ...', which I could then strip.


Solution

line is a bytestring, because you opened the file in binary mode. You'll need to decode the string; if it is a date string matching the pattern, you can simply use ASCII:

 time.strptime(line.decode('ascii'), '%Y-%m-%d ...')

You can pass 'ignore' as the second argument, as in line.decode('ascii', 'ignore'), to drop anything non-ASCII, but chances are the line won't fit your date format then anyway.

Note that you cannot pass a value that contains more than the parsed format covers; a line with other text on it that is not explicitly matched by the strptime() pattern raises ValueError ("unconverted data remains"), whatever codec you used.
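One way around the extra text is to cut the timestamp off the front of the line before decoding it. This is only a sketch: the format in the question is elided, so the '%Y-%m-%d %H:%M:%S' format, the fixed 19-byte width, and the helper name parse_prefix are all assumptions made for illustration.

```python
import time

# Assumed layout: the first 19 bytes of each line are the timestamp.
TIME_FORMAT = '%Y-%m-%d %H:%M:%S'
TIME_WIDTH = 19  # len('2014-01-07 12:34:56')


def parse_prefix(line):
    """Parse the leading timestamp of a bytes line and ignore the rest."""
    stamp = line[:TIME_WIDTH].decode('ascii')
    return time.strptime(stamp, TIME_FORMAT)


# The tail of the line may be in any codec; it is never decoded.
parsed = parse_prefix(b'2014-01-07 12:34:56 caf\xe9 in some other codec\n')
```

Because only the sliced prefix is decoded, the encoding of the rest of the line does not matter for the time parsing.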

And if your input really varies that widely in codecs, you'll need to catch exceptions one way or another anyway.
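Catching the exceptions could look roughly like this. Again a sketch under assumptions: the time format is made up because the original one is elided, and parse_or_none is a hypothetical helper name. UnicodeDecodeError is a subclass of ValueError, so a single except clause covers both a failed decode and a failed parse.

```python
import time

TIME_FORMAT = '%Y-%m-%d %H:%M:%S'  # assumed; the original format is elided


def parse_or_none(raw):
    """Return a struct_time for an ASCII timestamp, or None on failure."""
    try:
        return time.strptime(raw.decode('ascii'), TIME_FORMAT)
    except ValueError:
        # Raised by decode() for non-ASCII bytes (UnicodeDecodeError is a
        # ValueError subclass) and by strptime() for a format mismatch.
        return None


good = parse_or_none(b'2014-01-07 12:34:56')
bad_bytes = parse_or_none(b'\xff\xfe2014-01-07')
bad_text = parse_or_none(b'not a date')
```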

Aside from UTF-16 or UTF-32, I would not expect you to encounter any codecs that use different bytes for the Arabic numerals. If your input really mixes multi-byte and single-byte codecs in one file, you have a bigger problem on your hands, not least because newline handling will be majorly messed up.
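A quick check illustrates the point about the numerals. The sample timestamp and the list of codecs are arbitrary choices for the demonstration, not something taken from the question.

```python
stamp = '2014-01-07 12:34:56'
ascii_bytes = stamp.encode('ascii')

# ASCII-compatible codecs produce byte-for-byte identical output
# for digits, dashes, colons and spaces.
same_as_ascii = [
    codec for codec in ('utf-8', 'latin-1', 'cp1252')
    if stamp.encode(codec) == ascii_bytes
]

# UTF-16 and UTF-32 pad every character with NUL bytes instead.
utf16_bytes = stamp.encode('utf-16-le')
utf32_bytes = stamp.encode('utf-32-le')
```

The same padding applies to the newline (b'\n\x00' in UTF-16-LE), which is why splitting such a file on b'\n' in binary mode leaves a stray NUL byte at the start of the next line.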

Other tips

You should decode the data when you're reading the file:

import codecs
import time

# codecs.open() decodes each line to text as it is read
with codecs.open('file.txt', encoding='utf8') as fh:
    for line in fh:
        time.strptime(line, '%Y-%m-%d ...')

It's always better to decode your content as soon as possible.
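On Python 3 the built-in open() takes an encoding argument as well, so the same idea can be sketched without the codecs module. The time format, the helper name read_times, and the assumption that each line holds nothing but a timestamp are all made up for the example; errors='replace' is one possible policy for undecodable bytes, not the only one.

```python
import time

TIME_FORMAT = '%Y-%m-%d %H:%M:%S'  # assumed; the original format is elided


def read_times(path, encoding='utf-8'):
    """Decode the file while reading and parse each line as a timestamp."""
    times = []
    # errors='replace' swaps undecodable bytes for U+FFFD instead of raising.
    with open(path, encoding=encoding, errors='replace') as fh:
        for line in fh:
            # strip() drops the trailing newline, which strptime() rejects.
            times.append(time.strptime(line.strip(), TIME_FORMAT))
    return times
```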

Also check http://docs.python.org/2/library/codecs.html#codecs.open

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow