Question

I'm trying to decode a gzipped Garmin activity file using Python. According to Garmin, the file is a Base64-encoded gz file. I'm uploading the file from the browser via POST and receiving the data in a Django app.

The beginning of the file looks like this.

begin-base64 644 data.xml.gz\nH4sIAAAAAAAAA y9a4 lx3Hn d6fguB7JzNuGZkNigNfdrAGbMAYaXeNfbPolXplYiRSIFu

I've used the following code to adjust for padding and decode base64:

import base64
padding_factor = (4 - len(data) % 4) % 4
data += "="*padding_factor
data_decoded = base64.b64decode(unicode(data).translate(dict(zip(map(ord, u'-_'), u'+/'))))

The beginning of data_decoded looks like this on the screen:

\xe8"\x9f\xe6\xda\xb1\xee\xb8\xeb\x8e\x1dj\xd6\xb1\x9aX3\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03/Z\xe2\w\x1ewz~\x0b\x81\xec\x9c\xcd\xb8fd6(\r}\xda\xc0\x19\xb3\x00a\xa5\xde5\xf6\xcf\xa2U\xe9\x95\x88\x91H\x81n\xcb\xf7\xb4\x9f\xcc\xa7y%\xbd\x95\x9e\x13\xcd\x10\xf9Th\x04\x8d\xdf\xdf\xa6\xba\xa9\xcd\xf9=s\xf8G\xfc

print data_decoded looks like this:

}???a??5?ϢU镈?H?n????̧y%?????Th??ߦ????=s?G?

I then try to unzip the file using the following:

from cStringIO import StringIO
import gzip
sio = StringIO(data_decoded)
gzf = gzip.GzipFile(fileobj=sio)
guff = gzf.read()

After which I get the following error:

  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 245, in read
    self._read(readsize)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 287, in _read
    self._read_gzip_header()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 181, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file

I also tried saving the file directly to disk and running gunzip from the command line and that also results in the same error.

Any help would be much appreciated.


Solution

It looks like you're decoding the entire thing, including the begin-base64 644 data.xml.gz part, so you're getting a bunch of garbage at the start:

b1 = '''begin-base64 644 data.xml.gz\nH4sIAAAAAAAAA y9a4 lx3Hn
d6fguB7JzNuGZkNigNfdrAGbMAYaXeNfbPolXplYiRSIFu'''

b2 = '''\nH4sIAAAAAAAAA y9a4 lx3Hn
d6fguB7JzNuGZkNigNfdrAGbMAYaXeNfbPolXplYiRSIFu'''

If you run your algorithm on b1, you get something starting with this:

m\xe8"\x9d\xb6\xac{\xae

(I don't know how you lost the m in copying and pasting, but either way, it's not valid.)

If you run it on b2, you get something starting with this:

\x1f\x8b\x08\x00\x00\x00

That's what you want.
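As a quick check of that claim, here is a minimal sketch written for Python 3 (an assumption, since the original code targets Python 2). It splits off the header line and decodes only the first eight Base64 characters shown in the question; the constant name GZIP_MAGIC is just illustrative.

```python
import base64

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"

# Header line plus the first 8 Base64 characters from the question.
raw = "begin-base64 644 data.xml.gz\nH4sIAAAA"

# Split once on the first newline: header on the left, Base64 on the right.
header, body = raw.split("\n", 1)
decoded = base64.b64decode(body)

print(header)
print(decoded)
print(decoded.startswith(GZIP_MAGIC))
```

If the last line prints False for your real data, something other than Base64 is still in front of the payload.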

Of course, taking off the '\n' as well has the same effect, since base64 decoding ignores whitespace. So most likely, it's being used as a delimiter. If that's actually a '\\n' (aka r'\n') rather than a '\n', you have to remove it to get the right answer.

Also, you seem to be doing a lot of extra work for no good reason. Most likely the data is already correctly padded, though the padding step may be worth keeping as a safeguard. But the whole translate(dict(zip(map(ord, u'-_'), u'+/'))) expression does the same thing as passing an altchars argument to b64decode, only less efficiently and less readably (assuming it's even correct). (By the way, if you were using translate as an optimization to avoid calling replace twice, the conversion to and from Unicode is almost certain to overwhelm the savings. Even if you had profiled and determined that it made a difference, you'd probably want to build the translate map once, up front: both for efficiency, so you don't rebuild it for every string, and, more importantly, for readability.)
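To illustrate the equivalence, here is a small Python 3 sketch (an assumption; the original code is Python 2) that decodes the same URL-safe input three ways. The sample bytes and the name TO_STANDARD are made up for the demonstration.

```python
import base64

# Three bytes chosen so the encoding uses only the characters that differ
# between the standard and URL-safe alphabets.
sample = bytes([0xFB, 0xFF, 0xFE])
url_safe = base64.urlsafe_b64encode(sample)

# 1. Let b64decode do the substitution via altchars.
via_altchars = base64.b64decode(url_safe, altchars=b"-_")

# 2. Use the dedicated helper for the URL-safe alphabet.
via_helper = base64.urlsafe_b64decode(url_safe)

# 3. Translate by hand, with the table built once rather than per string.
TO_STANDARD = bytes.maketrans(b"-_", b"+/")
via_translate = base64.b64decode(url_safe.translate(TO_STANDARD))

print(url_safe, via_altchars == via_helper == via_translate == sample)
```

The first two forms say what they mean, which is the main argument for preferring them over a hand-rolled translate.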

Putting it together:

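import base64

data = '''begin-base64 644 data.xml.gz\nH4sIAAAAAAAAA y9a4 lx3Hn
d6fguB7JzNuGZkNigNfdrAGbMAYaXeNfbPolXplYiRSIFu'''
# Drop the "begin-base64 ..." header line; only the rest is Base64.
_, data = data.split('\n', 1)
# Pad to a multiple of 4 characters in case the padding was stripped.
padding_factor = (4 - len(data) % 4) % 4
data += "="*padding_factor
# '-_' as altchars handles the URL-safe alphabet without a translate step.
data_decoded = base64.b64decode(data, '-_')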

Again, if you've got a '\\n' rather than a '\n', change the split line accordingly.
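For completeness, here is a hedged end-to-end sketch in Python 3 (an assumption; the code above is Python 2) that also performs the gunzip step. The function name decode_upload is hypothetical, and because the question shows only the start of the real file, the example builds its own stand-in upload and round-trips it.

```python
import base64
import gzip


def decode_upload(text):
    """Strip the header line, Base64-decode the rest, and gunzip it."""
    # Everything before the first newline is the "begin-base64 ..." header.
    _, body = text.split("\n", 1)
    # Remove whitespace first so padding is computed from Base64 characters only.
    body = "".join(body.split())
    body += "=" * (-len(body) % 4)
    # altchars=b"-_" also accepts the URL-safe alphabet.
    return gzip.decompress(base64.b64decode(body, altchars=b"-_"))


# Stand-in upload: made-up XML, gzipped, Base64-encoded, padding stripped.
xml = b"<Activity>demo</Activity>"
encoded = base64.b64encode(gzip.compress(xml)).decode("ascii")
upload = "begin-base64 644 data.xml.gz\n" + encoded.rstrip("=")

result = decode_upload(upload)
print(result)
```

Stripping whitespace before padding matters when the body is wrapped across lines, since otherwise the newlines are counted in the length.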

Other answers

You need to strip off the beginning of the file since it is not part of the base64 data. If you know that the \n will be part of every file you can use it as a delimiter:

# '\\n' is a literal backslash followed by 'n' (two characters), hence the +2.
index = data.find('\\n')
if index != -1:
    data = data[index+2:]
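
If you are not sure which delimiter the upload contains, a small helper can try both. This is only a sketch in Python 3, and strip_header is a hypothetical name; it checks for the literal two-character sequence first, then for a real newline.

```python
def strip_header(data):
    """Return the text after the header delimiter, or data unchanged if none is found."""
    # Try a literal backslash-n first, then a real newline character.
    for delimiter in ("\\n", "\n"):
        index = data.find(delimiter)
        if index != -1:
            return data[index + len(delimiter):]
    return data


print(strip_header("begin-base64 644 data.xml.gz\\nH4sI"))
print(strip_header("begin-base64 644 data.xml.gz\nH4sI"))
```

Using len(delimiter) instead of a hard-coded offset keeps the slice correct for either form.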
Licensed under: CC-BY-SA with attribution