How do I use Python to pack a big integer, treating four characters as an unsigned long in network byte order, as Ruby's .pack("N") does?
Question
This is the bit of Ruby I want to implement in Python:
Base64.urlsafe_encode64([Digest::MD5.hexdigest(url).to_i(16)].pack("N")).sub(/==\n?$/, '')
You see, this helps turn a URL like this:
http://stackoverflow.com/questions/ask
into a short code like this:
sUEBtw
The big integer that gets generated in the process is this:
307275247029202263937236026733300351415
I've been able to pack this into binary form using this Python code:
import base64
import hashlib
import struct
url = 'http://stackoverflow.com/questions/ask'
n = int(hashlib.md5(url.encode('utf-8')).hexdigest(), 16)
s = struct.Struct('d')
values = [n]
packed_data = s.pack(*values)
short_code = base64.urlsafe_b64encode(packed_data)[:-1].decode('ascii')
print(short_code)
The short code I get is this:
zgMM62Hl7Ec
As you can see, it's longer than the one I get with Ruby, so the packing must be using a different format.
Your help will be appreciated.
Solution
This does the trick:
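import hashlib
import base64
url = 'http://stackoverflow.com/questions/ask'
# Python 3: md5() takes bytes, and the encoder returns bytes.
digest = hashlib.md5(url.encode('utf-8')).digest()
# Keep the last four bytes and drop the trailing '==' padding.
print(base64.urlsafe_b64encode(digest[-4:])[:-2].decode('ascii'))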
Output
sUEBtw
.digest() gives the packed bytes of the full 16-byte digest, so there is no need for struct.pack, but it seems Ruby's .pack('N') only converts the last four bytes of the digest.
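The link between the big integer and the digest bytes can be checked directly. This is a minimal sketch in Python 3 syntax, using only hashlib; it assumes the URL is encoded as UTF-8 before hashing, which the original code does not state.

```python
import hashlib

url = "http://stackoverflow.com/questions/ask"
md5 = hashlib.md5(url.encode("utf-8"))  # Python 3 needs bytes here

digest = md5.digest()        # 16 raw bytes
hexdigest = md5.hexdigest()  # the same bytes as 32 hex characters
assert len(digest) == 16
assert digest.hex() == hexdigest

# Reading the hex string as an integer is a big-endian interpretation,
# so the integer's low 32 bits are exactly the last four digest bytes.
n = int(hexdigest, 16)
assert (n % 2**32).to_bytes(4, "big") == digest[-4:]
print(n)
```

If the assertions pass, slicing the digest with [-4:] and reducing the integer modulo 2**32 select the same four bytes.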
Other tips
Ruby's pack('N') converts to a network-order (big-endian) 32-bit unsigned integer. Python's struct format 'd' converts to an IEEE double-precision float. I think you want the struct format '>I' for the equivalent big-endian 32-bit unsigned integer in Python.
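The difference between the two format strings shows up in their sizes. Here is a small sketch in Python 3 syntax, using the integer from the question as the sample value; masking with 0xFFFFFFFF is one way to keep the low 32 bits.

```python
import struct

n = 307275247029202263937236026733300351415  # integer from the question

# 'd' is an 8-byte double; '>I' is a 4-byte big-endian unsigned int.
assert struct.calcsize("d") == 8
assert struct.calcsize(">I") == 4

# '>I' rejects values that do not fit in 32 bits, so mask first.
low32 = n & 0xFFFFFFFF
packed = struct.pack(">I", low32)

# '!' (network order) is documented as big-endian, so '!I' matches '>I'.
assert packed == struct.pack("!I", low32)
print(packed.hex())
```

Eight bytes encode to eleven Base64 characters plus one padding character, while four bytes encode to six plus two, which accounts for the longer code in the question.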
So it is clear now that Ruby's pack('N') takes only the lower 4 bytes, so following DSM's suggestion I got this code to work:
import hashlib
import base64
import struct
url = 'http://stackoverflow.com/questions/ask'
n = int(hashlib.md5(url.encode('utf-8')).hexdigest(), 16)
s = struct.Struct('>I')  # big-endian unsigned 32-bit, like Ruby's 'N'
values = [n % (2**32)]  # keep only the low 32 bits
packed_data = s.pack(*values)
print(base64.urlsafe_b64encode(packed_data)[:-2].decode('ascii'))
Nonetheless, as explained in Mark Tolonen's answer, the digest() method of hashlib's hash object already returns the hash packed as bytes, so taking the last four bytes with [-4:] and encoding them with base64's urlsafe_b64encode is good enough.
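For reuse, both routes can be wrapped in functions and compared. This is only a sketch in Python 3 syntax: the names short_code and short_code_via_struct are made up for illustration, the example.com URLs are arbitrary test inputs, and the padding is stripped with rstrip rather than a fixed slice, which is a swapped-in choice and not part of the answers above.

```python
import base64
import hashlib
import struct


def short_code(url):
    """Hypothetical helper: URL-safe Base64 of the last four MD5 bytes."""
    tail = hashlib.md5(url.encode("utf-8")).digest()[-4:]
    # Strip padding by content rather than by a fixed slice.
    return base64.urlsafe_b64encode(tail).rstrip(b"=").decode("ascii")


def short_code_via_struct(url):
    """Same result via the integer route, for comparison."""
    n = int(hashlib.md5(url.encode("utf-8")).hexdigest(), 16)
    packed = struct.pack(">I", n % 2**32)
    return base64.urlsafe_b64encode(packed).rstrip(b"=").decode("ascii")


# The example.com URLs are made-up test inputs.
for u in ("http://stackoverflow.com/questions/ask",
          "http://example.com/",
          "http://example.com/a?b=c"):
    assert short_code(u) == short_code_via_struct(u)
    assert len(short_code(u)) == 6

print(short_code("http://stackoverflow.com/questions/ask"))
```

For the URL from the question, this should print the same short code that the Ruby one-liner produces, sUEBtw.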