Python UUID represented as special characters

https://stackoverflow.com/questions/2278239

21-09-2019

Question

When creating a UUID in Python, like so:

>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

How could one map that UUID into a string made up of the capital letters A-Z minus the characters D, F, I, O, Q, and U, plus the numerical digits, plus the characters "+" and "="? That is, a mapping from an integer or string onto this set of 32 (relatively OCR-friendly) characters:

[ABCEGHJKLMNPRSTVWXYZ1234567890+=]

I'll call this the OCRf set (for OCR friendly).

I'd like to have an isomorphic function:

def uuid_to_ocr_friendly_chars(uid):
    """Takes uid, an integer, and transposes it into a string made
       of the OCRf set.
    """
    ...

My first thought is to convert the UUID to base 32, e.g.:

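OCRf = "ABCEGHJKLMNPRSTVWXYZ1234567890+="

def uuid_to_ocr_friendly_chars(uid):
    # uid is a non-negative integer, e.g. uuid.uuid1().int
    ocfstr = ''
    while uid > 0:                # "> 1" would drop a final digit equal to 1
        ocfstr += OCRf[uid % 32]  # least significant digit first
        uid //= 32                # integer division (plain / yields a float in Python 3)
    return ocfstr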

However, I'd like to know if this method is the best and fastest way to go about this conversion - or if there's a simpler and faster method (e.g. a builtin, a smarter algorithm, or just a better method).

I'm grateful for your input. Thank you.

Solution

How important is it to you to "squeeze" the representation by 18.75%, i.e., from 32 to 26 characters? Because, if saving this small percentage of bytes isn't absolutely crucial, something like uid.hex.upper().replace('D','Z').replace('F','Y') will do what you ask, since D and F are the only upper-case hex digits missing from your set (not using the whole alphabet you make available, but the only cost of this is missing that 18.75% "squeezing").
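A minimal sketch of that hex-based shortcut, assuming uid is a uuid.UUID. Both D and F fall outside the OCRf set, so the sketch substitutes both; the replacement letters Z and Y are arbitrary picks from the set, and the function names are made up for illustration.

```python
import uuid

OCRF = "ABCEGHJKLMNPRSTVWXYZ1234567890+="

def uuid_to_ocr_hex(uid):
    """Return the 32-character upper-case hex form with D and F swapped out."""
    return uid.hex.upper().replace("D", "Z").replace("F", "Y")

def ocr_hex_to_uuid(text):
    """Undo the substitution and rebuild the UUID."""
    return uuid.UUID(hex=text.replace("Z", "D").replace("Y", "F"))

uid = uuid.UUID("a8098c1a-f86e-11da-bd1a-00112444be1e")
encoded = uuid_to_ocr_hex(uid)
print(encoded)  # A8098C1AY86E11ZABZ1A00112444BE1E
```

The substitution is reversible because Z and Y never occur in plain hex output.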

If squeezing down every last byte is crucial, I'd work on substrings of 20 bits each -- that's 5 hex characters, 4 characters in your funky alphabet. There are 6 of those (plus 8 bits left over, for which you can take the hex.upper().replace as above since there's nothing to gain in doing anything fancier). You can easily get the substrings by slicing .hex and turn each into an int with an int(theslice, 16). Then, you can basically apply the same algorithm you're using above -- but the arithmetic is all done on much-smaller numbers, so the speed gain should be material. Also, don't build the string by looping on += -- make a list of all the "digits", and ''.join them all at the end -- that's also a performance improvement.
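The chunked approach described above could look roughly like this. It is a sketch, not a tuned implementation: the helper names are hypothetical, the Z and Y substitutions for the trailing hex byte are arbitrary picks from the OCRf set, and each 20-bit group is emitted most significant digit first.

```python
import uuid

OCRF = "ABCEGHJKLMNPRSTVWXYZ1234567890+="

def encode_chunk(value, width):
    """Encode a small int as `width` OCRf characters, most significant first."""
    digits = []
    for _ in range(width):
        value, rem = divmod(value, 32)
        digits.append(OCRF[rem])
    return "".join(reversed(digits))

def uuid_to_ocr_friendly_chars(uid):
    hex_str = uid.hex  # 32 hex characters, 128 bits
    parts = []
    # Six 20-bit groups: 5 hex characters in, 4 OCRf characters out.
    for start in range(0, 30, 5):
        parts.append(encode_chunk(int(hex_str[start:start + 5], 16), 4))
    # The last 8 bits stay as 2 hex characters, with D and F swapped out.
    parts.append(hex_str[30:].upper().replace("D", "Z").replace("F", "Y"))
    return "".join(parts)

def ocr_friendly_chars_to_uuid(text):
    hex_parts = []
    for start in range(0, 24, 4):
        value = 0
        for ch in text[start:start + 4]:
            value = value * 32 + OCRF.index(ch)
        hex_parts.append("%05x" % value)  # keep leading zeros in each group
    hex_parts.append(text[24:].replace("Z", "D").replace("Y", "F"))
    return uuid.UUID(hex="".join(hex_parts))

uid = uuid.uuid1()
encoded = uuid_to_ocr_friendly_chars(uid)
print(encoded, ocr_friendly_chars_to_uuid(encoded) == uid)
```

The result is always 26 characters long (6 groups of 4, plus 2), and the digits are collected in a list and joined once rather than appended with +=.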

OTHER TIPS

>>> OCRf = 'ABCEGHJKLMNPRSTVWXYZ1234567890+='
>>> uuid = 'a8098c1a-f86e-11da-bd1a-00112444be1e'
>>> binstr = bin(int(uuid.replace("-",""),16))[2:].zfill(130)
>>> ocfstr = "".join(OCRf[int(binstr[i:i+5],2)] for i in range(0,130,5))
>>> ocfstr
'HLBJJB2+ETCKSP7JWACGYGMVW+'

To convert back again (padding to 32 hex digits so that leading zeros are not lost):

>>> "%032x" % int("".join(bin(OCRf.index(i))[2:].zfill(5) for i in ocfstr), 2)
'a8098c1af86e11dabd1a00112444be1e'
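The two one-liners above can be wrapped as a pair of functions. This is a sketch with hypothetical names (uuid_hex_to_ocrf, ocrf_to_uuid_hex); the decoder uses plain integer arithmetic instead of rebuilding a bit string, and pads to 32 hex digits so a UUID whose hex form starts with zeros survives the round trip.

```python
OCRF = "ABCEGHJKLMNPRSTVWXYZ1234567890+="

def uuid_hex_to_ocrf(uuid_hex):
    """Encode a UUID hex string (dashes allowed) as 26 OCRf characters."""
    # 128 bits padded on the left to 130 bits, i.e. 26 groups of 5 bits.
    bits = bin(int(uuid_hex.replace("-", ""), 16))[2:].zfill(130)
    return "".join(OCRF[int(bits[i:i + 5], 2)] for i in range(0, 130, 5))

def ocrf_to_uuid_hex(ocr_str):
    """Decode 26 OCRf characters back to a 32-digit lower-case hex string."""
    value = 0
    for ch in ocr_str:
        value = value * 32 + OCRF.index(ch)
    return "%032x" % value

encoded = uuid_hex_to_ocrf("a8098c1a-f86e-11da-bd1a-00112444be1e")
print(encoded)                    # HLBJJB2+ETCKSP7JWACGYGMVW+
print(ocrf_to_uuid_hex(encoded))  # a8098c1af86e11dabd1a00112444be1e
```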

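import base64
import uuid

# Map the standard Base32 alphabet onto the OCR-friendly one.
transtbl = str.maketrans(
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567',
    'ABCEGHJKLMNPRSTVWXYZ1234567890+=',
)

uid = uuid.uuid1()

# Base32-encode the 16 raw bytes, drop the '=' padding, then translate.
print(base64.b32encode(uid.bytes).decode('ascii').rstrip('=').translate(transtbl))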

Yes, this method does make me a bit ill, thanks for asking.
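For the reverse direction of the Base32 trick, something along these lines should work in Python 3. It is only a sketch: the function names are made up, and it assumes the input is exactly the 26 characters produced by the encoder, so the six stripped '=' padding characters can be put back before decoding.

```python
import base64
import uuid

B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
OCRF = "ABCEGHJKLMNPRSTVWXYZ1234567890+="
TO_OCRF = str.maketrans(B32, OCRF)
FROM_OCRF = str.maketrans(OCRF, B32)

def b32_encode_uuid(uid):
    """Base32-encode the UUID's 16 bytes and translate to the OCRf alphabet."""
    raw = base64.b32encode(uid.bytes).decode("ascii").rstrip("=")
    return raw.translate(TO_OCRF)

def b32_decode_uuid(text):
    """Translate back to standard Base32, restore padding, and decode."""
    raw = text.translate(FROM_OCRF) + "======"
    return uuid.UUID(bytes=base64.b32decode(raw))

uid = uuid.uuid1()
encoded = b32_encode_uuid(uid)
print(encoded, b32_decode_uuid(encoded) == uid)
```

Base32 pads the spare bits at the end of the data rather than the front, so the strings produced here differ from those of the bit-string approach even though both are 26 characters long.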

Licensed under: CC-BY-SA with attribution
