Question

I need to XOR two bytes objects. I use this code:

def bxor(b1, b2): # use xor for bytes
    result = b""
    for b1, b2 in zip(b1, b2):
        result += bytes([b1 ^ b2])
    return result

It works fine when the bytes objects are small, but if I XOR big objects (a few MB) it takes a very long time (a few hours). How can I make it faster?


Solution 2

When XORing bytes objects with one million elements each, this loop creates roughly one million temporary bytes objects, and because each += has to copy everything accumulated so far, each byte gets copied, on average, roughly 500 thousand times from one temporary object to the next. Note that the exact same problem exists for strings (in many other languages, too). The string solution is to create a list of string parts and use ''.join at the end to concatenate them efficiently. You can do the same thing with bytes:

def bxor(b1, b2): # use xor for bytes
    parts = []
    for b1, b2 in zip(b1, b2):
        parts.append(bytes([b1 ^ b2]))
    return b''.join(parts)

Alternatively, you can use a bytearray, which is mutable and can therefore avoid the problem. It also saves you from allocating a new bytes object on every iteration: you can just append the byte (an int).

def bxor(b1, b2): # use xor for bytes
    result = bytearray()
    for b1, b2 in zip(b1, b2):
        result.append(b1 ^ b2)
    return result

You can return bytes(result) instead if you want/need an immutable bytes object.
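
As a minimal sanity check (arbitrary sample values, not from the original answer) — note that comparing the returned bytearray to a bytes object compares their contents:

a = bytes([0b1100, 0b1010])
b = bytes([0b0110, 0b0011])
# 0b1100 ^ 0b0110 == 0b1010 and 0b1010 ^ 0b0011 == 0b1001
assert bxor(a, b) == bytes([0b1010, 0b1001])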

OTHER TIPS

Adding this as another answer, 'cause it is one:

If you want something faster than the "manual" methods given, there's always NumPy:

import numpy

def bxor_numpy(b1, b2):
    # frombuffer/tobytes replace the deprecated fromstring/tostring
    n_b1 = numpy.frombuffer(b1, dtype='uint8')
    n_b2 = numpy.frombuffer(b2, dtype='uint8')

    return (n_b1 ^ n_b2).tobytes()
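
A quick correctness check (arbitrary sample values, not from the original answer):

# 0x0f ^ 0xff == 0xf0 and 0xf0 ^ 0x0f == 0xff
assert bxor_numpy(b'\x0f\xf0', b'\xff\x0f') == b'\xf0\xff'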

and it's fast:

from os import urandom
from functools import partial
from timeit import Timer

# bxor_inplace and bxor_append are defined in the answer below
first_random = urandom(100000)
second_random = urandom(100000)

min(Timer(partial(bxor_inplace, first_random, second_random)).repeat(10, 100))
#>>> 1.5381054869794752
min(Timer(partial(bxor_append, first_random, second_random)).repeat(10, 100))
#>>> 1.5624085619929247
min(Timer(partial(bxor_numpy, first_random, second_random)).repeat(10, 100))
#>>> 0.009930026979418471

So it's roughly 150x faster than the best pure-Python alternatives posted here.
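
If NumPy isn't an option, a pure-Python sketch in the same spirit (not from the original answers) is to funnel everything through int, which also performs the XOR in a single C-level operation:

def bxor_int(b1, b2):
    # Interpret the byte strings as big integers, XOR once, convert back.
    # Assumes b1 and b2 have the same length.
    n = int.from_bytes(b1, 'big') ^ int.from_bytes(b2, 'big')
    return n.to_bytes(len(b1), 'big')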

Using a bytearray is a lot faster already:

def bxor_ba(b1, b2):
    result = bytearray(b1)
    for i, b in enumerate(b2):
        result[i] ^= b
    return bytes(result)

A quick timeit comparison (bxor here is the question's original concatenating version):

>>> import timeit
>>> b1, b2 = b'abcdefg' * 10, b'aaaaaaa' * 10
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor as it', number=10000)
0.9230150280000089
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_ba as it', number=10000)
0.16270576599890774

This avoids creating new bytes objects for all the concatenations.
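
The key point is that bytes is immutable while bytearray is mutable, so the XOR can happen in place. A minimal illustration (arbitrary values):

buf = bytearray(b'abc')
buf[0] ^= 0xFF  # fine: bytearray supports in-place item assignment
data = b'abc'
# data[0] ^= 0xFF would raise TypeError: 'bytes' object does not support item assignment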

The b''.join() method proposed by delnan is not much better than the original version at this size:

>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_join as it', number=10000)
0.9936718749995634

And a re-run with bytestrings 100 times larger:

>>> b1, b2 = b'abcdefg' * 1000, b'aaaaaaa' * 1000
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor as it', number=1000)
11.032563796999966
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_join as it', number=1000)
9.242204494001271
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_ba as it', number=1000)
1.762020197998936

to show that bytes.join() is faster than repeated concatenation once the inputs grow.

A final run with 7 million bytes, repeated 10 times, with just the bytearray version (I ran out of patience with the other versions):

>>> b1, b2 = b'abcdefg' * 1000000, b'aaaaaaa' * 1000000
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_ba as it', number=10)
16.18445999799951

Martijn Pieters' timings are a bit different to mine:

def bxor_add(b1, b2): # use xor for bytes
    result = b""
    for b1, b2 in zip(b1, b2):
        result += bytes([b1 ^ b2])
    return result

def bxor_inplace(b1, b2):
    result = bytearray(b1)
    for i, b in enumerate(b2):
        result[i] ^= b
    return bytes(result)

def bxor_join(b1, b2): # use xor for bytes
    parts = []
    for b1, b2 in zip(b1, b2):
        parts.append(bytes([b1 ^ b2]))
    return b''.join(parts)

def bxor_append(b1, b2): # use xor for bytes
    result = bytearray()
    for b1, b2 in zip(b1, b2):
        result.append(b1 ^ b2)
    return bytes(result)


from os import urandom
from timeit import Timer
from functools import partial

first_random = urandom(200000)
second_random = urandom(200000)

Timer(partial(bxor_add, first_random, second_random)).timeit(1)
#>>> 1.3261873809969984
Timer(partial(bxor_inplace, first_random, second_random)).timeit(1)
#>>> 0.03055390200461261
Timer(partial(bxor_join, first_random, second_random)).timeit(1)
#>>> 0.15852201101370156
Timer(partial(bxor_append, first_random, second_random)).timeit(1)
#>>> 0.030534288001945242

first_random = urandom(10000000)
second_random = urandom(10000000)

Timer(partial(bxor_inplace, first_random, second_random)).timeit(1)
#>>> 1.5432947289955337
Timer(partial(bxor_join, first_random, second_random)).timeit(1)
#>>> 7.90503858300508
Timer(partial(bxor_append, first_random, second_random)).timeit(1)
#>>> 1.5145326450001448

I'd go with the append version for clarity and speed.


For clarification, I don't think the append method is meaningfully faster than the inplace version; I just think it's a tiny bit more straightforward.

Nevertheless, because it was requested:

first_random = urandom(100000)
second_random = urandom(100000)

min(Timer(partial(bxor_inplace, first_random, second_random)).repeat(10, 100))
#>>> 1.5381054869794752
min(Timer(partial(bxor_append, first_random, second_random)).repeat(10, 100))
#>>> 1.5196998479950707
Licensed under: CC-BY-SA with attribution