Question

I use the following code to load 24-bit binary data into a 16-bit numpy array :

temp = numpy.zeros((len(data) / 3, 4), dtype='b')
temp[:, 1:] = numpy.frombuffer(data, dtype='b').reshape(-1, 3)
temp2 = temp.view('<i4').flatten() >> 16       # >> 16 because I need to divide by 2**16 to load my data into 16-bit array, needed for my (audio) application
output = temp2.astype('int16')

I imagine that it's possible to improve the speed efficiency, but how?

Was it helpful?

Solution

It seems like you are being very roundabout here. Won't this do the same thing?

output = np.frombuffer(data,'b').reshape(-1,3)[:,1:].flatten().view('i2')

This would save some time from not zero-filling a temporary array, skipping the bitshift and avoiding some unneceessary data moves. I haven't actually benchmarked it yet, though, and I expect the savings to be modest.

Edit: I have now performed the benchmark. For len(data) of 12 million, I get 80 ms for your version and 39 ms for mine, so pretty much exactly a factor 2 speedup. Not a very big improvement, as expected, but then your starting point was already pretty fast.

Edit2: I should mention that I have assumed little endian here. However, the original question's code is also implicitly assuming little endian, so this is not a new assumption on my part.

(For big endian (data and architecture), you would replace 1: by :-1. If the data had a different endianness than the CPU, then you would also need to reverse the order of the bytes (::-1).)

Edit3: For even more speed, I think you will have to go outside python. This fortran function, which also uses openMP, gets me a factor 2+ speedup compared to my version (so 4+ times faster than yours).

subroutine f(a,b)
        implicit none
        integer*1, intent(in)  :: a(:)
        integer*1, intent(out) :: b(size(a)*2/3)
        integer :: i
        !$omp parallel do
        do i = 1, size(a)/3
                b(2*(i-1)+1) = a(3*(i-1)+2)
                b(2*(i-1)+2) = a(3*(i-1)+3)
        end do
        !$omp end parallel do
end subroutine

Compile with FOPT="-fopenmp" f2py -c -m basj{,.f90} -lgomp. You can then import and use it in python:

import basj
def convert(data): return def mine2(data): return basj.f(np.frombuffer(data,'b')).view('i2')

You can control the number of cores to use via the environment variavble OMP_NUM_THREADS, but it defaults to using all available cores.

OTHER TIPS

Inspired by @amaurea's answer, here is a cython version (I already used cython in my original code, so I'll continue with cython instead of mixing cython + fortran) :

import cython
import numpy as np
cimport numpy as np

def binary24_to_int16(char *data):
    cdef int i
    res = np.zeros(len(data)/3, np.int16)
    b = <char *>((<np.ndarray>res).data)
    for i in range(len(data)/3):
        b[2*i] = data[3*i+1]
        b[2*i+1] = data[3*i+2]
    return res            

There is a factor 4 speed gain :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top