Speed up cython loop by copying from a list to a numpy array

https://stackoverflow.com/questions/17908763

04-06-2022
|

سؤال

I am writing some performance intensive code, and was hoping to get some feedback from the cythonistas out there on how to improve it further. The purpose of the functions I've written is a bit tough to explain, but what they do isn't all that intimidating. The first (roughly) takes two dictionaries of lists of numbers and joins them to get one dictionary of lists of numbers. It's only run once so I am less concerned with optimizing it. The second first calls the first, then uses its result to basically cross indices stored in a numpy array with the numbers in the lists of arrays to form queries (new numbers) on a (pybloomfiltermmap) bloom filter.

I've determined the heavy step is due to my nested loops and reduced the number of loops used, moved out of the loops everything that only needs to happen once, and typed everything to the best of my knowledge. Still, each iteration of i in the second function takes about 10 seconds, which is too much. The main things I still see as yellow in the html compilation output are due to indexed accesses in the lists and numpy array, so I tried to replace my lists with all numpy arrays but wasn't able to get any improvement. I would greatly appreciate any feedback you could provide.

#cython: boundscheck=False
#cython: wraparound=False

import numpy as np
cimport numpy as np

def merge_dicts_of_lists(dict c1, dict c2):
    cdef dict res
    cdef int n, length1, length2, length3
    cdef unsigned int i, j, j_line, jj, k, kk, new_line

    res =  {n: [] for n in range(256)}
    length1 = len(c1)

    for i in range(length1):
        length2 = len(c1[i])
        for j in range(length2):
            j_line = c1[i][j]
            jj = (j_line) % 256
            length3 = len(c2[jj]) 
            for k in range(length3):
                kk = c2[jj][k]
                new_line = (j_line << 10) + kk
    res[i].append(new_line)
    return res


def get_4kmer_set(np.ndarray c1, dict c2, dict c3, bf):
    cdef unsigned int num = 0
    cdef unsigned long long query = 0
    cdef unsigned int i, j, i_row, i_col, j_line
    cdef unsigned int length1, length2
    cdef dict merge 
    cdef list m_i 

    merge = merge_dicts_of_lists(c2, c3)
    length1 = len(c1[:,0])
    for i in range(length1):
        print "i is %d" % i
        i_row = c1[i,0]
        i_col = c1[i,1]
        m_i = merge[i_col]
        length2 = len(m_i)
        for j in range(length2):
            j_line = m_i[j]
            query = (i_row << 24) + (i_col << 20) + j_line
            if query in bf:
                num += 1
    print "%d yes answers from bf" % num

المحلول

For posterity's sake, I'm adding an off-topic answer, but I hope it will be useful for someone nonetheless. The code I posted above isn't much different from what I've decided to stay with, as it was already compiling to short C lines as seen by the Ctyhon html compilation output.

Since the innermost operation was a Bloom filter query, I found what helped most was speeding up that step in two ways. One was changing the hash function used by pybloomfiltermmap to an available C++ implementation of murmurhash3. I found pybloomfilter was using sha, which was comparatively slowas expected for a cryptographic hash function. The second boost came from applying the trick found in this paper: http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/rsa.pdf. Basically, it says you can save a lot of computation by using a linear combination of two hash values instead of k different hashes for the BF. These two tricks together gave an order of magnitude (~5x) improvement in query time.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow