Question

I need to use a big data structure, more specifically a big dictionary, to do lookups.

At first my code looked like this:

#build the dictionary
blablabla
#look up some information in the dictionary
blablabla

Since I need to do lookups many times, I realized it would be a good idea to implement this as a function, say lookup(info).

Here is the problem: how should I deal with the big dictionary?

Should I use lookup(info, dictionary) to pass it as an argument, or should I initialize the dictionary in main() and use it as a global variable?

The first one seems more elegant, because I think maintaining a global variable is troublesome. On the other hand, I'm not sure how efficient it is to pass a big dictionary to a function. The function will be called many times, and it will certainly be a nightmare if argument passing is inefficient.

Thanks.

Edit1:

I just ran an experiment comparing the two approaches above:

Here's a snippet of the code. lookup1 implements the lookup with the dictionary passed as an argument, while lookup2 uses the global data structure "big_dict".

import random
import time

class CityDict():
    def __init__(self):
        self.code_dict = get_code_dict()
    def get_city(self, city):
        try:
            return self.code_dict[city]
        except Exception:
            return None

def get_code_dict():
    # initiate code dictionary from file
    return code_dict

def lookup1(city, city_code_dict):
    try:
        return city_code_dict[city]
    except Exception:
        return None

def lookup2(city):
    try:
        return big_dict[city]
    except Exception:
        return None


t = time.time()
d = get_code_dict()
for i in range(0, 1000000):
    lookup1(random.randint(0, 10000), d)

print("lookup1 is %f" % (time.time() - t))


t = time.time()
big_dict = get_code_dict()
for i in range(0, 1000000):
    lookup2(random.randint(0, 1000))
print("lookup2 is %f" % (time.time() - t))


t = time.time()
cd = CityDict()
for i in range(0, 1000000):
    cd.get_city(str(i))
print("class is %f" % (time.time() - t))

This is the output:

lookup1 is 8.410885
lookup2 is 8.157661
class is 4.525721

So it seems that the two ways are almost the same, and yes, the global variable method is a little bit more efficient.

Edit2:

I added the class version suggested by Amber and tested the efficiency again. The results suggest that Amber is right and we should use the class version.
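One caveat when reading these numbers: the three timing loops above do not generate their keys the same way (random.randint with two different ranges for lookup1 and lookup2, str(i) for the class version), so part of the measured difference may come from key generation rather than from the lookup itself. A more like-for-like comparison could be sketched with timeit as below. The small stand-in dictionary and the pre-generated key list are assumptions, since the real data file is not shown, and dict.get is swapped in for the try/except blocks to keep the sketch short.

```python
import random
import timeit

# Stand-in for the real dictionary loaded from file (assumption).
big_dict = {str(i): i for i in range(10000)}

# Pre-generate the keys once so key generation is not part of the timing.
random.seed(0)
keys = [str(random.randint(0, 20000)) for _ in range(100000)]


def lookup1(city, city_code_dict):
    # Dictionary passed as an argument.
    return city_code_dict.get(city)


def lookup2(city):
    # Dictionary read from a module-level (global) name.
    return big_dict.get(city)


class CityDict(object):
    def __init__(self, code_dict):
        self.code_dict = code_dict

    def get_city(self, city):
        # Dictionary read from an instance attribute.
        return self.code_dict.get(city)


def run_argument():
    for k in keys:
        lookup1(k, big_dict)


def run_global():
    for k in keys:
        lookup2(k)


def run_class():
    cd = CityDict(big_dict)
    for k in keys:
        cd.get_city(k)


if __name__ == "__main__":
    for name, fn in [("argument", run_argument),
                     ("global", run_global),
                     ("class", run_class)]:
        # Take the best of three repeats to reduce noise.
        best = min(timeit.repeat(fn, number=3, repeat=3))
        print("%s: %f s" % (name, best))
```

With every variant fed the same keys, any remaining gap should reflect only how the dictionary is reached (argument, global name, or attribute).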


Solution

Neither. Use a class, which is specifically designed for grouping functions (methods) with data (members):

class BigDictLookup(object):
    def __init__(self):
        self.bigdict = build_big_dict() # or some other means of generating it
    def lookup(self):
        # do something with self.bigdict
        pass

def main():
    my_bigdict = BigDictLookup()
    # ...
    my_bigdict.lookup()
    # ...
    my_bigdict.lookup()
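A runnable version of the sketch above might look like this. It assumes a hypothetical build_big_dict() that returns a small hard-coded mapping, and a lookup() that takes the key as a parameter; neither detail is specified in the answer.

```python
def build_big_dict():
    # Hypothetical stand-in: the real dictionary would be loaded from a file.
    return {"Berlin": "DE", "Paris": "FR", "Tokyo": "JP"}


class BigDictLookup(object):
    def __init__(self):
        # Built once, then shared by every call to lookup().
        self.bigdict = build_big_dict()

    def lookup(self, key):
        # Return the stored value, or None if the key is missing.
        return self.bigdict.get(key)


def main():
    my_bigdict = BigDictLookup()
    print(my_bigdict.lookup("Paris"))  # FR
    print(my_bigdict.lookup("Oslo"))   # None


if __name__ == "__main__":
    main()
```

The dictionary is built once in __init__, so callers neither pass it around nor depend on a global name.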

Other tips

To answer the core question: parameter passing is not inefficient, because your values do not get copied around. Python passes references around, although the way parameters are passed does not fit neatly into the well-known schemes of "pass-by-value" or "pass-by-reference".

It's best imagined as initializing a variable local to the called function with a reference provided by the caller; that reference is itself passed by value.
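A small sketch of that behaviour, using made-up function names and a tiny dictionary for illustration: the function receives the caller's dictionary object itself, so mutations are visible to the caller, while rebinding the local name is not.

```python
def add_entry(d):
    # 'd' is a local name bound to the caller's dict object; nothing is copied.
    d["Paris"] = "FR"
    return id(d)


def rebind(d):
    # Rebinding the local name does not affect the caller's variable.
    d = {}
    return d


big_dict = {"Berlin": "DE"}
same_id = add_entry(big_dict) == id(big_dict)
new_dict = rebind(big_dict)

print(same_id)               # True: same object inside and outside the call
print(big_dict)              # {'Berlin': 'DE', 'Paris': 'FR'}
print(new_dict is big_dict)  # False: rebind() created a separate dict
```

Since only a reference is handed over, the cost of the call does not depend on how large the dictionary is.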

Still, the suggestion to use a class is probably a good idea.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow