Structuring a program. Classes and functions in Python

https://stackoverflow.com/questions/1561282

21-09-2019

Question

I'm writing a program that uses genetic techniques to evolve equations. I want to be able to submit the function 'mainfunc' to the Parallel Python 'submit' function. The function 'mainfunc' calls two or three methods defined in the Utilities class, which instantiate other classes and call various methods. I think what I want is all of it in one NAMESPACE, so I've instantiated some (maybe it should be all) of the classes inside the function 'mainfunc'. I call the Utilities method 'generate()'; if we were to follow its chain of execution, it would involve all of the classes and methods in the code.

Now, the equations are stored in a tree. Each time a tree is generated, mutated or crossbred, its nodes need to be given new keys so they can be accessed from a dictionary attribute of the tree. The class 'KeySeq' generates these keys.

In Parallel Python, I'm going to send multiple instances of 'mainfunc' to the 'submit' function of PP. Each has to be able to access 'KeySeq'. It would be nice if they all accessed the same instance of KeySeq so that none of the nodes on the returned trees had the same key, but I could get around that if necessary.

So my question is about stuffing EVERYTHING into mainfunc. Thanks.

(Edit) If I don't include everything in mainfunc, I have to tell PP about dependent functions, etc., by passing various arguments in various places. I'm trying to avoid that.

(Late edit) If ks.next() is called inside the 'generate()' function, it raises 'NameError: global name 'ks' is not defined'.

class KeySeq:
    "Iterator to produce sequential integers for keys in dict"
    def __init__(self, data=0):
        self.data = data
    def __iter__(self):
        return self
    def __next__(self):
        # Python 3 iterator protocol
        self.data = self.data + 1
        return self.data
    next = __next__  # Python 2 name; also allows ks.next()
class One:
    'some code'
class Two:
    'some code'
class Three:
    'some code'
class Utilities:
    def generate(self, x):
        '___________'
    def obfiscate(self, y):
        '___________'
    def ruminate(self, z):
        '__________'


def mainfunc(z):
    ks = KeySeq()
    one = One()
    two = Two()
    three = Three()
    utilities = Utilities()
    list_of_interest = utilities.generate(5)
    return list_of_interest

result = mainfunc(params)
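The NameError in the late edit happens because ks is a local variable of mainfunc, and a method such as generate cannot see another function's local variables. One possible fix, sketched below under assumptions (Python 3 iterator syntax, and a hypothetical body for generate that just allocates keys), is to hand the KeySeq instance to Utilities explicitly so its methods reach it through self. This is not the approach the answer takes.

```python
class KeySeq:
    """Iterator producing sequential integer keys for a dict."""

    def __init__(self, data=0):
        self.data = data

    def __iter__(self):
        return self

    def __next__(self):
        self.data += 1
        return self.data


class Utilities:
    def __init__(self, ks):
        # Keep a reference to the key generator on the instance,
        # so every method can reach it through self.
        self.ks = ks

    def generate(self, x):
        # Hypothetical stand-in: allocate x fresh keys for new tree nodes.
        return [next(self.ks) for _ in range(x)]


def mainfunc(z):
    ks = KeySeq()
    utilities = Utilities(ks)
    return utilities.generate(5)


print(mainfunc(None))  # [1, 2, 3, 4, 5]
```

Because this mainfunc creates a fresh KeySeq on every call, each call starts counting from 1 again, which is the sharing problem the answer addresses.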

Solution

If you want all of the instances of mainfunc to use the same KeySeq object, you can use the default parameter value trick:

def mainfunc(ks=KeySeq()):
    key = ks.next()

As long as you don't actually pass in a value of ks, all calls to mainfunc will use the instance of KeySeq that was created when the function was defined.
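A quick check of this behaviour, as a sketch: it assumes a Python 3 version of KeySeq (defining __next__ and used via the built-in next()), and has mainfunc return the key so successive calls can be compared.

```python
class KeySeq:
    """Iterator producing sequential integer keys."""

    def __init__(self, data=0):
        self.data = data

    def __iter__(self):
        return self

    def __next__(self):
        self.data += 1
        return self.data


def mainfunc(ks=KeySeq()):
    # The default KeySeq() is created once, when the def statement runs,
    # so every call that omits ks shares that one object.
    return next(ks)


print(mainfunc(), mainfunc(), mainfunc())  # 1 2 3
print(mainfunc(KeySeq()))                  # 1: an explicit argument bypasses the default
```

Passing ks explicitly leaves the shared default untouched, so the next plain call to mainfunc() continues from 4.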

Here's why, in case you don't know: a function is an object, and it has attributes. One of them, func_defaults in Python 2 (__defaults__ in Python 3), is a tuple containing the default values of all the arguments in its signature that have defaults. Those default expressions are evaluated only once, when the def statement runs. When you call a function without providing a value for an argument that has a default, the function takes the value from that tuple. So when you call mainfunc without providing a value for ks, it gets the KeySeq() instance out of the defaults tuple, which, for that instance of mainfunc, is always the same KeySeq instance.
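A small sketch of this, run under Python 3 where the tuple is named __defaults__. A plain list stands in for KeySeq (an illustrative substitution) so the accumulated state is easy to inspect.

```python
def mainfunc(keys=[]):
    # The [] is evaluated once, at definition time, and the resulting
    # list is stored in mainfunc.__defaults__.
    keys.append(len(keys) + 1)
    return keys


stored = mainfunc.__defaults__[0]
first = mainfunc()
second = mainfunc()
print(first is second is stored)  # True
print(stored)                     # [1, 2]
```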

Now, you say that you're going to send "multiple instances of mainfunc to the submit function of PP." Do you really mean multiple instances? If so, the mechanism I'm describing won't work.

But it's tricky to create multiple instances of a function (and the code you've posted doesn't). For example, this function does return a new instance of g every time it's called:

>>> def f():
...     def g(x=[]):
...         return x
...     return g
...
>>> g1 = f()
>>> g2 = f()
>>> g1().append('a')
>>> g2().append('b')
>>> g1()
['a']
>>> g2()
['b']

If I call g() with no argument, it returns the default value (initially an empty list) from its func_defaults tuple. Since g1 and g2 are different instances of the g function, their default value for the x argument is also a different instance, which the above demonstrates.
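The same thing can be seen by inspecting the defaults tuple directly (shown here with Python 3's __defaults__): each execution of the inner def builds a new function object whose default list is evaluated afresh.

```python
def f():
    def g(x=[]):
        return x
    return g


g1 = f()
g2 = f()
# Each call to f() runs the inner def again, so the default list
# is evaluated anew and stored on a different function object.
print(g1 is g2)                                  # False
print(g1.__defaults__[0] is g2.__defaults__[0])  # False
print(g1() is g1())                              # True
```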

If you'd like to make this more explicit than using a tricky side-effect of default values, here's another way to do it:

def mainfunc():
    if not hasattr(mainfunc, "ks"):
        setattr(mainfunc, "ks", KeySeq())
    key = mainfunc.ks.next()
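A self-contained version of the attribute-based approach, as a sketch: it again assumes a Python 3 KeySeq with __next__, returns the key so the behaviour is visible, and writes setattr with a literal name as plain attribute assignment, which is equivalent.

```python
class KeySeq:
    """Iterator producing sequential integer keys."""

    def __init__(self, data=0):
        self.data = data

    def __iter__(self):
        return self

    def __next__(self):
        self.data += 1
        return self.data


def mainfunc():
    # Lazily attach one KeySeq to the function object on the first call;
    # later calls find the attribute and reuse the same instance.
    if not hasattr(mainfunc, "ks"):
        mainfunc.ks = KeySeq()
    return next(mainfunc.ks)


print([mainfunc() for _ in range(3)])  # [1, 2, 3]
```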

Finally, a super important point that the code you've posted overlooks: If you're going to be doing parallel processing on shared data, the code that touches that data needs to implement locking. Look at the callback.py example in the Parallel Python documentation and see how locking is used in the Sum class, and why.
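A minimal sketch of the locking idea, offered as an assumption about how it might look rather than code from the Parallel Python documentation: a hypothetical LockedKeySeq that guards its counter with a threading.Lock, exercised with plain threads. A lock like this only coordinates threads inside a single process; if the jobs run in separate worker processes, each one would receive its own copy of the object, and a lock alone would not make their keys unique.

```python
import threading


class LockedKeySeq:
    """Thread-safe iterator producing sequential integer keys."""

    def __init__(self, data=0):
        self.data = data
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # The read-increment-return sequence must not interleave across
        # threads, or two callers could receive the same key.
        with self._lock:
            self.data += 1
            return self.data


ks = LockedKeySeq()
keys = []
keys_lock = threading.Lock()


def worker(n):
    # Each worker draws n keys from the shared sequence.
    for _ in range(n):
        k = next(ks)
        with keys_lock:
            keys.append(k)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(keys), len(set(keys)))  # 4000 4000: no duplicate keys
```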

OTHER TIPS

It's fine to structure your program that way. A lot of command line utilities follow the same pattern:

# imports, utilities, other functions

def main(arg):
    # ...
    pass

if __name__ == '__main__':
    import sys
    main(sys.argv[1])

That way you can call the main function from another module by importing it, or you can run it from the command line.

I think your understanding of classes in Python isn't quite sound; it might be a good idea to review the basics. This link should help:

Python Basics - Classes

Licensed under: CC-BY-SA with attribution
