Question

Okay, so I've got a piece of Python code which really needs optimizing.

  • It's a Game-of-Life iteration over a small (80x60-pixel) image and extracts the RGB values from it.
  • currently using nested for-loops; I'd rather swap out those for loops for the faster map() c function, but if I do that I can't figure out how I can get the x,y values, nor the local values defined out of the scope of the functions I'd need to define.
  • would using map() be any faster than this current set of for loops? How could I use it and still get x,y?
  • I currently use pygame Surfaces, and I've tried the surfarray/pixelarray modules, but since I'm changing/getting every pixel, it's a lot slower than Surface.get_at()/set_at().
  • Also, slightly irrelevant... do you think this could be made quicker if Python wasn't traversing a list of numbers but just incrementing a number, like in other languages? Why doesn't python include a normal for() as well as their foreach()?
  • The amount of conditionals there probably makes things slower too, right? The slowest part is checking for neighbours (where it builds the list n)... I replaced that whole bit with slice access on a 2D array but it doesn't work properly.

Redacted version of code:

xr = xrange(80)
yr = xrange(60)
# surface is an instance of pygame.Surface
get_at = surface.get_at()
set_at = surface.set_at()

for x in xr:
    # ....
    for y in yr:
        # ...
        pixelR = get_at((x,y))[0]
        pixelG = get_at((x,y))[1]
        pixelB = get_at((x,y))[2]
        # ... more complex stuff here which changes R,G,B values independently of each other
        set_at((x,y),(pixelR,pixelG,pixelB))

Full version of the function:

# xr, yr = xrange(80), xrange(60)
def live(surface,xr,yr):
    randint = random.randint
    set_at = surface.set_at
    get_at = surface.get_at
    perfect = perfectNeighbours #
    minN = minNeighbours        # All global variables that're defined in a config file.
    maxN = maxNeighbours        #
    pos = actual                # actual = (80,60)
    n = []
    append = n.append
    NEIGHBOURS = 0

    for y in yr: # going height-first for aesthetic reasons.
        decay = randint(1,maxDecay)
        growth = randint(1,maxGrowth)

        for x in xr:
            r, g, b, a = get_at((x,y))

            del n[:]
            NEIGHBOURS = 0

            if x>0 and y>0 and x<pos[0]-1 and y<pos[1]-1:
                append(get_at((x-1,y-1))[1])
                append(get_at((x+1,y-1))[1])
                append(get_at((x,y-1))[1])
                append(get_at((x-1,y))[1])
                append(get_at((x+1,y))[1])
                append(get_at((x-1,y+1))[1])
                append(get_at((x+1,y+1))[1])
                append(get_at((x,y+1))[1])
                for a in n:
                    if a > 63:
                        NEIGHBOURS += 1

            if NEIGHBOURS == 0 and (r,g,b) == (0,0,0): pass
            else:

                if NEIGHBOURS < minN or NEIGHBOURS > maxN:
                    g = 0
                    b = 0
                elif NEIGHBOURS==perfect:
                    g += growth
                    if g > 255:
                        g = 255
                        b += growth
                        if b > growth: b = growth
                else:
                    if g > 10: r = g-10
                    if g > 200: b = g-100
                    if r > growth: g = r
                    g -= decay
                    if g < 0:
                        g = 0
                        b = 0
                r -= 1
                if r < 0:
                    r = 0
                set_at((x,y),(r,g,b))
Was it helpful?

Solution

Since you are reading and rewriting every pixel, I think you can get the best speed improvement by not using a Surface.

I suggest first taking your 80x60 image and converting it to a plain bitmap file with 32-bit pixels. Then read the pixel data into a python array object. Now you can walk over the array object, reading values, calculating new values, and poking the new values into place with maximum speed. When done, save your new bitmap image, and then convert it to a Surface.

You could also use 24-bit pixels, but that should be slower. 32-bit pixels means one pixel is one 32-bit integer value, which makes the array of pixels much easier to index. 24-bit packed pixels means each pixel is 3 bytes, which is much more annoying to index into.

I believe you will gain much more speed out of this approach than by trying to avoid the use of for. If you try this, please post something here to let us know how well it worked or didn't. Good luck.

EDIT: I thought that an array has only a single index. I'm not sure how you managed to get two indexes to work. I was expecting you to do something like this:

def __i(x, y):
    assert(0 <= x < 80)
    assert(0 <= y < 60)
    i = (y*80 + x) * 4
    return i
def red(x, y):
    return __a[__i(x, y)]
def green(x, y):
    return __a[__i(x, y) + 1]
def blue(x, y):
    return __a[__i(x, y) + 2]
def rgb(x, y):
    i = __i(x, y)
    return __a[i], __a[i + 1], __a[i + 2]
def set_rgb(x, y, r, g, b):
    i = __i(x, y)
    _a[i] = r
    _a[i + 1] = g
    _a[i + 2] = b

# example:
r, g, b = rgb(23, 33)

Since a Python array can only hold a single type, you will want to set the type to "unsigned byte" and then index like I showed.

Where of course __a is the actual array variable.

If none of this is helpful, try converting your bitmap into a list, or perhaps three lists. You can use nested lists to get 2D addressing.

I hope this helps. If it is not helpful, then I am not understanding what you are doing; if you explain more I'll try to improve the answer.

OTHER TIPS

What's making your code slow is probably not the loops, they are incredibly fast.

What slows done your code are the number of function calls. For example

pixelR = get_at((x,y))[0]
pixelG = get_at((x,y))[1]
pixelB = get_at((x,y))[2]

is a lot slower than (about 3 times I guess)

r, g, b, a = get_at((x,y))

Every get_at, set_at call locks the surface, therefore it's faster to directly access the pixels using the available methods. The one that seems most reasonable is Surface.get_buffer.

Using map doesn't work in your example, because you need the indexes. With as few as 80 and 60 numbers it might even be faster to use range() instead of xrange().

map(do_stuff, ((x, y) for x in xrange(80) for y in xrange(60)))

where do_stuff would presumably be defined like so:

def do_stuff(coords):
    r, g, b, a = get_at(coords)
    # ... whatever you need to do with those ...
    set_at(coords, (r, g, b))

You could alternatively use a list comprehension instead of a generator expression as the second argument to map (replace ((x, y) ...) with [(x, y) ...]) and use range instead of xrange. I'd say that it's not very likely to have a significant effect on performance, though.

Edit: Note that gs is certainly right about the for loops not being the main thing in need of optimisation in your code... Cutting down on superfluous calls to get_at is more important. In fact, I'm not sure if replacing the loops with map will actually improve performance here at all... Having said that, I find the map version more readable (perhaps because of my FP background...), so here you go anyway. ;-)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top