Question

I'm writing a program that needs to do a lot of string formatting, and I have noticed that .format() takes a small but significant amount of CPU time. Here's how I'm using it:

str = 'vn {0:.15f} {1:.15f} {2:.15f}\n'.format(normal_array[i].x, normal_array[i].y, normal_array[i].z)

Does anyone know of an even slightly faster way to do this? A small saving per call, multiplied by 100,000 calls, adds up.


Solution

Try replacing .format() with a % expression, and look up normal_array[i] once instead of three times:

item = normal_array[i]
'vn %.15f %.15f %.15f\n' % (item.x, item.y, item.z)

Iterating over the values directly, instead of indexing, can also improve speed slightly:

for item in normal_array:
    'vn %.15f %.15f %.15f\n' % (item.x, item.y, item.z)
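If all the formatted lines end up in the same file, the two suggestions above can be combined and the output built in one pass. This is only a sketch: the Normal namedtuple and the format_normals helper are hypothetical names for illustration, and the real normal_array just needs items with x, y, and z attributes.

```python
import collections

# Hypothetical stand-in for the asker's normal objects.
Normal = collections.namedtuple('Normal', ('x', 'y', 'z'))

def format_normals(normals):
    # Bind the format string once, iterate over the values directly,
    # and join the pieces in a single call instead of concatenating.
    fmt = 'vn %.15f %.15f %.15f\n'
    return ''.join([fmt % (n.x, n.y, n.z) for n in normals])

text = format_normals([Normal(0.5, 0.25, 1.0), Normal(0.0, 1.0, 0.0)])
```

The resulting string can then be written with a single f.write(text) call, which also avoids one write call per line.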

Benchmark:

# Python 2 (print statement, xrange)
import collections
import random
import timeit

# Create the record type once rather than on every loop iteration.
Normal = collections.namedtuple('normal', ('x', 'y', 'z'))

def gen_data(n):
    # Build n records of three random floats each.
    return [Normal(random.random(), random.random(), random.random()) for k in xrange(n)]

if __name__ == '__main__':
    times = 1000
    setup = 'from __main__ import gen_data; normal_array = gen_data(1000)'
    print 'format:'
    print timeit.Timer('for i in xrange(len(normal_array)):\n    line = "vn {0:.15f} {1:.15f} {2:.15f}\\n".format(normal_array[i].x, normal_array[i].y, normal_array[i].z)\n',
            setup).timeit(times)
    # The two cases below must use the % operator; calling .format() on a
    # %-style string would return it unchanged without formatting anything.
    print '%s:'
    print timeit.Timer('for i in xrange(len(normal_array)):\n    line = "vn %.15f %.15f %.15f\\n" % (normal_array[i].x, normal_array[i].y, normal_array[i].z)\n',
            setup).timeit(times)
    print '%s+iteration:'
    print timeit.Timer('for o in normal_array:\n    line = "vn %.15f %.15f %.15f\\n" % (o.x, o.y, o.z)\n',
            setup).timeit(times)

Results, in seconds for 1000 passes over 1000 items (lower is better):

format:
5.34718108177
%s:
1.30601406097
%s+iteration:
1.23484301567

OTHER TIPS

You can also try migrating to PyPy; there was an article comparing string formatting performance in CPython and PyPy.

Try this (old school) approach by replacing .format() with % format directives:

str = 'vn %.15f %.15f %.15f\n' % (normal_array[i].x, normal_array[i].y, normal_array[i].z)
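Before swapping one style for the other, it may be worth confirming that the output does not change. For plain floats, %.15f and {:.15f} should produce identical text, and a quick check along these lines can confirm it on your own data. The helper names and sample values here are made up for illustration.

```python
def with_format(x, y, z):
    # The original .format() version from the question.
    return 'vn {0:.15f} {1:.15f} {2:.15f}\n'.format(x, y, z)

def with_percent(x, y, z):
    # The suggested % replacement.
    return 'vn %.15f %.15f %.15f\n' % (x, y, z)

# Illustrative values only; substitute real normals here.
samples = [(0.0, 1.0, -1.0), (0.1, 0.2, 0.3), (1e-7, 123456.789, -2.5e10)]

# Collect every sample for which the two styles disagree.
mismatches = [s for s in samples if with_format(*s) != with_percent(*s)]
print(mismatches)
```

An empty list means the two versions agree on every sample, so the replacement is safe for those inputs.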

Using % appears to be faster:

timeit str='%.15f %.15f %.15f\n' % (a, b, c)
100000 loops, best of 3: 4.99 us per loop

timeit str2='{:.15f} {:.15f} {:.15f}\n'.format(a, b, c)
100000 loops, best of 3: 5.97 us per loop

Measured with Python 2.7.2 on Windows XP SP2; the variables a, b, and c are floats.
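The gap between the two styles varies with interpreter version and platform, so it is worth repeating the measurement on your own machine. Here is a rough sketch using the standard timeit module; the time_both helper and the values of a, b, and c are assumptions for illustration, not part of the measurement above.

```python
import timeit

# Illustrative float values; any floats will do.
SETUP = 'a, b, c = 0.1, 0.2, 0.3'
PERCENT = "'%.15f %.15f %.15f\\n' % (a, b, c)"
FORMAT = "'{:.15f} {:.15f} {:.15f}\\n'.format(a, b, c)"

def time_both(number=100000, repeat=3):
    # Return the best-of-`repeat` time per call, in microseconds, for each style.
    results = {}
    for name, stmt in (('percent', PERCENT), ('format', FORMAT)):
        best = min(timeit.repeat(stmt, setup=SETUP, repeat=repeat, number=number))
        results[name] = best / number * 1e6
    return results

print(time_both())
```

Taking the minimum over several repeats reduces noise from other processes. The absolute numbers will differ from the ones quoted above.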

If the float conversion is still a bottleneck, you might try farming the formatting out to a multiprocessing.Pool, using Pool.map_async or Pool.imap to collect the resulting strings for printing. This uses all the cores on your machine for the formatting. However, the overhead of passing the data to and from the worker processes may cancel out the improvement from parallelizing the formatting.
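A minimal sketch of the Pool idea, assuming the normals have first been converted to plain (x, y, z) tuples so they pickle cheaply. The helper names and the chunk size are hypothetical and would need tuning. Each task formats a whole chunk rather than a single line, which keeps the inter-process overhead per line small.

```python
import multiprocessing

LINE = 'vn %.15f %.15f %.15f\n'

def format_chunk(chunk):
    # Runs in a worker process: format a list of (x, y, z) tuples into one string.
    return ''.join([LINE % xyz for xyz in chunk])

def split_chunks(items, size):
    # Split a list into consecutive slices of at most `size` items.
    return [items[i:i + size] for i in range(0, len(items), size)]

def format_parallel(normals, chunk_size=10000, processes=None):
    # processes=None lets Pool use one worker per available core.
    pool = multiprocessing.Pool(processes)
    try:
        # Pool.imap yields results in input order, so line order is preserved.
        return ''.join(pool.imap(format_chunk, split_chunks(normals, chunk_size)))
    finally:
        pool.close()
        pool.join()
```

Call format_parallel(normals) from under an if __name__ == '__main__': guard, which multiprocessing needs on platforms that start workers by spawning a fresh interpreter. Time it against the single-process version before adopting it, since pickling the data can easily cost more than the formatting saves.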

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow