Question

Strange question this, I know.

I have a code base in Fortran 77 that for the most part parses large non-binary files, does some manipulation of their contents and then does a lot of file writing. The code base does not do any matrix manipulation or number crunching. This legacy code is in Fortran because a lot of our other code bases do require serious number crunching, and because Fortran was the language people here already knew.

My proposal is to rewrite this entirely in Python (most likely 3.3). Maintenance of the Fortran code is just as difficult as you would expect, and the tests are as poor as you can imagine. Obviously Python would help a lot here.

Are there any performance hits (or even gains) in terms of file handling speed in Python? Currently the majority of the run time of this system is spent reading and writing the files.

Thanks in advance

Solution

The IO parts of the Python standard library are implemented as efficient C code, so I've seen performance that is better than in e.g. Java, especially in cases where the program is IO bound (as opposed to CPU bound).

Re:

Currently the majority of run time of this system is in reading/writing the files.

Furthermore, if your logic processes the file as a stream, not the contents of the file as a whole, you might actually see a performance improvement when migrating to Python if you use the right tools for the job. Basically the idea is to read the input in chunks, process the chunk and write the result into the output file immediately. This minimizes memory usage and latency, especially if your pipeline consists of multiple steps. Python generators allow writing such logic in a very clean, readable and concise manner, which is something you'll not find in Fortran or C, at least without some major extra effort to build such abstraction (and even then you'd end up with very magic and/or cryptic code).
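As a rough illustration of the streaming idea above, here is a minimal sketch of a three-stage generator pipeline. The function names (`read_records`, `transform`, `write_records`, `run_pipeline`) and the upper-casing step are hypothetical placeholders, not anything from the original code base; the real manipulation logic would go in `transform`.

```python
def read_records(path):
    """Yield one line at a time; the file is never loaded as a whole."""
    with open(path, "r", encoding="utf-8") as src:
        for line in src:
            yield line.rstrip("\n")


def transform(records):
    """Hypothetical manipulation step: skip blank lines, upper-case the rest."""
    for record in records:
        if record.strip():
            yield record.upper()


def write_records(records, path):
    """Write each record as soon as it is produced; return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as dst:
        for record in records:
            dst.write(record + "\n")
            count += 1
    return count


def run_pipeline(in_path, out_path):
    # Each stage pulls one record at a time from the stage before it,
    # so memory use stays flat regardless of the input file size.
    return write_records(transform(read_records(in_path)), out_path)
```

Calling `run_pipeline("input.txt", "output.txt")` (file names assumed) would process the input line by line. Extra steps can be added by chaining further generator functions between the reader and the writer.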

See http://www.dabeaz.com/generators/ for a really good text about file processing in Python using generators.

In addition, depending on the nature and complexity of your processing algorithms, you might find that other abstractions (such as coroutines) or libraries (gevent, numpy, etc) available in Python will help you achieve better overall performance because it's simply easier to understand and refactor the code. (This of course holds in any high-level vs low-level language comparison.)

Also, check out PyPy: it might provide a (sometimes significant) performance boost over CPython in any CPU-bound (number crunching) parts without any additional effort required on your side (not to say that you couldn't or shouldn't optimize your code for the PyPy JIT compiler :)).

And then there's Cython, which allows you to write normal Python mixed with parts that will be converted directly to C code. This has the advantage of better maintainability and readability than Fortran (and C) with the performance of C, while still letting you use most if not all of the high-level Python constructs. Cython code can call directly into pure Python code and into pure C code/libraries (and probably Fortran code/libraries: http://www.sfu.ca/~mawerder/notes/calling_fortran_from_python.html). You can also just write the performance critical (CPU bound) parts of your code in Cython and call them directly from Python.

OTHER TIPS

In general, unless your particular compiler and available toolset do especially counter-productive things, one programming language is able to do IO as fast as another. In many programming languages, a naive approach may be sub-optimal - like all performance-related aspects of programming, this is something that is solved by appropriate design and appropriate use of the available tools (such as parallel processing, or buffered and threaded IO).
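To make the buffered IO point concrete, here is a small sketch that copies a file in large chunks using the `buffering` argument of Python's built-in `open`. The function name `copy_in_chunks` and the 1 MiB default are assumptions chosen for illustration; a suitable size depends on the files and the storage, so it is worth measuring.

```python
def copy_in_chunks(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy a file using large buffered reads and writes; return bytes copied."""
    total = 0
    # In binary mode, an integer buffering value sets the buffer size in bytes.
    with open(src_path, "rb", buffering=chunk_size) as src, \
         open(dst_path, "wb", buffering=chunk_size) as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            dst.write(chunk)
            total += len(chunk)
    return total
```

The same pattern applies when a processing step sits between the read and the write. The point is only that few large reads and writes tend to be cheaper than many tiny ones.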

Python isn't especially bad at IO, offers buffered IO and threading capabilities, and is easy to extend with C (and therefore probably not that hard to interface with Fortran). Python is likely to be a completely reasonable technology for incrementally replacing parts of your codebase - indeed, if you can first make the IO fast in Python, you can probably compile an extension which ultimately calls your Fortran code.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow