Question

I was recently downvoted (which only bugged me a little :) ) for an answer I gave to this question. The person offered no explanation for the down vote which started me thinking: "Why would you avoid producing intermediate files?" Especially in a language like Python where File IO is laughably easy.

There seemed to be consensus that it was a bad idea, but I know for a fact that intermediate files are used regularly in practice. I worked for a very well respected research firm (let's just say S.O. wouldn't exist without this firm) where it was assumed that your programs would produce files as output. We did this because if your program indeed deserved to be a standalone program then it would need debuggable output and some way of passing its output between processes that could later be examined in case we discovered an error in our output further downstream.

Is it considered bad practice (in cases like the question linked above) to use intermediate files? Why?

Was it helpful?

Solution

One problem with intermediate files happens when multi-threading.

If Clients C1 and C2 are handled simultaneously by server process S (which may or not have forked into seperate processes, used threads, or whatever concurrency system..), you may get weird issues when both try to create the same intermediate file.

I believe one of Unix philosophies is that all programs should act as filters, however this doesn't necessarily mean creating files on the disk, and using intermediate files leads to unwieldly behaviour in my opinion. One should also treat the disk as a last resort and only use it for storing/retrieving data that should be available after powering off the computer, and maybe even take care to allow programs to run on read-only media.

OTHER TIPS

Well, there are some issues when you use files, especially there may be many unexpected failures while accessing or creating the files. The following listed are all the issues that I personally have experienced.

1) The file location is on the remote machine and the network is down. (NFS mounted).
2) There is not enough free space while creating the file.
3) In between the process the user press Ctrl-C to cancel the process the file is not deleted.
4) The file is mounted on the NFS and the network is slow.
5) The folder in which file was created was a soft link and the original link was deleted.

But still we have to use file because there are hardly any options while working in bash. But in C,C++ i think disk access should be considered as the last resort. Program producing files as output is ok, if that is the only way to communicate with the user. But atleast for intermediate savings use of disk files should be minimized.

If you create temporary files properly (with setting platform-specific 'temporary' flag meaning do not flush cache to disk when no urgent need) they are perfectly good if task requires them.

There are almost no things in IT that you can't use while having a good reason to. :-)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top