Is it possible to write to different parts of the same file from multiple threads?

StackOverflow https://stackoverflow.com/questions/15477925

  •  24-03-2022

Question

Can I write to different parts of the same file concurrently from multiple threads (on a typical PC)? I mean, there's only one disk head, so the writes can only be performed in some order anyway, i.e. not in parallel, right?

Edit:

I'm writing a program that sorts a large binary file, but the majority of the time is still spent on disk I/O, so I'm wondering whether I will gain any extra speed by doing I/O in parallel.


Solution

There's nothing to stop you from having multiple threads writing to different parts of the same file.
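As a rough illustration of that point, here is a minimal sketch assuming a Unix-like system, where Python exposes positional writes as os.pwrite. The names write_chunk, parallel_fill and CHUNK are made up for this example; each thread writes its own fixed-size slice of one shared file descriptor.

```python
import os
import tempfile
import threading

CHUNK = 4  # bytes per thread; tiny on purpose, purely illustrative


def write_chunk(fd, index):
    # os.pwrite takes an explicit offset and leaves the shared file position
    # alone, so threads writing disjoint ranges need no lock between them.
    data = bytes([ord("A") + index]) * CHUNK
    os.pwrite(fd, data, index * CHUNK)


def parallel_fill(path, n_threads):
    # Hypothetical helper: one descriptor shared by all threads.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, n_threads * CHUNK)  # pre-size the file
        threads = [
            threading.Thread(target=write_chunk, args=(fd, i))
            for i in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return os.pread(fd, n_threads * CHUNK, 0)  # read back the whole file
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(parallel_fill(os.path.join(tmp, "out.bin"), 3))
```

This only shows that the writes do not interfere with each other; it says nothing about whether they are faster. Real code would also check the byte count returned by os.pwrite, since a write can in principle be short.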

"I have a program that sorts a large binary file but the majority of time is still spent on disk I/O, so I'm just wondering will I gain any extra speed by doing I/O in parallel."

If the program is disk-bound, making it multithreaded (and still writing the same amount of data to the same disk) will not speed it up.

If we are talking about a traditional hard drive, sequential I/O is generally faster than I/O that involves moving the disk head back and forth. With this in mind, splitting the I/O across threads might even be counter-productive.

There are several avenues to explore as far as speeding things up:

  1. Reducing the amount of I/O (e.g. by employing a sorting algorithm that requires less I/O, or by doing more work in-memory);
  2. Improving I/O throughput, for example by using a faster drive.

Other tips

It is possible on Unix(-like) operating systems at least, and presumably also on Windows, though file handling there is somewhat different and may need a specific file mode to allow this (edit: see the answer of bizzehdee for details).

On a running operating system, a "file" is really a logical entity: some of its state is stored on disk at any given time, while some changes still exist only in kernel buffers. So, in a way, writing to a file is no different from writing to a block of shared memory; only the API is different (and not even that if you use mmap).
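To make the shared-memory comparison concrete, here is a small sketch using Python's mmap module. The function names fill_region and mmap_fill are invented for the example, and the region sizes are arbitrary. Each thread assigns to its own slice of the mapping exactly as it would to a shared byte buffer.

```python
import mmap
import os
import tempfile
import threading


def fill_region(view, start, length, value):
    # Slice assignment on the mapping is an ordinary memory write; the
    # operating system writes the dirty pages back to the file later.
    view[start:start + length] = bytes([value]) * length


def mmap_fill(path, regions, region_size):
    # Hypothetical helper: map the file once, let each thread fill one region.
    size = regions * region_size
    with open(path, "wb") as f:
        f.truncate(size)  # an empty file cannot be mapped, so size it first
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as view:
        threads = [
            threading.Thread(
                target=fill_region,
                args=(view, i * region_size, region_size, ord("a") + i),
            )
            for i in range(regions)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        view.flush()  # ask for the changes to be written back to the file
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(mmap_fill(os.path.join(tmp, "mapped.bin"), 2, 3))
```

Note that the threads never call a file API at all; they only touch memory, which is the sense in which the two models coincide.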

But in short: just seek and write, and the old bytes in the file get overwritten. If two processes write overlapping bytes, I think the end result is undefined. In any case, that is something which should never happen in a correctly functioning system, so any program doing this should have some mechanism to prevent overlapping writes.
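One possible shape for such a mechanism, sketched for threads inside a single process only: a small registry of claimed byte ranges that refuses any claim intersecting an existing one. The class RangeClaims and its methods are hypothetical names made up for this example, not part of any library. Coordinating separate processes would need something else, such as OS-level byte-range locks.

```python
import threading


class RangeClaims:
    """Hypothetical helper: tracks claimed half-open byte ranges [start, end)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = []

    def claim(self, start, length):
        """Reserve [start, start + length); return False if it overlaps a claim."""
        end = start + length
        with self._lock:
            for s, e in self._claimed:
                if start < e and s < end:  # the two half-open ranges intersect
                    return False
            self._claimed.append((start, end))
            return True

    def release(self, start, length):
        """Give back a range previously reserved with claim()."""
        with self._lock:
            self._claimed.remove((start, start + length))


if __name__ == "__main__":
    claims = RangeClaims()
    print(claims.claim(0, 10))   # first writer takes bytes 0..9
    print(claims.claim(5, 10))   # overlaps bytes 5..9, so it is refused
    print(claims.claim(10, 5))   # adjacent range, no overlap
```

A writer would call claim() before seeking and writing, and release() afterwards. In many programs a fixed partition of the file into per-thread ranges makes even this unnecessary.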


About speed-up: it depends on what you do, really. If you just perform raw writes, things will probably slow down on a traditional spinning hard disk, and the file may become fragmented more easily. On an SSD, there is probably no slow-down, but no speed-up either.

On the other hand, if your operation is CPU-bound, you have multiple cores, and doing things in parallel lets you get higher total CPU usage, then processing different parts of the same output file in parallel can speed things up, even a lot if there's a lot of processing compared to the bytes written to the file.

You need to look at CreateFile (opening the file with FILE_FLAG_OVERLAPPED) and WriteFileEx, and make use of lpOverlapped. This allows asynchronous reading and/or writing from/to the same file at the same time in multiple threads.

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748(v=vs.85).aspx

Yes, it's possible, but as others say it's most unlikely to improve performance in the general case.

However, one of your statements isn't really correct.

"there's only one disk head"

First off, there's typically one per surface, so even a single-platter HDD will have two heads. Multi-platter ones have more, of course.

Some drives with multiple platters are also able to read or write to all platters at the same time. Some Fujitsu Eagle drives in the '80s did this, and were used for the first systems capable of recording uncompressed digital video signals. This isn't 'random access' of course, as all heads move together. I'm not sure if modern drives use this technique.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow