Question

I need to keep as much as I can of a large file in the operating system's block cache, even though the file is bigger than will fit in RAM, while I'm continuously reading another very, very large file. At the moment, stream-reading the second file evicts large chunks of the important file from the system cache.


Solution

In a POSIX system like Linux or Solaris, try using posix_fadvise.

On the streaming file, do something like this:

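off_t current_pos = 0;
ssize_t bytes;
posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
while ((bytes = pread(fd, buffer, 64 * 1024, current_pos)) > 0) {
  /* ... process `bytes` bytes from buffer here ... */
  current_pos += bytes;  /* advance by what was actually read */
  /* offset 0, length current_pos: drop everything read so far from the cache */
  posix_fadvise(fd, 0, current_pos, POSIX_FADV_DONTNEED);
}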

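You can also apply POSIX_FADV_WILLNEED to your other file, which asks the kernel to start reading it into the page cache. It is only a hint, so it does not guarantee that those pages stay resident.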
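As a minimal sketch of the POSIX_FADV_WILLNEED suggestion, something like the following could be run against the important file. The helper name prefetch_file is hypothetical (it is not part of any library), and the kernel is free to ignore the advice or to evict the pages again later.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to start reading the whole file at
 * `path` into the page cache.
 * Returns 0 on success, -1 if the file cannot be opened, or the positive
 * error number reported by posix_fadvise(). */
int prefetch_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Offset 0 with length 0 means "from the start to the end of the file". */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

    /* Closing the descriptor does not by itself evict the cached pages. */
    close(fd);
    return rc;
}
```

Calling it again from time to time, for example prefetch_file("/path/to/important.dat"), may help pull back pages that were evicted in the meantime, but that is an assumption to measure rather than a guarantee.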

Now, I know that Windows Vista and Server 2008 can also do nifty tricks with memory priorities. Probably older versions like XP can do more basic tricks as well. But I don't know the functions off the top of my head and don't have time to look them up.

OTHER TIPS

On Linux, you can mount a filesystem of type tmpfs, which lives in memory and uses available swap as backing if needed. You should be able to create a filesystem larger than your RAM. Under memory pressure its pages are swapped out rather than simply dropped the way ordinary clean file cache is.

mount -t tmpfs none /mnt/point

See: http://lxr.linux.no/linux/Documentation/filesystems/tmpfs.txt

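You may also benefit from the swappiness and drop_caches files within /proc/sys/vm. The first controls how readily the kernel swaps out anonymous memory instead of reclaiming file cache; the second lets you flush clean cache by hand.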
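To check the current setting before tuning anything, a small reader along these lines could be used. This is only a sketch: the helper name read_vm_tunable is made up for illustration, and it assumes the tunable holds a single integer, as swappiness does.

```c
#include <stdio.h>

/* Hypothetical helper: read an integer tunable such as "swappiness" from
 * /proc/sys/vm. Returns the value, or -1 if it cannot be read or parsed. */
long read_vm_tunable(const char *name)
{
    char path[256];
    long value = -1;

    /* Refuse names that would not fit in the path buffer. */
    int n = snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    if (n < 0 || n >= (int)sizeof path)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    if (fscanf(f, "%ld", &value) != 1)
        value = -1;

    fclose(f);
    return value;
}
```

For example, read_vm_tunable("swappiness") returns the current value on a Linux system where /proc is mounted. Changing the value requires root and is not shown here.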

If you're using Windows, consider opening the file you're scanning through with the flag

FILE_FLAG_SEQUENTIAL_SCAN

You could also use

FILE_FLAG_NO_BUFFERING

for that file, but it imposes some restrictions on your read size and buffer alignment.

Some operating systems have RAM disks, which let you set aside a segment of RAM for storage and then mount it as a file system.

What I don't understand, though, is why you want to keep the operating system from caching the file. Your full question doesn't really make sense to me.

Buy more RAM (it's relatively cheap!) or let the OS do its thing. I think you'll find that circumventing the OS is going to be more trouble than it's worth. The OS will cache as much of the file as it can, until your application or any other application needs the memory.

I guess you could minimize the number of processes, but it's probably quicker to buy more memory.

mlock() and mlockall() respectively lock part or all of the calling process’s virtual address space into RAM, preventing that memory from being paged to the swap area.

(copied from the MLOCK(2) Linux man page)
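Because mlock() works on the process's address space, the file has to be mapped first. A rough sketch, assuming the part you want to pin fits in RAM and within the RLIMIT_MEMLOCK limit, might look like this. The helper name map_and_lock is hypothetical, and for a file bigger than RAM you would map and lock only the hottest region rather than the whole file as done here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and try to pin it in RAM.
 * Returns the mapping and stores its size in *len_out, or returns NULL if
 * the file cannot be opened, is empty, or cannot be mapped.
 * *locked_out is set to 1 if mlock() succeeded and 0 if it failed
 * (for example because RLIMIT_MEMLOCK is too low). */
void *map_and_lock(const char *path, size_t *len_out, int *locked_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;

    /* mlock() faults the pages in and keeps them resident until munlock(),
     * munmap(), or process exit. */
    *locked_out = (mlock(addr, len) == 0);
    *len_out = len;
    return addr;
}
```

The caller releases the memory with munmap(addr, len), which also drops the lock. If locked_out comes back 0, the mapping is still usable but its pages can be evicted like any other file cache.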

Licensed under: CC-BY-SA with attribution