The problem you have here is that you cannot mmap()
the way the system call of the same name does; the syscall can map up to 2^64, FileChannel#map()
cannot map more than 2^30 reliably.
However, what you can do is wrap a FileChannel
into a class and create several "map ranges" covering all the file.
I have done "nearly" such a thing except more complicated: largetext. More complicated because I have to do the decoding process to boot, and the text which is loaded must be so into memory, unlike you who reads bytes. Less complicated because I have a define JDK interface to implement and you don't.
You can however use nearly the same technique using Guava and a RangeMap<Long, MappedByteBuffer>
.
I implement CharSequence
in this project above; I suggest that you implement a LargeByteMapping
interface instead, from which you can read whatever parts you want; or, well, whatever suits you. Your main problem will be to define that interface. I suspect what CharSequence
does is not what you want.
Meh, I may even have a go at it some day, largetext
is quite exciting a project and this looks like the same kind of thing; except less complicated, ultimately!
One could even imagine a LargeByteMapping
implementation where a factory would create such mappings with only a small part of that into memory and the rest written to a file; and such an implementation would also use the principle of locality: the latest queried part of the file into memory would be kept into memory for faster access.
See also here.
EDIT I feel some more explanation is needed here... A MappedByteBuffer
will NOT EAT HEAP SPACE!!
It will eat address space only; it is nearly the equivalent of a ByteBuffer.allocateDirect()
, except it is backed by a file.
And a very important distinction needs to be made here; all of the text above supposes that you are reading bytes, not characters!