I think your slowness is coming from creating new files, not from the actual data transfer. I believe that creating a file is a synchronous operation on Linux: the system call will not return until the file has been created and the directory updated. This suggests a couple of things you can do:
- Use multiple writer threads with a single reader thread. The reader thread reads data from the source file into a `byte[]`, then creates a `Runnable` that writes the output file from that array. Use a threadpool with lots of threads -- maybe 100 or more -- because they'll be spending most of their time waiting for the `creat` to complete. Set the capacity of this pool's inbound queue based on the amount of memory you have: if your files are 10k in size, then a queue capacity of 1,000 seems reasonable. There's no good reason to allow the reader to get too far ahead of the writers, so you could even go with a capacity of twice the number of threads. (A sketch of this pattern follows the list.)
- Rather than NIO, use basic `BufferedInputStream`s and `BufferedOutputStream`s. Your problem here is syscalls, not memory speed (the NIO classes are designed to avoid copies between heap and off-heap memory).
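
To make the first bullet concrete, here's a minimal sketch of the reader/writer-pool pattern using those classes. The directory arguments, thread count, and queue capacity are illustrative assumptions on my part, not numbers from your setup:

```java
import java.io.*;
import java.nio.file.*;
import java.util.concurrent.*;

public class ParallelCopy {
    // Illustrative numbers, not tuned: ~100 writers because each one
    // spends most of its time inside the creat/open syscall.
    private static final int WRITER_THREADS = 100;
    private static final int QUEUE_CAPACITY = 2 * WRITER_THREADS;

    public static void main(String[] args) throws Exception {
        Path srcDir = Paths.get(args[0]);
        Path dstDir = Paths.get(args[1]);

        // Bounded queue keeps the reader from getting too far ahead of
        // the writers; CallerRunsPolicy makes the reader perform the
        // write itself (and therefore pause reading) when the queue fills.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                WRITER_THREADS, WRITER_THREADS, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(QUEUE_CAPACITY),
                new ThreadPoolExecutor.CallerRunsPolicy());

        // Single reader thread (main): slurp each file into a byte[]
        // and hand the write off to the pool.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir)) {
            for (Path src : files) {
                byte[] content = new byte[(int) Files.size(src)];
                try (DataInputStream in = new DataInputStream(
                        new BufferedInputStream(new FileInputStream(src.toFile())))) {
                    in.readFully(content);
                }
                Path dst = dstDir.resolve(src.getFileName());
                pool.execute(() -> {
                    try (OutputStream out = new BufferedOutputStream(
                            new FileOutputStream(dst.toFile()))) {
                        out.write(content);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```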
I'm going to assume that you already know not to attempt to store all the files in a single directory -- or even more than a few hundred files in one directory.
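
If you need a scheme for that, one common approach is to fan files out into subdirectories keyed on a hash of the name. This is a sketch of my own, not something from your question; the two-level 256 x 256 layout is arbitrary:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShardedPaths {
    // Hypothetical helper: spread files across 256 x 256 subdirectories
    // derived from the name's hash, so no single directory grows large.
    static Path shardedPath(Path baseDir, String fileName) {
        int h = fileName.hashCode();
        String level1 = String.format("%02x", (h >>> 8) & 0xff);
        String level2 = String.format("%02x", h & 0xff);
        return baseDir.resolve(level1).resolve(level2).resolve(fileName);
    }

    public static void main(String[] args) {
        // Prints something like out/xx/yy/report-42.dat, where xx and yy
        // are hex bytes taken from the filename's hash.
        System.out.println(shardedPath(Paths.get("out"), "report-42.dat"));
    }
}
```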
And as another alternative, have you considered S3 for storage? I'm guessing that its bucket keys are far more efficient to create than actual directory entries, and there is a filesystem layer that lets you access bucket objects as if they were files (I haven't tried it myself).
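
If you do go that route, an upload with the AWS SDK for Java 2.x looks roughly like this. The bucket and key names are made up; the point is that a key like `shard-01/file-0001.dat` is a single flat string, not a chain of directories that each have to be created:

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class S3Put {
    public static void main(String[] args) {
        byte[] content = "file contents".getBytes();
        // Uses the default region/credentials provider chain.
        try (S3Client s3 = S3Client.create()) {
            // No mkdir, no creat: the "path" inside the key is just
            // part of one flat string.
            s3.putObject(
                    PutObjectRequest.builder()
                            .bucket("my-bucket")            // hypothetical
                            .key("shard-01/file-0001.dat")  // hypothetical
                            .build(),
                    RequestBody.fromBytes(content));
        }
    }
}
```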