Java: disk IO performance with large files
12-09-2019
Question
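I have two files (2 GB each) on my hard disk and want to compare them with each other:
- Copying the original files with Windows Explorer takes approx. 2-4 minutes (that is, reading and writing on the same physical and logical disk).
- Reading with java.io.FileInputStream twice and comparing the byte arrays byte by byte takes 20+ minutes. The java.io.BufferedInputStream buffer is 64 KB; the files are read in chunks and then compared.
The comparison is done in a tight loop like
int len = Math.min(numRead[0], numRead[1]);
for (int k = 0; k < len; k++) {
    if (buffer[1][k] != buffer[0][k]) {
        return buffer[0][k] - buffer[1][k];
    }
}
What can I do to speed this up? Is NIO supposed to be faster than plain streams? Is Java unable to use DMA/SATA technologies, and does it make some slow OS API calls instead?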
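EDIT:
Thanks for the answers. I did some experiments based on them. As Andreas showed:
"Streams or nio approaches do not differ much. More important is the correct buffer size."
This is confirmed by my own experiments. As the files are read in big chunks, even an additional buffer (BufferedInputStream) does not gain anything. Optimising the comparison is possible, and I got the best results with 32-fold unrolling, but the time spent in the comparison is small compared to the disk read, so the speedup is small. Looks like there is nothing I can do ;-(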
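Solution
I tried out three different methods of comparing two identical 3.8 GB files, with buffer sizes between 8 kB and 1 MB. The first method used just two buffered input streams.
The second approach uses a thread pool that reads in two different threads and compares in a third one. This got slightly higher throughput at the expense of higher CPU utilisation. Managing the thread pool adds a lot of overhead for such short-running tasks.
The third approach uses NIO, as posted by laginimaineb.
As you can see, the general approach does not differ much. More important is the correct buffer size.
What is strange is that I read 1 byte less using threads. I could not spot the error, though.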
comparing just with two streams
I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB
comparing with threads
I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB
comparing with nio
I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB
Code used:
import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.concurrent.*;

public class FileCompare {

    private static final int MIN_BUFFER_SIZE = 1024 * 8;
    private static final int MAX_BUFFER_SIZE = 1024 * 1024;

    private String fileName1;
    private String fileName2;
    private long start;
    private long totalbytes;

    @Before
    public void createInputStream() {
        fileName1 = "bigFile.1";
        fileName2 = "bigFile.2";
    }

    // Each test doubles the buffer size from MIN_BUFFER_SIZE up to MAX_BUFFER_SIZE.
    @Test
    public void compareTwoFiles() throws IOException {
        System.out.println("comparing just with two streams");
        int currentBufferSize = MIN_BUFFER_SIZE;
        while (currentBufferSize <= MAX_BUFFER_SIZE) {
            compareWithBufferSize(currentBufferSize);
            currentBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesFutures()
            throws IOException, ExecutionException, InterruptedException {
        System.out.println("comparing with threads");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            compareWithBufferSizeFutures(myBufferSize);
            myBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesNio() throws IOException {
        System.out.println("comparing with nio");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            boolean wasEqual = isEqualsNio(myBufferSize);
            if (wasEqual) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }
            myBufferSize *= 2;
        }
    }

    // Method 1: two buffered streams, read and compared chunk by chunk on one thread.
    private void compareWithBufferSize(int myBufferSize) throws IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName1)),
                        myBufferSize);
        byte[] buff1 = new byte[myBufferSize];
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName2)),
                        myBufferSize);
        byte[] buff2 = new byte[myBufferSize];
        int read1;
        start = System.currentTimeMillis();
        totalbytes = 0;
        while ((read1 = inputStream1.read(buff1)) != -1) {
            totalbytes += read1;
            int read2 = inputStream2.read(buff2);
            if (read1 != read2) {
                break;
            }
            if (!Arrays.equals(buff1, buff2)) {
                break;
            }
        }
        // read1 is -1 only if the loop ran to the end of the first file without a mismatch
        if (read1 == -1) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    // Method 2: reads and comparisons are submitted to a thread pool.
    private void compareWithBufferSizeFutures(int myBufferSize)
            throws ExecutionException, InterruptedException, IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName1)),
                        myBufferSize);
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName2)),
                        myBufferSize);
        final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);
        if (wasEqual) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    // Double buffering: while one pair of buffers ("even") is being filled,
    // the other pair ("odd") is compared, and vice versa.
    private boolean isEqualsParallel(int myBufferSize
            , final BufferedInputStream inputStream1
            , final BufferedInputStream inputStream2)
            throws InterruptedException, ExecutionException {
        final byte[] buff1Even = new byte[myBufferSize];
        final byte[] buff1Odd = new byte[myBufferSize];
        final byte[] buff2Even = new byte[myBufferSize];
        final byte[] buff2Odd = new byte[myBufferSize];
        final Callable<Integer> read1Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Even);
            }
        };
        final Callable<Integer> read2Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Even);
            }
        };
        final Callable<Integer> read1Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Odd);
            }
        };
        final Callable<Integer> read2Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Odd);
            }
        };
        final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Odd, buff2Odd);
            }
        };
        final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Even, buff2Even);
            }
        };
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            boolean isEven = true;
            Future<Integer> read1 = null;
            Future<Integer> read2 = null;
            Future<Boolean> isEqual = null;
            int lastSize = 0;
            while (true) {
                if (isEqual != null) {
                    if (!isEqual.get()) {
                        return false;
                    } else if (lastSize == -1) {
                        return true;
                    }
                }
                if (read1 != null) {
                    lastSize = read1.get();
                    // read() returns -1 at end of stream; adding that value to the total
                    // is what makes the threaded runs report one byte less, so only
                    // count real reads.
                    if (lastSize > 0) {
                        totalbytes += lastSize;
                    }
                    final int size2 = read2.get();
                    if (lastSize != size2) {
                        return false;
                    }
                }
                isEven = !isEven;
                if (isEven) {
                    if (read1 != null) {
                        isEqual = executor.submit(oddEqualsArray);
                    }
                    read1 = executor.submit(read1Even);
                    read2 = executor.submit(read2Even);
                } else {
                    if (read1 != null) {
                        isEqual = executor.submit(evenEqualsArray);
                    }
                    read1 = executor.submit(read1Odd);
                    read2 = executor.submit(read2Odd);
                }
            }
        } finally {
            // Let the pool's worker threads terminate once their tasks are done
            executor.shutdown();
        }
    }

    // Method 3: FileChannels with direct ByteBuffers.
    private boolean isEqualsNio(int myBufferSize) throws IOException {
        FileChannel first = null, seconde = null;
        try {
            first = new FileInputStream(fileName1).getChannel();
            seconde = new FileInputStream(fileName2).getChannel();
            if (first.size() != seconde.size()) {
                return false;
            }
            ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
            ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
            int firstRead, secondRead;
            while (first.position() < first.size()) {
                // Reset position and limit so each read fills the buffer from index 0;
                // without this, every second read finds a full buffer and returns 0.
                firstBuffer.clear();
                secondBuffer.clear();
                firstRead = first.read(firstBuffer);
                totalbytes += firstRead;
                secondRead = seconde.read(secondBuffer);
                if (firstRead != secondRead) {
                    return false;
                }
                if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
                    return false;
                }
            }
            return true;
        } finally {
            if (first != null) {
                first.close();
            }
            if (seconde != null) {
                seconde.close();
            }
        }
    }

    // Compares the first `length` bytes of both buffers, one byte at a time.
    private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit() || length > first.limit()) {
            return false;
        }
        first.rewind();
        second.rewind();
        for (int i = 0; i < length; i++) {
            if (first.get() != second.get()) {
                return false;
            }
        }
        return true;
    }

    // Prints the per-file throughput; both files are read, hence the "* 2" in the output.
    private void printAfterEquals(int myBufferSize) {
        NumberFormat nf = new DecimalFormat("#.00");
        final long dur = System.currentTimeMillis() - start;
        double seconds = dur / 1000d;
        double megabytes = totalbytes / 1024 / 1024;
        double rate = (megabytes) / seconds;
        System.out.println("I was equal, even after " + totalbytes
                + " bytes and reading for " + dur
                + " ms (" + nf.format(rate) + "MB/sec * 2)" +
                " with a buffer size of " + myBufferSize / 1024 + " kB");
    }
}
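Other tips
With such large files, you are going to get much better performance with java.nio.
Additionally, reading single bytes with Java streams can be very slow. Using a byte array (2-6K elements in my own experience; YMMV, as it seems platform/application specific) will dramatically improve your read performance with streams.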
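As a minimal sketch of the byte-array advice, assuming nothing beyond the JDK: the hypothetical helper `countBytes` below pulls a file through a `byte[]` instead of calling the single-byte `read()`. The class name, method name and the 8 KB chunk size are illustrative choices, not something taken from the answer.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ChunkedRead {
    // Hypothetical helper: reads the whole file through a byte[] of the given
    // size and returns the number of bytes seen.
    static long countBytes(String path, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        long total = 0;
        try (InputStream in = new FileInputStream(path)) {
            int n;
            // read(byte[]) returns how many bytes it stored, or -1 at end of stream
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes(args[0], 8 * 1024) + " bytes read");
    }
}
```

Each call to `read(chunk)` may store fewer bytes than the array holds, so the loop always uses the returned count rather than the array length.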
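Reading and writing the files with Java can be just as fast. You can use FileChannels. As for comparing the files, comparing byte by byte will obviously take a lot of time. Here is an example using FileChannels and ByteBuffers (it could be optimized further):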
public static boolean compare(String firstPath, String secondPath, final int BUFFER_SIZE) throws IOException {
    FileChannel firstIn = null, secondIn = null;
    try {
        firstIn = new FileInputStream(firstPath).getChannel();
        secondIn = new FileInputStream(secondPath).getChannel();
        // Files of different length can never be equal
        if (firstIn.size() != secondIn.size())
            return false;
        ByteBuffer firstBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ByteBuffer secondBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        int firstRead, secondRead;
        while (firstIn.position() < firstIn.size()) {
            // Reset position and limit so each read fills the buffer from index 0
            firstBuffer.clear();
            secondBuffer.clear();
            firstRead = firstIn.read(firstBuffer);
            secondRead = secondIn.read(secondBuffer);
            if (firstRead != secondRead)
                return false;
            if (!buffersEqual(firstBuffer, secondBuffer, firstRead))
                return false;
        }
        return true;
    } finally {
        if (firstIn != null) firstIn.close();
        if (secondIn != null) secondIn.close();
    }
}

// Compares the first `length` bytes of both buffers
private static boolean buffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit())
        return false;
    if (length > first.limit())
        return false;
    first.rewind();
    second.rewind();
    for (int i = 0; i < length; i++)
        if (first.get() != second.get())
            return false;
    return true;
}
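The following is a good article on the relative merits of the different ways to read a file in Java. It may be of some use:
After modifying your NIO compare function I get the following results.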
I was equal, even after 4294967296 bytes and reading for 304594 ms (13.45MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 225078 ms (18.20MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 221351 ms (18.50MB/sec * 2) with a buffer size of 16384 kB
Note: this means the files are being read at a combined rate of 37 MB/s.
Running the same thing on a faster drive:
I was equal, even after 4294967296 bytes and reading for 178087 ms (23.00MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 119084 ms (34.40MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 109549 ms (37.39MB/sec * 2) with a buffer size of 16384 kB
Note: this means the files are being read at a combined rate of 74.8 MB/s.
private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit() || length > first.limit()) {
        return false;
    }
    first.rewind();
    second.rewind();
    int i;
    // Compare 8 bytes at a time as long values
    for (i = 0; i < length - 7; i += 8) {
        if (first.getLong() != second.getLong()) {
            return false;
        }
    }
    // Compare the remaining (at most 7) bytes one by one
    for (; i < length; i++) {
        if (first.get() != second.get()) {
            return false;
        }
    }
    return true;
}
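You can have a look at Sun's article on I/O tuning (although it is already a bit dated); maybe you can find similarities between the examples there and your code. Also have a look at the java.nio package, which contains faster I/O elements than java.io. Dr. Dobb's Journal has quite a nice article on high-performance IO using java.nio.
If so, there are further examples and tuning tips there that should help you speed up your code.
Furthermore, the Arrays class has methods for comparing byte arrays built in; maybe these can also be used to make things faster and clean up your loop a bit.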
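A possible shape for that clean-up, as a sketch only: `Arrays.mismatch` on array ranges (available from Java 9, which is an assumption about the JDK in use) finds the first differing index, so the hand-written loop from the question collapses to two lines. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class ChunkCompare {
    // Compares the first `len` bytes of two buffers. Returns 0 if they are equal,
    // otherwise the difference of the first pair of differing bytes, which is the
    // same result the loop in the question produces.
    static int compareChunks(byte[] a, byte[] b, int len) {
        int i = Arrays.mismatch(a, 0, len, b, 0, len); // -1 means no mismatch
        return i < 0 ? 0 : a[i] - b[i];
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 9};
        byte[] y = {1, 2, 4, 9};
        System.out.println(compareChunks(x, y, 4));
    }
}
```

Restricting the comparison to `len` matters for the last chunk of a file, where the buffers are only partly filled.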
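For a better comparison, try copying two files at once. A hard drive can read one file much more efficiently than it can read two (as the head has to move back and forth to read). One way to reduce this is to use larger buffers, e.g. 16 MB, with ByteBuffer.
With ByteBuffer you can compare 8 bytes at a time by comparing long values with getLong().
If your Java is efficient, most of the work is in the disk/OS for reading and writing, so it should not be much slower than using any other language (as the disk/OS is the bottleneck).
Don't assume Java is slow until you have determined that the problem is not a bug in your code.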
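I found that a lot of the articles linked to in this post are really out of date (there is some very insightful stuff too). Some of the linked articles are from 2001, and the information is questionable at best. Martin Thompson of Mechanical Sympathy wrote quite a bit about this in 2011. Please refer to what he wrote for the background and theory.
I have found that NIO or not NIO has very little to do with the performance. It is much more about the size of your output buffers (or of the byte array you read into). NIO is no magic make-it-go-fast web-scale sauce.
I was able to take Martin's examples, use the 1.0-era OutputStream, and make it scream. NIO is fast too, but the biggest indicator is just the size of the output buffer, not whether or not you use NIO, unless of course you are using memory-mapped NIO; then it matters. :)
If you want up-to-date, authoritative information on this, see Martin's blog: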
http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html
If you want to see how NIO does not make that much of a difference (I was able to write examples using regular IO that were faster), see this:
http://www.dzone.com/links/fast_java_io_nio_is_always_faster_than_fileoutput.html
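I have tested my assumption on a new Windows laptop with a fast hard disk, my own machine with an SSD, an EC2 xlarge, and an EC2 4xlarge with maxed-out IOPS/high-speed I/O (and soon on a large NAS disk array), so it works. (There are some issues with it on smaller EC2 instances, but if you care about performance, are you going to use a small EC2 instance?) If you use real hardware, in my tests so far, traditional IO always wins. If you use high-I/O EC2, it is also a clear winner. If you use underpowered EC2 instances, NIO can win.
There is no substitute for benchmarking.
Anyway, I am no expert; I just did some empirical testing using the framework that Sir Martin Thompson wrote up in his blog post.
I took this to the next step and used Files.newInputStream (from JDK 7) with TransferQueue to create a recipe for making Java I/O scream (even on small EC2 instances). The recipe can be found in this documentation for Boon (https://github.com/RichardHightower/boon/wiki/Auto-Growable-Byte-Buffer-like-a-ByteBuilder). This allows me to use a traditional OutputStream, but with something that works well on smaller EC2 instances. (I am the main author of Boon, but I am accepting new authors. The pay sucks: $0 per hour. But the good news is, I can double your pay whenever you like.)
My 2 cents.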
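See this for why TransferQueue is important: http://php.sabscape.com/blog/?p=557
Key learnings:
- If you care about performance, never, ever, ever use BufferedOutputStream.
- NIO does not always equal performance.
- Buffer size matters most.
- Recycling buffers for high-speed writes is critical.
- GC can (and will) implode your performance on high-speed writes.
- You have to have some mechanism for reusing spent buffers.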
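The buffer-recycling idea from the list above can be sketched roughly as follows. This is not the Boon recipe: it is applied to reading rather than writing, it uses a plain `ArrayBlockingQueue` instead of the `TransferQueue` mentioned in the answer, and all class and method names are hypothetical. A reader thread fills buffers taken from a `free` queue and the consumer hands each buffer back after use, so no buffer is allocated after start-up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

final class RecyclingReader {
    // A filled buffer plus the number of valid bytes in it; length -1 marks end of input.
    static final class Chunk {
        final byte[] data;
        final int length;
        Chunk(byte[] data, int length) { this.data = data; this.length = length; }
    }

    // Sums the unsigned value of every byte in the file while cycling through
    // a fixed set of `bufferCount` buffers.
    static long checksum(String path, int bufferSize, int bufferCount)
            throws IOException, InterruptedException {
        BlockingQueue<byte[]> free = new ArrayBlockingQueue<>(bufferCount);
        // One extra slot so the end marker always fits
        BlockingQueue<Chunk> filled = new ArrayBlockingQueue<>(bufferCount + 1);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        for (int i = 0; i < bufferCount; i++) {
            free.add(new byte[bufferSize]);
        }

        Thread reader = new Thread(() -> {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                while (true) {
                    byte[] buf = free.take();      // wait for a recycled buffer
                    int n = in.read(buf);
                    filled.put(new Chunk(buf, n)); // n == -1 doubles as the end marker
                    if (n == -1) {
                        return;
                    }
                }
            } catch (IOException | InterruptedException e) {
                failure.set(e);
                filled.offer(new Chunk(null, -1)); // unblock the consumer
            }
        });
        reader.start();

        long sum = 0;
        while (true) {
            Chunk c = filled.take();
            if (c.length == -1) {
                break;
            }
            for (int i = 0; i < c.length; i++) {
                sum += c.data[i] & 0xFF;
            }
            free.put(c.data); // hand the buffer back instead of allocating a new one
        }
        reader.join();
        if (failure.get() != null) {
            throw new IOException("reading failed", failure.get());
        }
        return sum;
    }
}
```

Because the set of buffers is fixed, the garbage collector has nothing to reclaim in the steady state, which is the point the list makes about GC and high-speed I/O.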
DMA/SATA are hardware/low-level technologies and are not visible to any programming language whatsoever.
For memory-mapped input/output you should use java.nio, I believe.
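A minimal sketch of what a memory-mapped comparison could look like with java.nio, under some assumptions: the files are mapped read-only in windows (a single mapping cannot exceed `Integer.MAX_VALUE` bytes, so a 2 GB file needs more than one), and each pair of windows is compared with `ByteBuffer.equals`. The class name, method name and window size are illustrative, and whether this beats plain reads on a given disk would have to be measured.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class MappedCompare {
    // Compares two files by mapping them window by window.
    static boolean filesEqual(Path a, Path b, int windowSize) throws IOException {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        try (FileChannel ca = FileChannel.open(a, StandardOpenOption.READ);
             FileChannel cb = FileChannel.open(b, StandardOpenOption.READ)) {
            long size = ca.size();
            if (size != cb.size()) {
                return false;
            }
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer ma = ca.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer mb = cb.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // equals() compares the remaining bytes of both buffers
                if (!ma.equals(mb)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // 64 MB windows, an arbitrary choice for illustration
        System.out.println(filesEqual(Paths.get(args[0]), Paths.get(args[1]), 64 * 1024 * 1024));
    }
}
```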
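Are you sure you are not reading those files one byte at a time? That would be wasteful. I would recommend doing it block by block, and each block should be something like 64 megabytes to minimize seeking.
Try setting the buffer on the input stream to several megabytes.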
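To make the block-by-block suggestion concrete, here is a rough sketch with a configurable block size. It assumes Java 9 or later for `InputStream.readNBytes` and the range form of `Arrays.equals`; the class and method names are hypothetical, and the right block size for a given disk is something to benchmark, as the measurements earlier in the thread suggest.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class BlockCompare {
    // Compares two files in blocks of `blockSize` bytes.
    static boolean filesEqual(String pathA, String pathB, int blockSize) throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        byte[] a = new byte[blockSize];
        byte[] b = new byte[blockSize];
        try (InputStream inA = new FileInputStream(pathA);
             InputStream inB = new FileInputStream(pathB)) {
            while (true) {
                // readNBytes keeps reading until the block is full or the stream ends
                int nA = inA.readNBytes(a, 0, blockSize);
                int nB = inB.readNBytes(b, 0, blockSize);
                if (nA != nB) {
                    return false; // one file ended before the other
                }
                if (!Arrays.equals(a, 0, nA, b, 0, nB)) {
                    return false;
                }
                if (nA < blockSize) {
                    return true;  // a short block means both files are exhausted
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 64 MB blocks, as suggested above
        System.out.println(filesEqual(args[0], args[1], 64 * 1024 * 1024));
    }
}
```

Two 64 MB arrays need a heap that can hold them; with the default heap of a small JVM a few megabytes per block may be the more practical starting point.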