I want to create a hash of a file that is at least 5 MB in size and can grow to 1-2 GB. I am facing a tough choice between these two methods, although they produce exactly the same result.

Method 1: sha1_file($file)
Method 2: sha1(file_get_contents($file))

I have tried with a 10 MB file and there is not much difference in performance. But what about larger files? Which is the better way to go?


Solution

Use the highest-level form offered unless there is a compelling reason to do otherwise.

In this case, the correct choice is sha1_file, because it is a higher-level function that only works with files. This 'restriction' allows it to take advantage of the fact that the file/source can be processed as a stream [1]: only a small part of the file is ever read into memory at a time.

The second approach guarantees that an amount of memory equal to the size of the file (5 MB to 2 GB) is used, because file_get_contents reads everything into memory before the hash is generated. As file sizes increase and/or system resources become limited, this can have a very detrimental effect on performance.
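To make the memory difference concrete, here is a rough sketch that hashes the same file both ways and reports how much the peak memory usage grew for each. The 8 MB temporary file and the variable names are assumptions made for this illustration only. The numbers are approximate and will vary by PHP version and configuration.

```php
<?php
// Hypothetical test file: 8 MB written in 64 KB pieces so that creating it
// does not itself inflate the peak memory figure.
$file = tempnam(sys_get_temp_dir(), 'sha1demo');
$handle = fopen($file, 'wb');
$piece = str_repeat('x', 64 * 1024);
for ($i = 0; $i < 128; $i++) {
    fwrite($handle, $piece);
}
fclose($handle);
unset($piece);

// Method 1: the file is read through a small internal buffer.
$before = memory_get_peak_usage();
$hashStreamed = sha1_file($file);
$peakStreamed = memory_get_peak_usage() - $before;

// Method 2: the whole file becomes a PHP string before hashing starts.
$before = memory_get_peak_usage();
$hashLoaded = sha1(file_get_contents($file));
$peakLoaded = memory_get_peak_usage() - $before;

unlink($file);

printf("same hash: %s\n", $hashStreamed === $hashLoaded ? 'yes' : 'no');
printf("peak growth with sha1_file: %d bytes\n", $peakStreamed);
printf("peak growth with file_get_contents: %d bytes\n", $peakLoaded);
```

With a 1-2 GB file, the second method would also need memory_limit to be raised above the file size, otherwise the script stops with a fatal error before any hash is produced.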


[1] The source for sha1_file can be found on GitHub. Here is an extract showing only the lines relevant to stream processing:

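PHP_FUNCTION(sha1_file)
{
    /* declarations, argument parsing and error handling omitted */

    /* open the file (or any other stream wrapper) for binary reading */
    stream = php_stream_open_wrapper(arg, "rb", REPORT_ERRORS, NULL);
    PHP_SHA1Init(&context);
    /* read one buffer at a time and feed it into the running hash */
    while ((n = php_stream_read(stream, buf, sizeof(buf))) > 0) {
        PHP_SHA1Update(&context, buf, n);
    }
    /* produce the digest once the whole stream has been consumed */
    PHP_SHA1Final(digest, &context);
    php_stream_close(stream);
}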

By using higher-level functions, the onus of providing a suitable implementation is placed on the developers of the library. In this case, it allowed the use of a stream-based implementation that scales with file size.
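For illustration only, the read-and-update loop from the C extract can be mirrored in userland PHP with the incremental hashing functions hash_init, hash_update and hash_final. The helper name sha1_file_chunked and the 8192-byte chunk size are assumptions made for this sketch; in practice sha1_file already does this for you.

```php
<?php
// Illustrative helper (not part of PHP): hash a file one chunk at a time,
// so memory use stays near $chunkSize regardless of the file size.
function sha1_file_chunked(string $path, int $chunkSize = 8192): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $context = hash_init('sha1');
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            fclose($handle);
            throw new RuntimeException("Read error on $path");
        }
        // Feed this chunk into the running hash, then let it be discarded.
        hash_update($context, $chunk);
    }
    fclose($handle);

    // Returns the digest as 40 lowercase hex characters, like sha1_file.
    return hash_final($context);
}

// Usage: compare against the built-in on a small temporary file.
$demo = tempnam(sys_get_temp_dir(), 'sha1demo');
file_put_contents($demo, str_repeat('abc', 100000));
var_dump(sha1_file_chunked($demo) === sha1_file($demo));
unlink($demo);
```

A hand-written loop like this is only worth having when the data does not come from something sha1_file can open directly, or when the chunks need extra processing along the way.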

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow