Question

I'm trying to upload large files to Rackspace with the following code, as per https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/Storage/Object.md:

$src_path = 'pathtofile.zip'; //about 700MB
$md5_checksum = md5_file($src_path); //result is f210775ccff9b0e4f686ea49ac4932c2
$trans_opts = array(
      'name' => $md5_checksum,
      'concurrency' => 6,
      'partSize'    => 25000000
 );
$trans_opts['path'] = $src_path;
$transfer = $container->setupObjectTransfer($trans_opts);
$response = $transfer->upload();

This appears to upload the file just fine.

However, when I try to download the file as recommended in https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/USERGUIDE.md:

$name = 'f210775ccff9b0e4f686ea49ac4932c2';
$object = $container->getObject($name);
$objectContent = $object->getContent();
$pathtofile = 'destinationpathforfile.zip';
$objectContent->rewind();
$stream = $objectContent->getStream();
file_put_contents($pathtofile, $stream);
$md5 = md5_file($pathtofile);

The result of md5_file ends up being different from 'f210775ccff9b0e4f686ea49ac4932c2'. Moreover, the downloaded zip ends up being unopenable/corrupted.

What did I do wrong?


Solution

It's recommended that you only use multipart uploads for files over 5GB. For files under this threshold, you can use the normal uploadObject method.
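
For a 700MB file, a single-request upload might look like the sketch below. It assumes the same `$container` object as in the question and relies on `uploadObject($name, $data)` accepting an open file handle, as the user guide shows. `uploadWholeFile` is a hypothetical wrapper written for this answer, not a library function.

```php
<?php
// Hypothetical helper (not part of php-opencloud): upload a file as one
// object, named after its MD5 checksum as in the question.
function uploadWholeFile($container, $srcPath)
{
    $handle = fopen($srcPath, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $srcPath for reading");
    }

    // Same naming scheme as the question: object name = MD5 of the file.
    $name = md5_file($srcPath);

    // Assumption: $container is the container object from the question
    // and exposes uploadObject($name, $data).
    return $container->uploadObject($name, $handle);
}
```

An object uploaded this way is stored as a single object rather than as segments plus a manifest, so its ETag should be the plain MD5 of the contents and can be compared directly with md5_file.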

When you use the transfer builder, it splits your large file into smaller segments (you provide the part size) and uploads them concurrently. When this process has finished, a manifest file is created that lists all of these segments. When you download the manifest file, it collates them all together, effectively pretending to be the big file itself. But it's really just an organizer.
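
To get a feel for how many segments a given part size produces, here is a rough sketch of the splitting arithmetic. `planSegments` is a hypothetical helper, and it assumes segments are cut at exact multiples of the part size with a shorter final segment; the transfer builder's real boundaries may differ.

```php
<?php
// Hypothetical helper: list the byte ranges that a file of $fileSize bytes
// would be split into, assuming fixed-size parts and a shorter last part.
function planSegments($fileSize, $partSize)
{
    if ($partSize <= 0) {
        throw new InvalidArgumentException('partSize must be positive');
    }

    $segments = array();
    for ($offset = 0; $offset < $fileSize; $offset += $partSize) {
        $segments[] = array(
            'offset' => $offset,
            'length' => min($partSize, $fileSize - $offset),
        );
    }
    return $segments;
}
```

Under that assumption, a file of exactly 700,000,000 bytes with a part size of 25,000,000 gives 28 segments, each of which becomes its own object with its own ETag.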

To get back to answering your question, the ETag header of a manifest file is not calculated how you may think. What you're currently doing is taking the MD5 checksum of the entire 700MB file, and comparing it against the MD5 checksum of the manifest file. But these aren't comparable. To quote the documentation:

the ETag header is calculated by taking the ETag value of each segment, concatenating them together, and then returning the MD5 checksum of the result.
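
If you want to check the manifest's ETag locally, the rule quoted above can be sketched as follows. `expectedManifestEtag` is a hypothetical helper, and it rests on two assumptions: that segments were cut at exact multiples of the part size, and that each segment's ETag is the hex MD5 of its bytes.

```php
<?php
// Hypothetical helper: compute the ETag a manifest would report, following
// the quoted rule: MD5 of the concatenated per-segment ETags.
function expectedManifestEtag($path, $partSize)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for reading");
    }

    $segmentEtags = '';
    // Read one segment at a time; stop on an empty read (end of file).
    while (($segment = fread($handle, $partSize)) !== false && $segment !== '') {
        // Assumption: a segment's ETag is the hex MD5 of its contents.
        $segmentEtags .= md5($segment);
    }
    fclose($handle);

    return md5($segmentEtags);
}
```

For the upload in the question this would be called as `expectedManifestEtag($src_path, 25000000)`. The result will generally not equal `md5_file($src_path)`, which is why the two values cannot be compared.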

There are also downsides to using Dynamic Large Objects (DLO) that you need to be aware of:

End-to-end integrity is not assured. The eventual consistency model means that although you have uploaded a segment object, it might not appear in the container list immediately. If you download the manifest before the object appears in the container, the object will not be part of the content returned in response to a GET request.

If you think there's been an error in transmission, perhaps it's because an HTTP request failed along the way. You can use retry strategies (using the backoff plugin) to retry failed requests.
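
Independently of the backoff plugin, you can also verify the download yourself and retry when the checksum doesn't match. The sketch below is a generic helper, not a php-opencloud or Guzzle API: `downloadVerified` and its parameters are hypothetical, and the actual download is passed in as a callable.

```php
<?php
// Hypothetical helper: run $download($destPath) until the file on disk has
// the expected MD5, waiting exponentially longer between attempts.
function downloadVerified(
    callable $download,
    $destPath,
    $expectedMd5,
    $maxAttempts = 3,
    $baseDelaySeconds = 1
) {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $download($destPath);

        if (is_file($destPath) && md5_file($destPath) === $expectedMd5) {
            return $attempt; // number of attempts that were needed
        }

        if ($attempt < $maxAttempts) {
            // Exponential backoff: base, 2x base, 4x base, ...
            usleep((int) ($baseDelaySeconds * 1000000 * pow(2, $attempt - 1)));
        }
    }

    throw new RuntimeException(
        "Checksum mismatch for $destPath after $maxAttempts attempts"
    );
}
```

The callable would wrap the download lines from the question. Because the object name there is the MD5 of the original file, that name doubles as the expected checksum.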

You can also turn on HTTP logging to check every network transaction to help with debugging. Be careful, though: using the above will echo the HTTP request body (>25MB) to STDOUT. You might want to use this instead:

use Guzzle\Plugin\Log\LogPlugin;
use Guzzle\Log\ClosureLogAdapter;

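// Write log lines to PHP's output stream
$stream = fopen('php://output', 'w');

// Log only the request line, response status and timings, not the bodies
$template = "# Request:\n{url} {method}\n\n# Response:\n{code} {phrase}\n\n"
    . "# Connect time: {connect_time}\n\n# Total time: {total_time}";

$logSubscriber = new LogPlugin(new ClosureLogAdapter(function ($m) use ($stream) {
    fwrite($stream, $m . PHP_EOL);
}), $template, false); // false: do not wire request/response bodies into the log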

$client->addSubscriber($logSubscriber);

As you can see, you're using a template to dictate what's outputted. There's a full list of template variables here.

Licensed under: CC-BY-SA with attribution