Question

I have a trial Azure Blob Storage account. I am trying to upload 100,000 generated files from my local machine. The operation has already been running for over 17 hours and has uploaded only ~77,000 files. All files were created by a simple bash script:

for i in {1..100000}
do
    echo "$i"                  # progress output
    echo "$i" > "$1/$i.txt"    # one tiny file per number, in the directory passed as $1
done

Upload code:

// File.Create truncates an existing log; File.OpenWrite would leave stale bytes at its end.
using(var stream = File.Create(textBoxManyUploadFileName.Text))
using(var writer = new StreamWriter(stream)) {
    foreach(var file in Directory.GetFiles(textBoxManyUploadFrom.Text)) {
        Guid id = Guid.NewGuid();
        storage.StoreFile(file, id, ((FileType)comboBoxManyUploadTypes.SelectedItem).Number);
        writer.WriteLine("{0}={1}", id, file);
    }
}

public void StoreFile(Stream stream, Guid id, string container) {
    try {
        var blob = GetBlob(id, container);
        blob.UploadFromStream(stream);
    } catch(StorageException exception) {
        throw TranslateException(exception, id, container);
    }
}

public void StoreFile(string filename, Guid id, int type = 0) {
    using(var stream = File.OpenRead(filename)) {
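        // Note: the overload above expects a container name (string), not an int;
        // the mapping from the numeric type to a container name is not shown here.
        StoreFile(stream, id, type);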
    }
}

CloudBlob GetBlob(Guid id, string containerName) {
    var container = azureBlobClient.GetContainerReference(containerName);
    // Runs for every file: CreateIfNotExist() sends a request to the storage service each time.
    if(container.CreateIfNotExist()) {
        container.SetPermissions(new BlobContainerPermissions {
            PublicAccess = BlobContainerPublicAccessType.Container
        });
    }
    return container.GetBlobReference(id.ToString());
}

The first 10,000 files were uploaded in 20-30 minutes; after that, the upload slowed down. I suspect this is because the file names are GUIDs and Azure tries to build a clustered index on them. How can I speed this up? What is the problem?


Solution

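To upload many small files, use multiple threads. Each file is a separate request to the storage service, so with files this small almost all of the time is spent waiting on round trips rather than transferring data, and uploading one file at a time leaves the connection mostly idle. You can issue uploads concurrently with BeginUploadFromStream, for instance, or with Parallel.ForEach.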
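A minimal sketch of the Parallel.ForEach approach, assuming .NET's System.Threading.Tasks. To keep it self-contained, the actual upload is passed in as a delegate; `ParallelUploader`, `UploadAll`, `uploadFile` and `maxParallelUploads` are hypothetical names, and in the question's code the delegate would wrap `storage.StoreFile(file, id, type)`. Because the `StreamWriter` used for the id=file log is not thread-safe, the sketch collects the pairs in a `ConcurrentBag` and leaves writing the log to the caller.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelUploader
{
    // Uploads every file concurrently and returns the (id, file) pairs to log.
    // 'uploadFile' is a hypothetical stand-in for storage.StoreFile(file, id, type).
    public static IList<KeyValuePair<Guid, string>> UploadAll(
        IEnumerable<string> files,
        Action<string, Guid> uploadFile,
        int maxParallelUploads)
    {
        var uploaded = new ConcurrentBag<KeyValuePair<Guid, string>>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelUploads };

        // Up to maxParallelUploads files are in flight at once.
        // An exception in any upload surfaces as an AggregateException from Parallel.ForEach.
        Parallel.ForEach(files, options, file =>
        {
            Guid id = Guid.NewGuid();
            uploadFile(file, id);
            uploaded.Add(new KeyValuePair<Guid, string>(id, file));
        });

        // The order of the pairs is not preserved; the caller writes the log afterwards.
        return new List<KeyValuePair<Guid, string>>(uploaded);
    }
}
```

On .NET Framework, client applications allow only two concurrent HTTP connections per host by default, so you would likely also need to raise `ServicePointManager.DefaultConnectionLimit` (for example to the same value as `maxParallelUploads`) before starting; otherwise most of the extra threads just wait for a free connection. The best degree of parallelism is worth measuring rather than guessing.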

OTHER TIPS

One more thing I noticed in your code: StoreFile() calls GetBlob(), which in turn calls CreateIfNotExist() on your blob container. This call also results in a request to the storage service, so it adds latency to every single upload (and you are charged for a storage transaction each time you call it).

I would recommend calling this function just once, before you start uploading blobs.
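Calling it once before the loop is the simplest fix. If the uploads run in parallel or can target more than one container, one option is to memoize the container setup per container name. Below is a self-contained sketch under that assumption: `ContainerCache`, `Get` and the `initialize` delegate are hypothetical names, and `TContainer` stands in for the storage client's container type (`CloudBlobContainer` in the question's SDK). In the question's code, `initialize` would run `GetContainerReference`, `CreateIfNotExist` and `SetPermissions`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Runs the expensive "get or create container" step at most once per container name,
// even when many uploads call Get concurrently.
public sealed class ContainerCache<TContainer>
{
    private readonly ConcurrentDictionary<string, Lazy<TContainer>> cache =
        new ConcurrentDictionary<string, Lazy<TContainer>>();
    private readonly Func<string, TContainer> initialize;

    // 'initialize' is a hypothetical hook for the one-time container setup.
    public ContainerCache(Func<string, TContainer> initialize)
    {
        this.initialize = initialize;
    }

    public TContainer Get(string containerName)
    {
        // Under contention GetOrAdd may build more than one Lazy, but every caller gets
        // the single instance stored in the dictionary, and ExecutionAndPublication
        // guarantees that instance runs its factory only once.
        return cache.GetOrAdd(
            containerName,
            name => new Lazy<TContainer>(
                () => initialize(name),
                LazyThreadSafetyMode.ExecutionAndPublication))
            .Value;
    }
}
```

One consequence of this design: in this mode `Lazy<T>` caches an exception thrown by the factory, so a transient failure while creating a container would be rethrown on every later `Get` for that name instead of being retried.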

Licensed under: CC-BY-SA with attribution