Question

I wrote a C# program that processes about 30 zipped folders containing about 35,000 files in total. My goal is to read every file and process its contents. At the moment, my code extracts all the folders to disk and then reads the files. The problem is that this takes about 15-20 minutes, which is far too long.

I am using the following code to extract files:

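using Ionic.Zip; // DotNetZip

void ExtractFile(string zipfile, string path)
{
    // Dispose the archive so its file handle is released after extraction.
    using (ZipFile zip = ZipFile.Read(zipfile))
    {
        zip.ExtractAll(path);
    }
}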

The extraction step is what takes the most time, and I need to reduce it. Is there a way to read the contents of the files inside a zipped folder without extracting them? Or does anyone know another way to make this code faster?

Thanks in advance


Solution

You could try extracting each entry into a memory stream instead of to the file system:

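using (ZipFile zip = ZipFile.Read(zipfile))
{
    foreach (ZipEntry entry in zip.Entries)
    {
        // Skip directory entries; they carry no data.
        if (entry.IsDirectory) continue;

        using (MemoryStream ms = new MemoryStream())
        {
            entry.Extract(ms);              // decompress into memory, not to disk
            ms.Seek(0, SeekOrigin.Begin);   // rewind before reading
            // read from the stream
        }
    }
}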

Other tips

Instead of extracting to the hard disk, you could read the archive in place: open it with ZipFile.OpenRead (from System.IO.Compression), then call the ZipArchiveEntry.Open method on each entry to get a stream over its contents.
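As a rough illustration of the OpenRead approach, here is a minimal sketch. It assumes the entries are text files, and the `ZipReader` class, `ProcessArchive` method, and `handle` callback are hypothetical names chosen for the example. Note that `System.IO.Compression.ZipFile` clashes by name with DotNetZip's `Ionic.Zip.ZipFile`, so only one of the two namespaces should be imported in a given file.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZipReader
{
    // Reads every file entry of an archive as text without writing anything to disk.
    // 'handle' is a hypothetical callback that receives (entry path, entry contents).
    public static void ProcessArchive(string zipPath, Action<string, string> handle)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (string.IsNullOrEmpty(entry.Name)) continue;

                using (Stream stream = entry.Open())
                using (var reader = new StreamReader(stream))
                {
                    handle(entry.FullName, reader.ReadToEnd());
                }
            }
        }
    }
}
```

A call such as `ZipReader.ProcessArchive(zipfile, (name, text) => Console.WriteLine(name));` would then visit each entry in turn. For binary files, the stream returned by `entry.Open()` can be read directly instead of going through a `StreamReader`.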

Also have a look at the CodeFluent Runtime tool, which claims to offer improved performance.

Try to break the work into separate async methods that you await, especially if any single operation takes longer than 50 ms. http://msdn.microsoft.com/en-us/library/hh191443.aspx

For example, if we have 10 operations that would otherwise be called one after another, with async/await we can start them so that they run in parallel, and the overall speed will then depend only on the server's capacity.
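One way this could look for the zip archives in the question is sketched below. Keep in mind that `await` by itself does not run work in parallel; the sketch uses `Task.Run` to start one task per archive and `Task.WhenAll` to wait for all of them. The `ParallelZip` class and its methods are hypothetical names, and counting entries merely stands in for the real per-file processing.

```csharp
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

static class ParallelZip
{
    // Placeholder for real work: opens one archive and counts its entries.
    static int CountEntries(string zipPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            return archive.Entries.Count;
        }
    }

    // Starts one task per archive, then awaits them all together.
    public static async Task<int> CountAllAsync(IEnumerable<string> zipPaths)
    {
        // Each task opens its own ZipArchive, so no archive object is shared across threads.
        Task<int>[] tasks = zipPaths
            .Select(path => Task.Run(() => CountEntries(path)))
            .ToArray();

        int[] counts = await Task.WhenAll(tasks);
        return counts.Sum();
    }
}
```

Whether this helps in practice depends on how many cores are available and on whether the disk can keep up with several archives being read at once, so it is worth measuring before and after.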

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow