LZMA SDK decompress for iOS (xcode) using too much RAM

https://stackoverflow.com/questions/12575874

03-07-2021

Question

I am trying to use the LZMA SDK in an iPhone/iPad app. My starting point was the LZMA example project for iPhone provided by Mo Dejong, available here: https://github.com/jk/lzmaSDK (the original was here: http://www.modejong.com/iOS/lzmaSDK.zip). I tried both and get the same result from both.

The problem is that the extract uses as much RAM as the uncompressed contents of the .7z. In other words, say I have a 40MB compressed file whose uncompressed content is a binary sqlite DB of about 250MB: memory use climbs steadily as the file is uncompressed, all the way up to 250MB. This will crash an iPad 1 or anything before the iPhone 4 (256MB RAM). I have a feeling a lot of people will eventually run into this same problem, so a resolution now could help a lot of developers.

I originally created the .7z file on a PC using Windows-based 7-zip (latest version) and a 16MB dictionary size. It should only require 18MB of RAM to uncompress (and that is the case when testing on a PC and looking at Task Manager). I also tried creating the archive using Keka (the open source Mac archiver). That did not resolve anything, although I can confirm that Keka itself only uses 19MB of RAM while extracting the file on a Mac, which is what I would expect. I guess the next step would be to compare the source code of Keka to the source code of the LZMA SDK.

I played around with different dictionary sizes and other settings when creating the .7z file but nothing helped. I also tried splitting my single binary file into 24 smaller pieces before compressing, but that also did not help (still uses over 250MB of RAM to extract the 24 pieces).

Note that the ONLY change I made to the original code was to use a bigger .7z file. Also note that it frees the RAM immediately once the extract is finished, but that doesn't help. It seems that either it is not freeing RAM as it extracts, like it should, or it is holding the entire contents in RAM until the very end and only then writing them out. Also, if I extract the exact same file using a Mac app while running Instruments, I do not see the same behavior (StuffIt Expander, for example, maxed out at around 60MB of RAM while extracting the file; Keka, the open source Mac archiver, maxed out at 19MB).

I'm not much of a Mac/Xcode/Objective-C developer (yet), so any help with this would be greatly appreciated. I could resort to using zip or rar instead, but I get far superior compression with LZMA, so if at all possible I want to stick with this solution. Obviously I need to get it to work without crashing.

Thanks!

Screenshot of Instruments.app profiling the example app

Solution

Igor Pavlov, author of 7-Zip, emailed me. He basically said that the observations I made in the original question are a known limitation of the C version of the SDK; the C++ version does not have this limitation. Actual quote:

"7-Zip uses another multithreaded decoder written in C++. That C++ .7z decoder doesn't need to allocate RAM block for whole solid block. Read also this thread:

http://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/5655623 "

So until someone fixes the SDK for iOS, the workaround is:

1) Decide what RAM limit you want for file decompression operations.

2) Any SINGLE file in your archive that exceeds the limit from step 1 must be split. You can do this with any binary splitter app, such as splits: http://www.fourmilab.ch/splits/

3) After your files are ready, create the 7z file using the dictionary/block size options described by MoDJ in his answer. For example, with a 24 meg limit: 7za a -mx=9 -md=24m -ms=24m CompressedFile.7z SourceFiles*

4) In your iOS app, after you decompress the files, determine which files had been split and concatenate them back together. The code for this is not all that complicated (I assume the naming convention that splits.exe uses, which is file.001, file.002, etc.):

    if (iParts > 1)
    {
        // If this is a multipart binary split file, combine all of the parts before using it.
        // destinationPath is a placeholder for whatever your final destination file name is.
        NSString *finalfilePath = destinationPath;
        NSString *splitfilePath = [finalfilePath stringByAppendingString:@".001"];

        NSFileManager *fileManager = [NSFileManager defaultManager];
        NSError *error = nil;

        // If the target combined file exists already, remove it
        if ([fileManager fileExistsAtPath:finalfilePath])
        {
            BOOL success = [fileManager removeItemAtPath:finalfilePath error:&error];
            if (!success) NSLog(@"Error: %@", [error localizedDescription]);
        }

        // Append parts .002, .003, ... to the end of the .001 file, in order
        NSFileHandle *myHandle = [NSFileHandle fileHandleForUpdatingAtPath:splitfilePath];
        [myHandle seekToEndOfFile];
        for (int i = 2; i <= iParts; i++) {
            // Drain each part's NSData before the next part is loaded
            @autoreleasepool {
                // %03d yields .002, .003, ... (assumes fewer than 1000 pieces)
                NSString *nextPart = [finalfilePath stringByAppendingFormat:@".%03d", i];
                NSData *datapart = [NSData dataWithContentsOfFile:nextPart];
                if (datapart == nil) {
                    NSLog(@"Error: could not read %@", nextPart);
                    break;
                }
                [myHandle writeData:datapart];
            }
        }
        [myHandle closeFile];

        // Rename concatenated file
        if (![fileManager moveItemAtPath:splitfilePath toPath:finalfilePath error:&error])
            NSLog(@"Error: %@", [error localizedDescription]);
    }
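One caveat with the snippet above is that it reads each part fully into memory before writing it, so peak usage during the join is still one whole part (up to the limit chosen in step 1). A minimal sketch of the same join step in plain C, copying through a small fixed buffer instead, might look like the following. The function names `append_file` and `join_split_parts` and the 64 KB buffer size are illustrative assumptions, not part of the LZMA SDK; the `.001`, `.002` suffixes follow the splits naming convention assumed above.

```c
#include <stdio.h>

/* Append the contents of src_path to the already-open stream dst,
   copying through a small fixed buffer so memory use stays constant.
   Returns 0 on success, -1 on error. */
static int append_file(FILE *dst, const char *src_path)
{
    unsigned char buf[64 * 1024];   /* 64 KB working buffer (arbitrary choice) */
    size_t n;
    int rc = 0;
    FILE *src = fopen(src_path, "rb");
    if (!src) return -1;

    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n) { rc = -1; break; }
    }
    if (ferror(src)) rc = -1;
    fclose(src);
    return rc;
}

/* Join final_path.001 ... final_path.NNN into final_path.
   Hypothetical helper: assumes three-digit suffixes and at most 999 parts.
   Returns 0 on success, -1 on error. */
int join_split_parts(const char *final_path, int part_count)
{
    char part_path[4096];
    int i, rc = 0;
    FILE *dst;

    if (part_count < 1 || part_count > 999) return -1;

    dst = fopen(final_path, "wb");   /* truncates any existing combined file */
    if (!dst) return -1;

    for (i = 1; i <= part_count && rc == 0; i++) {
        int len = snprintf(part_path, sizeof part_path, "%s.%03d", final_path, i);
        if (len < 0 || (size_t)len >= sizeof part_path) { rc = -1; break; }
        rc = append_file(dst, part_path);
    }
    if (fclose(dst) != 0) rc = -1;
    return rc;
}
```

From Objective-C this could be called as `join_split_parts([finalfilePath fileSystemRepresentation], iParts)`. Unlike the snippet above, this sketch writes a fresh output file and leaves the parts in place, so it temporarily needs extra disk space and the parts still have to be deleted afterwards.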

OTHER TIPS

Okay, so this is a tricky one. You are running into problems because iOS does not page memory out to disk (there is no swap file), while your desktop system does. The lzmaSDK library is written in a way that assumes your system has plenty of virtual memory available for decompression, so you will not see problems running on the desktop. Only when allocating large amounts of memory to decompress on iOS will you run into issues. It would be best to address this by rewriting the lzma SDK so that it makes better use of mapped memory directly, but that is not a trivial task. Here is how to work around the problem.

Using 7za

There are actually 2 command line options you will want to pass to the 7zip archive program in order to segment files into smaller chunks: -md (dictionary size) and -ms (solid block size). I suggest that you just use the 24 meg size that I ended up using, since it was a decent space/memory tradeoff. Here is the command line that should do the trick. Note that in this example I have big movie files named XYZ.flat and I want to compress them together into an archive.7z file:

7za a -mx=9 -md=24m -ms=24m Animations_9_24m_NOTSOLID.7z *.flat

If you compare this segmented file to a version that does not break the file into segments, you will see that the file gets a little bigger when segmented:

$ ls -la Animations_9_24m.7z Animations_9_24m_NOTSOLID.7z
-rw-r--r--  1 mo  staff  8743171 Sep 30 03:01 Animations_9_24m.7z
-rw-r--r--  1 mo  staff  9515686 Sep 30 03:21 Animations_9_24m_NOTSOLID.7z

So, segmenting increases the archive size by about 770 KB (roughly 9%), but that is not a big loss because the decompression routines will no longer attempt to allocate a bunch of memory. The decompression memory usage is now limited to a 24 meg block, which iOS can handle.

Double check your results by printing out the header info of the compressed file:

$ 7za l -slt Animations_9_24m_NOTSOLID.7z

Path = Animations_9_24m_NOTSOLID.7z
Type = 7z
Method = LZMA
Solid = +
Blocks = 7
Physical Size = 9515686
Headers Size = 1714

Note the "Blocks" element in the above output. It indicates that the data has been segmented into separate 24 meg blocks.

If you compare the segmented file info above to the output without the -ms=24m argument, you would see:

$ 7za l -slt Animations_9_24m.7z

Path = Animations_9_24m.7z
Type = 7z
Method = LZMA
Solid = +
Blocks = 1
Physical Size = 8743171
Headers Size = 1683

Note the "Blocks" value. You don't want just 1 huge block, since the decoder will then attempt to allocate a huge amount of memory when decompressing on iOS.
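The mapped-memory idea mentioned at the start of this answer can be illustrated with plain POSIX calls. The sketch below only shows how an output buffer could be backed by a file instead of the heap, so that the kernel is in principle able to write dirty pages out to that file rather than keeping them all resident. It does not show how to make the C decoder in the SDK use such a buffer, which would require changes to its allocation code. `map_output_file` and `unmap_output_file` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L   /* expose ftruncate() under strict C modes */

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) the file at path, grow it to size bytes, and map it
   read/write as a shared mapping. Returns the mapping, or NULL on failure.
   Hypothetical helper, not part of the LZMA SDK. */
void *map_output_file(const char *path, size_t size)
{
    void *p;
    int fd;

    if (size == 0) return NULL;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return NULL;

    /* The file must be at least as large as the region being mapped. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after the descriptor is closed */

    return p == MAP_FAILED ? NULL : p;
}

/* Flush outstanding changes to the file and release the mapping.
   Returns 0 on success, -1 on error. */
int unmap_output_file(void *p, size_t size)
{
    int rc = msync(p, size, MS_SYNC);
    if (munmap(p, size) != 0) rc = -1;
    return rc;
}
```

A caller would write decompressed bytes into the returned pointer as if it were an ordinary buffer, then call `unmap_output_file` once the data is complete. Whether this keeps an app under the memory limit on a given iOS version is something to verify with Instruments rather than assume.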

I've run into the same problem, but found a much more practical workaround:

1) Use the CPP interface of the LZMA SDK. It uses very little memory and does not suffer from the memory consumption problem of the C interface (as tradergordo already correctly said).

2) Have a look at LZMAAlone.cpp, strip it of anything unnecessary (such as encoding and the 7-zip file format code; note that encoding would still require a lot of memory), and create a tiny header file for your CPP LZMA decompressor, e.g.:

extern "C" int extractLZMAFile(const char *filePath, const char *outPath);

3) For very large files (like 100MB+ db files), I then compress the file with plain LZMA. Of course, since LZMA alone has no file container, you need to supply the name of the decompressed file yourself.

4) Because I don't have full 7Z support, I use tar as the container together with LZMA-compressed files. There is a tiny iOS untar at https://github.com/mhausherr/Light-Untar-for-iOS

Unfortunately I can't provide any sources, even though I'd like to.
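To make the "no file container" point above concrete: as far as I know, a plain .lzma ("LZMA alone") file starts with a 13-byte header holding one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size (all 0xFF bytes when the size is unknown), and nothing else. There is no file name in it. The sketch below parses that header so an app could check the dictionary size against its RAM budget before decoding. The struct and function names are hypothetical and not taken from the LZMA SDK.

```c
#include <stddef.h>
#include <stdint.h>

#define LZMA_ALONE_HEADER_SIZE 13

/* Hypothetical container for the fields of a .lzma ("LZMA alone") header. */
typedef struct {
    unsigned lc, lp, pb;      /* literal context, literal position, position bits */
    uint32_t dict_size;       /* dictionary size in bytes */
    uint64_t unpacked_size;   /* UINT64_MAX means the size is not stored */
} LzmaAloneHeader;

/* Parse the first 13 bytes of a .lzma file.
   Returns 0 on success, -1 if the buffer is too short or the
   properties byte is out of range. */
int parse_lzma_alone_header(const unsigned char *buf, size_t len,
                            LzmaAloneHeader *out)
{
    unsigned d;
    int i;

    if (len < LZMA_ALONE_HEADER_SIZE) return -1;

    /* Properties byte encodes (pb * 5 + lp) * 9 + lc. */
    d = buf[0];
    if (d >= 9 * 5 * 5) return -1;
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;

    /* Dictionary size: 4 bytes, little-endian. */
    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)buf[1 + i] << (8 * i);

    /* Uncompressed size: 8 bytes, little-endian. */
    out->unpacked_size = 0;
    for (i = 0; i < 8; i++)
        out->unpacked_size |= (uint64_t)buf[5 + i] << (8 * i);

    return 0;
}
```

A caller could read the first 13 bytes of the file, parse them, and refuse to decode if `dict_size` is larger than the app can afford, since the decoder needs roughly a dictionary-sized window plus a small amount of state.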

Licensed under: CC-BY-SA with attribution