Question

I have a corrupted 7-zip archive that I am extracting manually using the method outlined by Igor Pavlov at this link. An intermediate result is a large file consisting of a number of files cat'ed together, which must be separated manually. I understand that some file formats will need to be extracted by a human using discretion (text files, etc.), but many file formats encode the size of the file as part of the file itself (e.g. .zip). Furthermore, some files can be parsed and their size deduced with just a little knowledge of the file format (e.g. .pdf). Let's say the large file consists of the following files concatenated together:

Key: <filename>(<contents>)

badfile(aaaaaaaaaaabbbbbbbbbcccccccdddddddd)   ->    zip1.zip(aaaaaaaaaaa)
                                                     badfile2(bbbbbbbbbcccccccdddddddd)

I am looking for a program that I can run on a large file (call it badfile) that can determine the type and size of the first logical file contained within (let's say it's a .zip file), create a new file to hold its contents (e.g. zip1.zip, since the filenames are lost), and chop that file off the front of badfile. This would allow me to run the program in a loop to extract files of known types, and/or pause and let the user handle the difficult cases. Does such a program exist? I know that the *nix command file(1) will do a lot of the work here, but there would be a lot of effort in encoding the rules for sizing files (e.g. .pdf) that I would prefer not to duplicate.
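To make the idea concrete, here is a rough sketch of the loop I have in mind (Python; the carvedN.* naming and the two supported types are just placeholders, and both size rules are naive, e.g. a ZIP stored uncompressed inside another ZIP would be truncated at the inner End Of Central Directory record):

    import struct

    def first_file_length(data):
        """Return (extension, length) of the first embedded file, or None."""
        if data.startswith(b"PK\x03\x04"):
            # ZIP: find the End Of Central Directory record (PK\x05\x06).
            # It is 22 bytes plus a comment whose length is stored at offset 20.
            eocd = data.find(b"PK\x05\x06")
            if eocd != -1 and len(data) >= eocd + 22:
                comment_len = struct.unpack_from("<H", data, eocd + 20)[0]
                return "zip", eocd + 22 + comment_len
        if data.startswith(b"%PDF-"):
            # PDF: naive guess - cut at the first %%EOF trailer. Real PDFs
            # may contain several %%EOF markers, so this can cut too early.
            eof = data.find(b"%%EOF")
            if eof != -1:
                return "pdf", eof + len(b"%%EOF")
        return None  # unknown type: a human has to take over

    def carve(path):
        with open(path, "rb") as f:
            data = f.read()
        count = 0
        while data:
            hit = first_file_length(data)
            if hit is None:
                break
            ext, length = hit
            count += 1
            name = "carved%d.%s" % (count, ext)
            with open(name, "wb") as f:
                f.write(data[:length])
            print("wrote %s (%d bytes)" % (name, length))
            data = data[length:]  # chop the extracted file off the front
        with open(path, "wb") as f:
            f.write(data)  # the remainder is left for manual inspection

    carve("badfile")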


Solution

Strictly speaking, this question is off topic, since it asks for recommendations of existing programs, and only the open bounty prevents a close vote. However:

Does such a program exist?

Yes, they exist and are called data carving tools. Some common ones include scalpel, foremost, and PhotoRec.
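For example, a typical foremost run over your blob might look like this (the type list and the output directory name are arbitrary; foremost also reads a config file if you need to add custom signatures):

    foremost -t zip,pdf -i badfile -o carved

It carves every file whose signature it recognizes into per-type subdirectories under carved/ and records the offsets and sizes it found in an audit.txt there.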

A list of other tools is available here.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow