Question

I have an FTP server that holds all of my tar files. They are 500 MB or larger, and there are many of them. All I need is to extract a single file from one of these tars, each of which contains multiple files.

My initial idea was to download each tar file and extract the single file I need, but that seems inefficient.

I'm using Python as my programming language.


Solution

This answer is not specific to Python, because the problem is not specific to Python: in theory you can read only the part of the tar file where your data is. With FTP (and also with Python's ftplib) this is possible: first issue a REST command to specify the start position within the file, then RETR to start the download, and once you have received the amount of data you need, close the data connection.
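As a sketch of that REST/RETR idea with ftplib: `FTP.transfercmd()` accepts a `rest=` offset, returns the raw data socket, and lets you close it once you have enough bytes. The host, login, and path you would pass in are placeholders here; this only shows the mechanics of a ranged read.

```python
from ftplib import FTP


def fetch_range(ftp: FTP, path: str, offset: int, length: int) -> bytes:
    """Fetch `length` bytes of `path` starting at `offset` via REST + RETR.

    `ftp` must be an already-connected, logged-in ftplib.FTP instance.
    """
    # transfercmd() sends REST <offset> for us and opens the data connection.
    conn = ftp.transfercmd(f"RETR {path}", rest=offset)
    chunks = []
    remaining = length
    while remaining > 0:
        data = conn.recv(min(remaining, 8192))
        if not data:  # server closed the connection early (EOF)
            break
        chunks.append(data)
        remaining -= len(data)
    # Closing the data connection mid-transfer aborts the download; the
    # server's resulting error response is expected, so swallow it.
    conn.close()
    try:
        ftp.voidresp()
    except Exception:
        pass
    return b"".join(chunks)
```

A caller would do something like `fetch_range(ftp, "archive.tar", 0, 512)` to grab just the first tar header.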

But tar is a file format without a central index: each file in a tar is prefixed with a small header carrying its name, size, and other metadata. So to get a specific file you must read the first header, check whether it is the file you want, and if not, skip over the size of the unwanted file and try the next one. With lots of small files in the tar this will be less efficient than downloading the complete archive (or at least downloading up to the relevant part, parsing it as you go), because opening a new data connection for each read causes a lot of overhead. But if the tar contains large files, this approach can work.

You are completely out of luck, however, if it is not a TAR (*.tar) but a TGZ (*.tgz or *.tar.gz) file. These are compressed tar files, and to get any part of the archive you must first decompress everything that comes before it. So in this case there is no way around downloading the file, or at least downloading everything up to the relevant part.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow