Question

I have a tar file that contains a number of files. I need to write a Python script that reads the contents of those files and reports the total character count (letters, spaces, newline characters, everything) without untarring the tar file.


Solution

You can use getmembers() to list the members of the archive:

>>> import tarfile
>>> tar = tarfile.open("test.tar")
>>> tar.getmembers()

After that, you can use extractfile() to get each member as a file object. For example:

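import os
import tarfile

os.chdir("/tmp/foo")
# The with block closes the archive even if reading a member fails.
with tarfile.open("test.tar") as tar:
    for member in tar.getmembers():
        f = tar.extractfile(member)
        if f is None:  # directories and other non-file members have no content
            continue
        content = f.read()  # bytes in Python 3, so len(content) counts bytes
        print("%s has %d newlines" % (member.name, content.count(b"\n")))
        print("%s has %d spaces" % (member.name, content.count(b" ")))
        # Decode first (e.g. content.decode("utf-8")) to count characters
        # rather than bytes in non-ASCII text.
        print("%s has %d characters" % (member.name, len(content)))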

With the file object "f" in the above example, you can use read(), readlines() etc.
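To get the single grand total the question asks for, the per-member reads can be summed. The following is a minimal sketch, not a definitive implementation: the function name count_characters and the utf-8 default are assumptions, and it expects an already-open TarFile.

```python
import tarfile


def count_characters(tar, encoding="utf-8"):
    """Sum the decoded character counts of every regular file in an open TarFile.

    The encoding is an assumption; pass the one your files actually use.
    """
    total = 0
    for member in tar.getmembers():
        if not member.isfile():
            continue  # skip directories, links and other non-regular members
        with tar.extractfile(member) as f:
            # f.read() returns bytes; decoding makes len() count characters
            total += len(f.read().decode(encoding))
    return total
```

It could be called as count_characters(tarfile.open("test.tar")). Nothing is written to disk, since each member is read straight from the archive.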

OTHER TIPS

You need to use the tarfile module. Specifically, you use an instance of the class TarFile to access the archive, and then get the member names with TarFile.getnames():

 |  getnames(self)
 |      Return the members of the archive as a list of their names. It has
 |      the same order as the list returned by getmembers().

If instead you want to read the content, use this method:

 |  extractfile(self, member)
 |      Extract a member from the archive as a file object. `member' may be
 |      a filename or a TarInfo object. If `member' is a regular file, a
 |      file-like object is returned. If `member' is a link, a file-like
 |      object is constructed from the link's target. If `member' is none of
 |      the above, None is returned.
 |      The file-like object is read-only and provides the following
 |      methods: read(), readline(), readlines(), seek() and tell()
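As the docstring says, extractfile() takes either a name or a TarInfo and returns None for members that are neither files nor links, so a loop over getnames() should check for that. A small sketch under those assumptions (member_stats is a hypothetical helper name, and the counts are over raw bytes):

```python
import tarfile


def member_stats(tar):
    """Map each readable member name to its byte, newline and space counts."""
    stats = {}
    for name in tar.getnames():
        f = tar.extractfile(name)  # a member name works as well as a TarInfo
        if f is None:
            continue  # e.g. a directory: there is nothing to read
        with f:
            data = f.read()
        stats[name] = {
            "bytes": len(data),
            "newlines": data.count(b"\n"),
            "spaces": data.count(b" "),
        }
    return stats
```

The result is a plain dict keyed by member name, so individual files can be reported or the values summed as needed.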

An implementation of the methods mentioned by @stefano-borini: access a tar archive's member by file name like so

# python3
# myArchive is assumed to be an already-open tarfile.TarFile
myFile = myArchive.extractfile(
    dict(zip(
        myArchive.getnames(),
        myArchive.getmembers()
    ))['path/to/file']
).read()
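Since extractfile() also accepts a filename directly (see the docstring quoted earlier), the name-to-member dictionary is optional. A shorter sketch of the same lookup, where read_member is a hypothetical helper name:

```python
import tarfile


def read_member(tar, name):
    """Return the raw bytes of one named member, or None if it has no content."""
    f = tar.extractfile(name)  # raises KeyError if name is not in the archive
    if f is None:
        return None  # the member is not a regular file or link
    with f:
        return f.read()
```

With an open archive this would be used as read_member(myArchive, 'path/to/file').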

Credits:

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow