Question

The question might sound strange because I know I am forcing a strange situation. It came up by accident (a bug, one might say) and I already know how to avoid it, so please skip that part.

I would really like to understand the behaviour I see.

The point of the function is to add all files in a directory that share a given prefix to an archive. I noticed that despite a "bug", the program works correctly (sic!), and I want to understand why.

The code is fairly simple, so I will post the whole function:

import os
import tarfile

def pack(prefix, custom_meta_files=[]):
  postfix = 'tgz'
  if prefix[-1] != '.':
    postfix = '.tgz'

  archive = tarfile.open(prefix+postfix, "w:gz")
  files = filter(lambda path: path.startswith(prefix), os.listdir())
  #print('files: {0}'.format(list(files)))

  for file in files:
    print('packing `{0}`'.format(file))
    archive_name = file[len(prefix):]   #skip prefix + dot
    archive.add(file, archive_name)

  not_doubled_metas = set(custom_meta_files) - set(archive.getnames())
  print('metas to add: {0}'.format(not_doubled_metas))
  for meta in not_doubled_metas:
    print('packing `{0}`'.format(meta))
    archive.add(meta)
  print('contents:{0}'.format(archive.getnames()))

As one can see, I create the archive with the prefix in its name, and then I build the list of files to pack by listing everything in the cwd and filtering it with the lambda. Naturally, the archive itself passes the filter. There is also a snippet that adds fixed files if their names do not overlap, although I think it is not important here.
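To illustrate why the archive shows up in the listing, here is a minimal sketch (not part of the original function). It assumes a scratch directory and a hypothetical prefix `demo.`, and it only checks that opening the archive in write mode puts the file on disk before anything is added to it.

```python
import os
import tarfile
import tempfile

def listing_after_open(prefix="demo."):
    """Return the prefixed names in a scratch directory right after opening the archive."""
    with tempfile.TemporaryDirectory() as workdir:
        archive_path = os.path.join(workdir, prefix + "tgz")
        with tarfile.open(archive_path, "w:gz"):
            # Opening in write mode creates the file immediately,
            # so a directory listing taken now already contains it.
            names = os.listdir(workdir)
        return [name for name in names if name.startswith(prefix)]

print(listing_after_open())
```

So any prefix-based filter applied after `tarfile.open()` will pick up the archive as well.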

The output from such a run is, for example:

packing `ga_run.seq_niche.N30.1.bt0_5K.params`
packing `ga_run.seq_niche.N30.1.bt0_5K.stats`
packing `ga_run.seq_niche.N30.1.bt0_5K.tgz`
metas to add: {'stats.meta'}
packing `stats.meta`
contents:['params', 'stats', 'stats.meta']

So the function tried to add the archive to itself, yet the archive does not appear in the final contents. I do not know what the expected behaviour is, but there is no warning at all and the documentation does not mention anything. I read the parts about the methods that add members and searched the page for "itself" and "same name".

I would assume it is automatically skipped, but I don't know how to actually check it. Personally I would have expected a zero-length file to be added as a member, but I understand skipping, as it actually makes more sense.

Question: Is it the intended behaviour of tarfile.add() to skip adding the archive to itself? Where is this stated?

Solution

Scanning the tarfile.py source from 2.4 to 3.2, every version has code similar to this in add():

# Skip if somebody tries to archive the archive...
if self.name is not None and os.path.abspath(name) == self.name:
    self._dbg(2, "tarfile: Skipped %r" % name)
    return
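
One way to check this yourself is to open the archive with a debug level of 2, so that the skip message above is printed to stderr. The following is only a sketch: the file names are made up, and the exact wording of the message is an implementation detail of tarfile.py that may differ between versions.

```python
import contextlib
import io
import os
import tarfile
import tempfile

def add_archive_to_itself():
    """Try to add an archive to itself; return (member names, debug output)."""
    with tempfile.TemporaryDirectory() as workdir:
        archive_path = os.path.join(workdir, "demo.tgz")   # hypothetical names
        data_path = os.path.join(workdir, "demo.params")
        with open(data_path, "w") as handle:
            handle.write("x = 1\n")

        captured = io.StringIO()
        with contextlib.redirect_stderr(captured):
            # debug=2 is passed through to TarFile and enables level-2 messages
            with tarfile.open(archive_path, "w:gz", debug=2) as archive:
                archive.add(data_path, "params")
                archive.add(archive_path, "tgz")   # the "bug": archive added to itself
                names = archive.getnames()
        return names, captured.getvalue()

names, debug_output = add_archive_to_itself()
print(names)
print(debug_output)
```

If the skip happens as in the snippet above, only `params` ends up in the member list and the debug output contains the "Skipped" line for the archive path.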
Licensed under: CC-BY-SA with attribution