Question

I've been playing around with the BFG Repo-Cleaner tool in order to clean up the history of a git repo by removing (temporary / large) files from several places in the directory hierarchy...

e.g. /root/test/a.txt and /root/test2/a.txt

Now I'd like to remove all references to /root/test/a.txt but keep the /root/test2/a.txt version.

Is there any way to cleanly remove that using BFG? (as mentioned on https://help.github.com/articles/remove-sensitive-data)

Since the repo has a fairly large history (10K commits), BFG really is a lot faster than the other methods I've seen so far...


Solution

I'm the developer of The BFG, and I'm glad you've been finding it useful and fast. Part of the special sauce that makes The BFG so fast is that it's path-independent - so you can't directly say something like --delete /root/test/a.txt. Adding some support for path-dependent action is something I'm thinking about, but I don't want it to adversely affect performance.

The key question when cleaning your repo is: What are you trying to achieve, out of these two options:

  • Reduction in Git repository size
  • Removal of private data

From your question, it sounds like your only aim is the first one, to reduce Git repository size. If /root/test/a.txt is fairly small - i.e. comparable in size to the rest of the legitimate files in your repository - you can't really use --strip-blobs-bigger-than X to get rid of it, as it would remove too many of your other regular files. But if that is the case, I would just relax, and let it go - it's not costing you much storage space compared to the entirety of your repo.

If /root/test/a.txt is big enough to bother you, you can probably just use --strip-blobs-bigger-than X to get rid of it - remember that The BFG protects all files in your current commit (or even more branches if you use --protect-blobs-from <refs>) - so legitimate big files that you're currently using won't get touched.
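As a rough sketch of the size-based approach, the two helpers below wrap the BFG call and the follow-up garbage collection. The jar location (bfg.jar), the 10M threshold and the repository name my-repo.git are placeholders, not values from the question, and the BFG variable is only a convenience for swapping in a different launcher.

```shell
# Sketch only: bfg.jar, the threshold and the repo name are placeholders.
BFG="${BFG:-java -jar bfg.jar}"

# $1 = bare/mirror clone to rewrite, $2 = size threshold (e.g. 10M)
strip_big_blobs() {
    $BFG --strip-blobs-bigger-than "$2" "$1"
}

# After history is rewritten, the old objects stay on disk until the
# reflog is expired and the repository is garbage-collected.
expire_and_gc() {
    git -C "$1" reflog expire --expire=now --all &&
        git -C "$1" gc --prune=now --aggressive
}

# Typical use, after: git clone --mirror <url> my-repo.git
#   strip_big_blobs my-repo.git 10M && expire_and_gc my-repo.git
```

Running this against a fresh mirror clone rather than your working copy keeps the original repository untouched until you are happy with the result.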

If you really want to get rid of this poor innocuous file, but don't want to filter on size, there are two BFG-supported options:

Use --delete-folders test

...which will delete the entire folder /root/test/ (and all other folders called 'test'), but not /root/test2/. Not much use if there are other things in /root/test/ that you want to keep.
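Because the folder name matches anywhere in the tree, it may be worth previewing what would be affected first. The helper below is a minimal sketch using plain git (not a BFG feature); the function name paths_under_folder is made up for illustration. It lists every path recorded in history that sits under a folder with the given name.

```shell
# List every path ever recorded in history that lies under a folder named $1.
# Run inside the repository. Paths are relative to the repository root.
paths_under_folder() {
    git log --all --pretty=format: --name-only |
        sort -u |
        grep -E "(^|/)$1/"
}

# Example: show what a folder name of 'test' would match
#   paths_under_folder test
# If the list contains only files you want gone (bfg.jar and my-repo.git
# are placeholders):
#   java -jar bfg.jar --delete-folders test my-repo.git
```

A path such as root/test2/a.txt is not listed, since the pattern requires the folder name to be exactly 'test'.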

Use --strip-blobs-with-ids <blob-ids-file>

...you have to look up all the Git blob-ids there have ever been for /root/test/a.txt, which you can do with some git commands like this:

git log --format=%H -- /root/test/a.txt | xargs -IcommitId git rev-parse commitId:/root/test/a.txt
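A slightly more defensive variant of that lookup might look like the sketch below. It assumes the path is given relative to the repository root (without a leading slash), skips commits where the file was deleted and so has no blob, and removes duplicate IDs. The function name collect_blob_ids and the file name blob-ids.txt are illustrative only.

```shell
# Sketch: print each distinct blob ID a path has ever had, one per line.
# Run inside the repository; $1 is the path relative to the repository root.
collect_blob_ids() {
    git log --all --format=%H -- "$1" |
        while read -r commit; do
            # Commits that deleted the file have no blob at this path;
            # --verify --quiet skips them without printing anything.
            git rev-parse --verify --quiet "$commit:$1"
        done |
        sort -u
}

# Hypothetical usage (blob-ids.txt, bfg.jar and my-repo.git are placeholders):
#   collect_blob_ids root/test/a.txt > blob-ids.txt
#   java -jar bfg.jar --strip-blobs-with-ids blob-ids.txt my-repo.git
```

Since root/test2/a.txt has its own blob IDs, it is left alone unless its content was at some point byte-for-byte identical to a version of root/test/a.txt.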
Licensed under: CC-BY-SA with attribution