Question

I need to find all git repositories in some folders. Previously, I used find . -type d -name .git. Now I have rewritten this in Python 3 using os.walk, and traversing the tree seems to take much longer than the simple find did.

How can I speed this up?

Here is the complete code: old new

Solution

There are many threads on SO about os.walk() being slow.

  • You may be interested in scandir. It started as a third-party module and has been in the standard library as os.scandir() since Python 3.5, where os.walk() itself is built on it.
  • Instead of walking the entire tree and filtering for ".git", be a good chap and use the glob module to find them directly.
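
A minimal sketch of the glob suggestion, assuming Python 3.5+ for recursive=True. The function name find_repos_glob is made up for illustration. Note that "**" does not descend into hidden directories, and glob still has to visit the whole tree, so this is mainly shorter code rather than a guaranteed speedup.

```python
import glob
import os


def find_repos_glob(root):
    """Return directories under root that contain a .git directory."""
    # "**" with recursive=True matches zero or more (non-hidden) directory levels.
    # glob.escape() protects any special characters in the root path itself.
    pattern = os.path.join(glob.escape(root), "**", ".git")
    hits = glob.glob(pattern, recursive=True)
    # Keep only real directories, mirroring find's "-type d".
    return sorted(os.path.dirname(p) for p in hits if os.path.isdir(p))
```

Calling find_repos_glob(".") would list the repositories below the current directory.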

OTHER TIPS

If you mean the implementation of find_repos(), then using os.walk() is probably not optimal. The reason is that once you find a .git subdirectory, there is no need to search any deeper. Try writing your own directory traversal. You can look at the source in os.py to see how walk is implemented; one reason for the slowness could be that it is written in Python.
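
A hand-written traversal along these lines could look like the sketch below. It is not the asker's find_repos() (that code is not shown here), just one way to stop descending once .git is found. It assumes Python 3.6+ (os.scandir as a context manager) and silently skips unreadable directories, which is my own choice.

```python
import os


def find_repos(root):
    """Yield directories under root that contain a .git subdirectory.

    Once a repository is found, its contents are not searched further.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        subdirs = []
        is_repo = False
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # follow_symlinks=False avoids loops via symlinked directories
                    if not entry.is_dir(follow_symlinks=False):
                        continue
                    if entry.name == ".git":
                        is_repo = True
                        break
                    subdirs.append(entry.path)
        except OSError:
            # Unreadable or vanished directory: skip it (an assumption)
            continue
        if is_repo:
            yield current
        else:
            stack.extend(subdirs)
```

The same pruning is possible with os.walk() in its default top-down mode by clearing the list of subdirectories in place (dirs[:] = []) whenever ".git" is among them.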

As for the other parts of your new solution: I noticed that you do not compile your regular expressions, but I did not check the details.

I suggest separating the functionality into its own Python script, measuring it with the timeit module, and only then trying to optimize.
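
As a rough illustration of measuring with timeit, the sketch below times two candidate traversals against the same root. The functions walk_all, walk_pruned and compare are hypothetical stand-ins, not the asker's code; swap in whatever you actually want to measure.

```python
import os
import timeit


def walk_all(root):
    """Baseline: visit every directory with os.walk."""
    return [d for d, subdirs, _ in os.walk(root) if ".git" in subdirs]


def walk_pruned(root):
    """Variant: stop descending once a .git directory is found."""
    found = []
    for d, subdirs, _ in os.walk(root):
        if ".git" in subdirs:
            found.append(d)
            subdirs[:] = []  # in-place clear tells os.walk not to descend
    return found


def compare(root, repeat=3, number=5):
    """Return the best per-call time in seconds for each candidate."""
    results = {}
    for func in (walk_all, walk_pruned):
        times = timeit.repeat(lambda: func(root), repeat=repeat, number=number)
        results[func.__name__] = min(times) / number
    return results
```

Running compare("/path/to/folders") on a realistic tree gives numbers to compare before changing anything else. File system caching can skew the first run, so taking the minimum over several repeats is the usual approach.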

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow