Is this an efficient way of listing all .mp3 files in a directory (including any subdirectories) in Python?

StackOverflow https://stackoverflow.com/questions/22255344

  •  11-06-2023

Question

Is this a good approach? Is there a more efficient way to do it (without trading code readability for efficiency)?

for root, dirs, files in os.walk(path, topdown=False):
    for name in files:
        if re.match(r'.*\.mp3', name):
            yield os.path.join(root, name) # returns the path of the .mp3 file

EDIT: Conclusion:

If you ignore recursion, the fastest way to do it is the glob module. If you want recursion, switching from re.match() to slices makes it a few milliseconds faster.
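
The slice-based check mentioned in the conclusion could look like the sketch below. The function name iter_mp3 and the default path are assumptions made for illustration, not part of the original code; name.endswith(".mp3") would be an equivalent and arguably more readable test.

```python
import os

def iter_mp3(path="."):
    """Yield the path of every .mp3 file under `path` (hypothetical helper name)."""
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            # Compare the last four characters instead of calling re.match().
            if name[-4:] == ".mp3":
                yield os.path.join(root, name)
```

Unlike the pattern r'.*\.mp3', which re.match() also accepts for a name such as song.mp3x, the slice only matches names that actually end in .mp3.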


Solution

A Python-based recursive directory walker should definitely use os.walk; that is the right choice. However, I would check the extension with os.path.splitext() instead of a regex. Also, return is not what you want inside the loop, I guess: it ends the iteration at the first mp3 file. Use yield instead, which turns the enclosing function into a generator function. Call it from outside, and you can easily iterate over all mp3 files in your directory tree.

A working solution, test.py:

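import os

def mp3gen():
    # Walk the tree below the current directory and yield each .mp3 path.
    for root, dirs, files in os.walk('.'):
        for filename in files:
            # splitext() splits "name.mp3" into ("name", ".mp3").
            if os.path.splitext(filename)[1] == ".mp3":
                yield os.path.join(root, filename)

for mp3file in mp3gen():
    print(mp3file)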

Test:

$ mkdir testenv
$ cd testenv
$ mkdir subdir
$ touch test.mp3
$ touch subdir/test2.mp3
$ touch foo.mp4
$ python test.py
./test.mp3
./subdir/test2.mp3

By the way, whatever you do, the performance of this iteration is unlikely to be the bottleneck in your workflow. If it is, I would use the find utility (find . -name "*.mp3"), pipe its output to your Python script, and read the items from stdin with for line in sys.stdin.
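
A minimal sketch of the receiving side of that pipeline, assuming the script is saved as process_mp3.py (a hypothetical name) and invoked as find . -name "*.mp3" | python process_mp3.py. The helper read_paths is illustrative as well; it strips the trailing newline that find appends to each path.

```python
import sys

def read_paths(stream):
    """Yield one path per non-empty input line, without the trailing newline."""
    for line in stream:
        path = line.rstrip("\n")
        if path:
            yield path

if __name__ == "__main__":
    # Paths arrive on stdin, one per line, as printed by find.
    for mp3file in read_paths(sys.stdin):
        print(mp3file)
```

This line-based approach assumes that no file name contains a newline character.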

Other tips

Note: this method requires Python 3.5 or later.

You can use the glob module for this:

import glob
mp3_files = glob.iglob('**/*.mp3', recursive=True)

for mp3 in mp3_files:
    print(mp3)

You can use glob.glob('**/*.mp3', recursive=True) if you want a list instead of a generator.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow