Get a filtered list of files in a directory

https://stackoverflow.com/questions/2225564

19-09-2019

Question

I am trying to get a list of files in a directory using Python, but I do not want a list of ALL the files.

What I essentially want is the ability to do something like the following, but in Python and without executing ls.

ls 145592*.jpg

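If there is no built-in method for this, I am currently thinking of writing a for loop that iterates through the results of os.listdir() and appends all the matching files to a new list.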

However, there are a lot of files in that directory, so I am hoping there is a more efficient method (or a built-in one).

Solution

import glob
glob.glob('145592*.jpg')
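glob.glob() resolves a relative pattern against the current working directory, and the order of the results depends on the file system. A minimal sketch, assuming you want the matches from a specific directory in sorted order; list_matching and the demo file names are hypothetical and used only for illustration.

```python
import glob
import os
import tempfile


def list_matching(directory, pattern):
    """Return the sorted paths in `directory` whose names match the shell-style `pattern`."""
    # glob.escape() makes any [, * or ? in the directory name literal,
    # so only `pattern` is treated as a wildcard expression.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    # glob.glob() gives no ordering guarantee, so sort for stable output.
    return sorted(glob.glob(full_pattern))


# Demo: build a throwaway directory with a few files and filter it.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("145592_1.jpg", "145592_2.jpg", "145592_3.png", "999999_1.jpg"):
        open(os.path.join(tmp, name), "w").close()
    matches = [os.path.basename(p) for p in list_matching(tmp, "145592*.jpg")]

print(matches)  # ['145592_1.jpg', '145592_2.jpg']
```

A non-existent directory simply yields an empty list rather than raising an error.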

Other answers

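glob.glob() is definitely the way to do it (as per Ignacio). However, if you need more complicated matching, you can do it with a list comprehension and re.match(), something like this: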

import os, re
files = [f for f in os.listdir('.') if re.match(r'[0-9]+.*\.jpg$', f)]

More flexible, but, as you note, less efficient.
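If the same regular expression is applied to many names, you can compile it once up front (the re module also caches recently used patterns, so the gain is modest). Note that re.match() only anchors at the start of the string; this sketch uses re.fullmatch() (Python 3.4+) so that a name like 123.jpg.bak is rejected. filter_names is a hypothetical helper, and the names list stands in for os.listdir('.').

```python
import re

# Digits, then anything, then a literal ".jpg" that must end the name.
JPG_WITH_NUMERIC_PREFIX = re.compile(r"[0-9]+.*\.jpg")


def filter_names(names, regex=JPG_WITH_NUMERIC_PREFIX):
    """Keep only the names that `regex` matches over their whole length."""
    return [n for n in names if regex.fullmatch(n)]


# Stand-in for os.listdir('.').
names = ["145592_a.jpg", "145592_b.jpg.bak", "photo.jpg", "7.jpg", "12.png"]
print(filter_names(names))  # ['145592_a.jpg', '7.jpg']
```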

Keep it simple:

import os
relevant_path = "[path to folder]"
included_extensions = ['jpg','jpeg', 'bmp', 'png', 'gif']
file_names = [fn for fn in os.listdir(relevant_path)
              if any(fn.endswith(ext) for ext in included_extensions)]

I like this form of list comprehension because it reads well in English.

I read the fourth line as: for each fn in os.listdir of my path, give me only the ones that match any one of my included extensions.

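It may be hard for novice Python programmers to get used to filtering with list comprehensions, and building the whole list can carry some memory overhead for very large data sets. But for listing a directory and other simple string-filtering tasks, list comprehensions lead to cleaner, more self-documenting code.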

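The only drawback of this design is that it does not protect you from passing a string instead of a list. For example, if included_extensions accidentally ends up as a single string such as 'jpg', the any() check iterates over its individual characters ('j', 'p', 'g'), so every file name ending in one of those letters matches and you get false positives.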

Still, it is better to have a problem that is easy to fix than a solution that is hard to understand.
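A variation on the snippet above that guards against the string-instead-of-list mistake. It relies on str.endswith() accepting a tuple of suffixes, so a single call checks every extension. Two behaviors are additions of this sketch, not part of the original answer: a "." is prefixed to each extension so a name like "notajpg" is not accepted, and the comparison is case-insensitive. files_with_extensions is a hypothetical name.

```python
import os
import tempfile


def files_with_extensions(path, extensions):
    """List the names in `path` ending in one of `extensions`, e.g. ["jpg", "png"]."""
    # A bare string such as "jpg" would otherwise be iterated character by character.
    if isinstance(extensions, str):
        raise TypeError("extensions must be a list or tuple of strings, not a str")
    # Prefix "." so "notajpg" does not match; lower-case for a case-insensitive check.
    suffixes = tuple("." + ext.lower() for ext in extensions)
    return [fn for fn in os.listdir(path) if fn.lower().endswith(suffixes)]


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "B.PNG", "notajpg", "c.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = sorted(files_with_extensions(tmp, ["jpg", "png"]))

print(found)  # ['B.PNG', 'a.jpg']
```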

Another option:

>>> import os, fnmatch
>>> fnmatch.filter(os.listdir('.'), '*.py')
['manage.py']

https://docs.python.org/3/library/fnmatch.html
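One detail from that documentation: fnmatch.fnmatch() (and therefore fnmatch.filter()) normalizes case with os.path.normcase(), so matching is case-insensitive on Windows but case-sensitive on POSIX systems. A sketch using fnmatch.fnmatchcase() when you want the same behavior on every platform; the names list is invented for the example.

```python
import fnmatch

names = ["manage.py", "SETUP.PY", "README.md"]

# Case-sensitive on every platform: only the lower-case ".py" name matches.
exact = [n for n in names if fnmatch.fnmatchcase(n, "*.py")]

# Case-insensitive on every platform: lower-case the name (the pattern already is).
any_case = [n for n in names if fnmatch.fnmatchcase(n.lower(), "*.py")]

print(exact)     # ['manage.py']
print(any_case)  # ['manage.py', 'SETUP.PY']
```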

Use os.walk to list files recursively

import os

root = "/home"
pattern = "145992"
alist_filter = ['jpg', 'bmp', 'png', 'gif']
path = os.path.join(root, "mydir_to_scan")
for r, d, f in os.walk(path):
    for file in f:
        if file[-3:] in alist_filter and pattern in file:
            # join with r (the directory currently being walked), not root
            print(os.path.join(r, file))
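The same walk can be packaged as a generator that yields full paths instead of printing them. A sketch under two assumptions that differ from the answer above: the name must start with the prefix (as in the question's ls 145992*.jpg-style pattern) rather than merely contain it, and extensions are compared after the dot, so "jpeg" works too. walk_matching and the demo files are hypothetical.

```python
import os
import tempfile


def walk_matching(root, prefix, extensions):
    """Yield full paths under `root` whose names start with `prefix` and end in one of `extensions`."""
    suffixes = tuple("." + ext for ext in extensions)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(prefix) and name.endswith(suffixes):
                # dirpath is the directory that actually contains `name`.
                yield os.path.join(dirpath, name)


# Demo: one match at the top level, one in a sub-directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("145992_a.jpg", "other.jpg",
                os.path.join("sub", "145992_b.png"), os.path.join("sub", "145992_c.txt")):
        open(os.path.join(tmp, rel), "w").close()
    found = sorted(os.path.relpath(p, tmp) for p in walk_matching(tmp, "145992", ["jpg", "png"]))

print(found)  # on POSIX: ['145992_a.jpg', 'sub/145992_b.png']
```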

Preliminary code

import glob
import fnmatch
import pathlib
import os

pattern = '*.py'
path = '.'

Solution 1 - use "glob"

# lookup in current dir
glob.glob(pattern)

In [2]: glob.glob(pattern)
Out[2]: ['wsgi.py', 'manage.py', 'tasks.py']

Solution 2 - use "os" + "fnmatch"

Variant 2.1 - lookup in the current directory

# lookup in current dir
fnmatch.filter(os.listdir(path), pattern)

In [3]: fnmatch.filter(os.listdir(path), pattern)
Out[3]: ['wsgi.py', 'manage.py', 'tasks.py']

Variant 2.2 - recursive lookup

# lookup recursive
for dirpath, dirnames, filenames in os.walk(path):

    if not filenames:
        continue

    pythonic_files = fnmatch.filter(filenames, pattern)
    if pythonic_files:
        for file in pythonic_files:
            print('{}/{}'.format(dirpath, file))

Result

./wsgi.py
./manage.py
./tasks.py
./temp/temp.py
./apps/diaries/urls.py
./apps/diaries/signals.py
./apps/diaries/actions.py
./apps/diaries/querysets.py
./apps/library/tests/test_forms.py
./apps/library/migrations/0001_initial.py
./apps/polls/views.py
./apps/polls/formsets.py
./apps/polls/reports.py
./apps/polls/admin.py

Solution 3 - use "pathlib"

# lookup in current dir
path_ = pathlib.Path('.')
tuple(path_.glob(pattern))

# lookup recursive
tuple(path_.rglob(pattern))

Notes:

Tested on Python 3.4.
The "pathlib" module was added only in Python 3.4.
Python 3.5 added recursive lookup to glob.glob: https://docs.python.org/3.5/library/glob.html#glob.glob. Since my machine has Python 3.4 installed, I have not tested it.
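For reference, a sketch of that recursive form (Python 3.5+): with recursive=True, "**" matches any files and zero or more directories, so top-level files are found as well as nested ones. The directory layout below is made up for the demo.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "apps", "polls"))
    for rel in ("manage.py", os.path.join("apps", "polls", "views.py"), "notes.txt"):
        open(os.path.join(tmp, rel), "w").close()

    # "**" only has its recursive meaning when recursive=True is passed.
    pattern = os.path.join(glob.escape(tmp), "**", "*.py")
    found = sorted(os.path.relpath(p, tmp) for p in glob.glob(pattern, recursive=True))

print(found)  # on POSIX: ['apps/polls/views.py', 'manage.py']
```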

Filter with the `glob` module:

Import glob:

import glob

Wildcard:

files=glob.glob("data/*")
print(files)

Out:

['data/ks_10000_0', 'data/ks_1000_0', 'data/ks_100_0', 'data/ks_100_1',
'data/ks_100_2', 'data/ks_106_0', 'data/ks_19_0', 'data/ks_200_0', 'data/ks_200_1', 
'data/ks_300_0', 'data/ks_30_0', 'data/ks_400_0', 'data/ks_40_0', 'data/ks_45_0', 
'data/ks_4_0', 'data/ks_500_0', 'data/ks_50_0', 'data/ks_50_1', 'data/ks_60_0', 
'data/ks_82_0', 'data/ks_lecture_dp_1', 'data/ks_lecture_dp_2']

Filter by extension `.txt` (here, .txt files in any direct subdirectory of /home/ach):

files = glob.glob("/home/ach/*/*.txt")

Single character:

glob.glob("/home/ach/file?.txt")

Number ranges (names containing a digit):

glob.glob("/home/ach/*[0-9]*")

Letter ranges (names starting with a, b or c):

glob.glob("/home/ach/[a-c]*")
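glob matches each path component with the fnmatch module (apart from its special handling of names starting with a dot), so these wildcards can be tried against a plain list of names without touching the file system. A sketch; the names and the matching helper are invented for illustration, and fnmatchcase keeps the result case-sensitive on every platform.

```python
from fnmatch import fnmatchcase

names = ["file1.txt", "file10.txt", "filea.txt", "notes.txt",
         "2021_report", "archive", "beta.txt", "delta"]


def matching(pattern):
    """Return the names that the shell-style `pattern` matches (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


print(matching("file?.txt"))  # ['file1.txt', 'filea.txt']  ('?' = exactly one character)
print(matching("*[0-9]*"))    # ['file1.txt', 'file10.txt', '2021_report']
print(matching("[a-c]*"))     # ['archive', 'beta.txt']
```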

You may also want a higher-level approach (which I have implemented and packaged as findtools):

from findtools.find_files import (find_files, Match)


# Recursively find all *.txt files in **/home/**
txt_files_pattern = Match(filetype='f', name='*.txt')
found_files = find_files(path='/home', match=txt_files_pattern)

for found_file in found_files:
    print(found_file)

It can be installed with:

pip install findtools

import os

dir="/path/to/dir"
[x[0]+"/"+f for x in os.walk(dir) for f in x[2] if f.endswith(".jpg")]

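This gives you a list of the .jpg files with their full paths. Replace x[0]+"/"+f with just f if you only want the file names. You can also replace f.endswith(".jpg") with any string condition you like.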

File names with "jpg" and "png" extensions in "path/to/images":

import os
accepted_extensions = ["jpg", "png"]
filenames = [fn for fn in os.listdir("path/to/images") if fn.split(".")[-1] in accepted_extensions]
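Splitting on "." has two edge cases: a name with no dot at all, such as a file literally called jpg, splits to ['jpg'] and is accepted, and upper-case extensions such as .JPG are rejected. A sketch using os.path.splitext() plus lower-casing to handle both; has_accepted_extension is a hypothetical helper, and the names list stands in for os.listdir("path/to/images").

```python
import os

accepted_extensions = {".jpg", ".png"}


def has_accepted_extension(filename):
    """True if the file's extension, compared case-insensitively, is accepted."""
    # splitext("photo.JPG") -> ("photo", ".JPG"); splitext("jpg") -> ("jpg", "")
    _root, ext = os.path.splitext(filename)
    return ext.lower() in accepted_extensions


# Stand-in for os.listdir("path/to/images").
names = ["photo.JPG", "icon.png", "jpg", "notes.txt", "archive.tar.png"]
filenames = [fn for fn in names if has_accepted_extension(fn)]
print(filenames)  # ['photo.JPG', 'icon.png', 'archive.tar.png']
```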

You can also use pathlib, which is part of the standard library in Python 3.4 and above:

from pathlib import Path

files = [f for f in Path.cwd().iterdir() if f.match("145592*.jpg")]
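Path.match() only compares the pattern against the path; it does not check what kind of entry it is, so a sub-directory whose name matches would also be returned. A sketch, assuming only regular files are wanted, that lets Path.glob() do the matching and filters with is_file(); the temporary directory exists only for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "145592_1.jpg").touch()
    (base / "145592_2.jpg").touch()
    (base / "145592_dir.jpg").mkdir()  # a directory whose name also matches
    (base / "other.jpg").touch()

    # Path.glob() does the pattern matching; is_file() drops the directory.
    files = sorted(p.name for p in base.glob("145592*.jpg") if p.is_file())

print(files)  # ['145592_1.jpg', '145592_2.jpg']
```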

You can use subprocess.check_output() like this:

import subprocess

list_files = subprocess.check_output("ls 145992*.jpg", shell=True)

Of course, the string between the quotes can be any shell command you want to run; its output is stored in list_files (as bytes).
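Two practical details with this approach: check_output() returns bytes unless text=True (Python 3.7+) is passed, and ls exits with a non-zero status when nothing matches, which raises CalledProcessError. A sketch that handles both, assuming a POSIX shell and ls are available; ls_lines is a hypothetical helper. Because the command goes through a shell, the pattern must not come from untrusted input.

```python
import os
import subprocess
import tempfile


def ls_lines(pattern, cwd):
    """Run `ls <pattern>` in a shell inside `cwd` and return the listed names."""
    try:
        output = subprocess.check_output(
            "ls " + pattern, shell=True, cwd=cwd, text=True,
            stderr=subprocess.DEVNULL,  # hide "No such file" when nothing matches
        )
    except subprocess.CalledProcessError:
        # ls exits non-zero when the pattern matches no files.
        return []
    # When stdout is a pipe, ls prints one name per line.
    return output.splitlines()


with tempfile.TemporaryDirectory() as tmp:
    for name in ("145592_1.jpg", "145592_2.jpg", "other.jpg"):
        open(os.path.join(tmp, name), "w").close()
    found = ls_lines("145592*.jpg", cwd=tmp)
    missing = ls_lines("000000*.jpg", cwd=tmp)

print(found)    # ['145592_1.jpg', '145592_2.jpg']
print(missing)  # []
```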

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow