Question

I have a set of files saved on my laptop. The folder structure looks like this:

Part1(folder)
 Part1(subfolder)
  awards_1990 (subfolder)
     awards_1990_00 (subfolder)
        (files)
     awards_1990_01
        (files)
        ...
        ...
        ...
  awards_1991
    awards_1991_01
      (files)
    awards_1991_01
    awards_1991_01
     ...
     ...
     ...
  awards_1992
     ...
     ...
     ...
  awards_1993
     ...
     ...
     ...
  awards_1994
     ...
     ...
     ...

So I am trying to extract the list of file paths with os.walk. The code I have looks like this:

import os
matches=[]
for root, dirnames, dirname in os.walk('E:\\Grad\\LIS\\LIS590 Text mining\\Part1\\Part1'):
    for dirname in dirnames:
        for filename in dirname:
            if filename.endswith(('.txt','.html','.pdf')):
                matches.append(os.path.join(root,filename))

When I print matches, it is an empty list, [].

I tried another version:

import os
dirnames=os.listdir('E:\\Grad\\LIS\\LIS590 Text mining\\Part1\\Part1')
for filenames in dirnames:
    for filename in filenames:
        path=os.path.join(filename)
        print (os.path.abspath(path))

This one gives me this result:

C:\Python32\a
C:\Python32\w
C:\Python32\a
C:\Python32\r
C:\Python32\d
C:\Python32\s
C:\Python32\_
C:\Python32\1
...

I am still researching this error. Any idea how to fix it?

Solution 2

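The loop for filename in dirname: iterates over the individual characters of the string dirname, not over the files in that directory. Also, the third value that os.walk yields is the list of file names, so it should not be unpacked into a variable called dirname. Try: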

#!/usr/bin/env python
import os

topdir = r'E:\Grad\LIS\LIS590 Text mining\Part1\Part1'
matches = []
for root, dirnames, filenames in os.walk(topdir):
    for filename in filenames:
        if filename.endswith(('.txt','.html','.pdf')):
            matches.append(os.path.join(root, filename))
print("\n".join(matches))

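You don't need the for-loop over dirnames here: os.walk descends into every subdirectory on its own and yields each directory's files in filenames.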
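
To check this without touching the real E:\ folder, here is a minimal sketch that wraps the same loop in a function and runs it on a throwaway directory tree. The function name find_files, the SUFFIXES constant, and the sample file names are made up for illustration; only os.walk, os.path, and tempfile come from the standard library.

```python
import os
import tempfile

SUFFIXES = ('.txt', '.html', '.pdf')  # hypothetical constant, same suffixes as above


def find_files(topdir, suffixes=SUFFIXES):
    """Return paths of all files under topdir whose names end with one of suffixes."""
    matches = []
    # os.walk yields one (root, dirnames, filenames) tuple per directory
    # and descends into the subdirectories without any extra loop.
    for root, dirnames, filenames in os.walk(topdir):
        for filename in filenames:
            if filename.endswith(suffixes):
                matches.append(os.path.join(root, filename))
    return matches


# Iterating over a directory name (a string) yields its characters,
# which is why the original loops never saw a real file name.
chars = [c for c in 'awards_1990']
print(chars[:4])

# Build a small temporary tree (sample names only) to exercise find_files.
with tempfile.TemporaryDirectory() as top:
    nested = os.path.join(top, 'awards_1990', 'awards_1990_00')
    os.makedirs(nested)
    for name in ('a.txt', 'b.html', 'c.jpg'):
        open(os.path.join(nested, name), 'w').close()
    found = sorted(os.path.relpath(p, top) for p in find_files(top))
    print(found)
```

The same character-by-character iteration explains the second attempt in the question: os.listdir returns names as strings, the inner loop splits each name into characters, and os.path.abspath resolves each character against the current working directory, which was C:\Python32.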

OTHER TIPS

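The method str.endswith takes suffix[, start[, end]], where suffix is either a single string or a tuple of strings. To test several suffixes in one call, pass them as a tuple, which is why there is a second pair of parentheses: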

if filename.endswith(('.txt','.html','.pdf')):
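
A small sketch of that behaviour, using made-up file names: a tuple of suffixes is accepted, while a list raises TypeError. The last line shows one possible way to handle upper-case extensions, on the assumption that they should match as well; the original post does not say whether they should.

```python
suffixes = ('.txt', '.html', '.pdf')

# Sample names for illustration only.
names = ['report.txt', 'index.html', 'photo.jpg', 'paper.pdf']
kept = [n for n in names if n.endswith(suffixes)]
print(kept)

# A list is rejected: str.endswith accepts only a str or a tuple of str.
try:
    'report.txt'.endswith(['.txt', '.html'])
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)

# endswith is case-sensitive; lowercasing first also matches 'NOTES.TXT'.
upper_matches = 'NOTES.TXT'.lower().endswith(suffixes)
print(upper_matches)
```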
Licensed under: CC-BY-SA with attribution