Question

I have a directory with a few thousand files. I'd like to create subdirectories inside the larger directory and move exactly 150 files (filtered by extension) into each of the newly created directories. I would love some guidance to get me going. Here is the trifle I'm starting with:

wd = os.chdir(r'foobar')

allgzs=[]

for file in os.listdir(wd):
    if file.endswith('asc.gz'):
       allgzs.append(file)

for gz in allgzs:


# For every 150 .gz files: 
# Create directory -- os.mkdir?
# Move files into directory  -- shutil.move?

Solution

I came up with a grouper function to yield a long sequence in runs of n:

import itertools
def grouper(S, n):
    iterator = iter(S)
    while True:
        items = list(itertools.islice(iterator, n))
        if len(items) == 0:
            break
        yield items
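
To see what grouper produces, here is a quick check on a short sequence. The function is repeated so the snippet runs on its own; the input range(7) and group size 3 are just illustrative values. Note that the last group is simply shorter when the length is not a multiple of n.

```python
import itertools

def grouper(S, n):
    # Same generator as above, repeated so this snippet is self-contained.
    iterator = iter(S)
    while True:
        items = list(itertools.islice(iterator, n))
        if not items:
            break
        yield items

# Seven items in runs of three: two full groups plus one leftover group.
groups = list(grouper(range(7), 3))
print(groups)  # [[0, 1, 2], [3, 4, 5], [6]]
```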

From there, you can do what you want pretty concisely:

import glob, os, shutil
fnames = sorted(glob.glob('*asc.gz'))
for i, batch in enumerate(grouper(fnames, 150)):
    dirname = 'batch%d' % i
    os.mkdir(dirname)
    for fname in batch:
        shutil.move(fname, dirname)
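
If you'd rather try this on throwaway files before touching real data, something like the sketch below might help. It wraps the same loop in a function and runs it inside a temporary directory. The name move_in_batches, the dummy file names, and the batch size of 3 are all made up for the demo; they are not part of the answer above.

```python
import glob
import itertools
import os
import shutil
import tempfile

def grouper(S, n):
    # Yield successive lists of up to n items from S.
    iterator = iter(S)
    while True:
        items = list(itertools.islice(iterator, n))
        if not items:
            break
        yield items

def move_in_batches(directory, pattern, size):
    # Hypothetical helper: move matching files into batch0, batch1, ...
    # created inside `directory`, and return the directories it made.
    search = os.path.join(glob.escape(directory), pattern)
    fnames = sorted(glob.glob(search))
    created = []
    for i, batch in enumerate(grouper(fnames, size)):
        dirname = os.path.join(directory, 'batch%d' % i)
        os.mkdir(dirname)
        for fname in batch:
            shutil.move(fname, dirname)
        created.append(dirname)
    return created

# Dry run: 7 empty dummy files, moved in batches of 3.
with tempfile.TemporaryDirectory() as tmp:
    for k in range(7):
        open(os.path.join(tmp, 'f%d.asc.gz' % k), 'w').close()
    dirs = move_in_batches(tmp, '*asc.gz', 3)
    counts = [len(os.listdir(d)) for d in dirs]
    print(counts)  # [3, 3, 1]
```

For the real job you would call it with your own directory and a size of 150.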

Other tips

Let's see, I'll enumerate what I would do in this situation.

  • Get list of files - X
  • Loop through all files - X
  • Make sure file has only one period - O
  • Filter out unwanted extensions - X
  • Add wanted extensions to new list - O
  • Loop through all files in new list - O
  • Add them to a list and use a counter so you know when you have 150 - O
  • When you have 150 files, move them all - O
  • Back to looping through - O
  • Once you loop through all and copy, you're done.

There you have it. This is in no way the most efficient or best way to do it; it's just how I would.

EDIT: sample code:

import os
import shutil

wantedexts = [".jpg",".png",".randomext"]
wantedfiles = []
files = os.listdir(".") #the files from the folder

for f in files:
    allowedext = 0
    for exts in wantedexts:
        if f.endswith(exts):
            allowedext = 1
            break
    if allowedext:
        wantedfiles.append(f)

counter = 0
countertwo = 0 #used to tell when you get to the end of the files
batch = 0 #number of the directory currently being filled
copyfiles = []
for f in wantedfiles:
    copyfiles.append(f)
    counter += 1
    countertwo += 1
    #check after appending, so the last (partial) batch is handled too
    if counter == 150 or countertwo == len(wantedfiles):
        dirname = "batch%d" % batch #placeholder name, pick your own
        os.mkdir(dirname)
        for fc in copyfiles:
            shutil.move(fc, dirname)
        counter = 0
        batch += 1
        copyfiles = []

Pretty much wrote the code for you, but whatever. Countertwo is used to copy the last items (as there will likely be some left over).

If you're keen on keeping the code you already wrote, here is an intuitive way to finish it up:

import os
import shutil

os.chdir(r'foobar')  # os.chdir returns None, so there is nothing to keep

allgzs = []

for file in os.listdir('.'):
    if file.endswith('asc.gz'):
        allgzs.append(file)

n = 1
name = "subdir%i" %n
for gz in allgzs:
    if not os.path.exists(name):
        os.mkdir(name)
    shutil.move(gz, name)
    if len(os.listdir(name)) == 150:
        n += 1
        name = "subdir%i" %n

I'm just using a list comprehension to make a list of filenames here. Your list will come from os.listdir or glob.glob('*asc.gz').

>>> import os, shutil
>>> files = ['foo{}.gz'.format(i) for i in range(5000)]
>>> for i, fname in enumerate(files):
...     if i % 150 == 0:
...         dirname = "bar{}".format(i // 150)
...         os.mkdir(dirname)
...     print(fname, "-->", dirname)
...     shutil.move(fname, dirname)
... 
foo0.gz --> bar0
foo1.gz --> bar0
foo2.gz --> bar0
foo3.gz --> bar0
foo4.gz --> bar0
foo5.gz --> bar0
... 

The trick here is to keep count of the files we are processing and divide by 150 to map each one onto a directory. // is floor (integer) division. In Python 2, / on two integers behaves the same way, but in Python 3, / returns a float, so // is the one to use.
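
To make the index-to-directory mapping concrete without touching the filesystem, here is a small sketch. The helper name target_dir and its parameters are invented for illustration; only the i // 150 idea comes from the answer above.

```python
def target_dir(index, batch_size=150, prefix="bar"):
    # Floor division maps indices 0..149 to 0, 150..299 to 1, and so on.
    return "{}{}".format(prefix, index // batch_size)

print(target_dir(0))     # bar0
print(target_dir(149))   # bar0
print(target_dir(150))   # bar1
print(target_dir(4999))  # bar33
```

A new directory is needed exactly when index % batch_size == 0, which is the condition the loop above uses before calling os.mkdir.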

You can use the glob module for this:

import glob

for file in glob.glob('*.gz'):
    pass  # do stuff with each file
Licensed under: CC-BY-SA with attribution