Question

I am using Python 2.7 and trying to run:

glob('{faint,bright*}/{science,calib}/chip?/')

I get no matches. However, from the shell, echo {faint,bright*}/{science,calib}/chip? gives:

faint/science/chip1 faint/science/chip2 faint/calib/chip1 faint/calib/chip2 bright1/science/chip1 bright1/science/chip2 bright1w/science/chip1 bright1w/science/chip2 bright2/science/chip1 bright2/science/chip2 bright2w/science/chip1 bright2w/science/chip2 bright1/calib/chip1 bright1/calib/chip2 bright1w/calib/chip1 bright1w/calib/chip2 bright2/calib/chip1 bright2/calib/chip2 bright2w/calib/chip1 bright2w/calib/chip2

What is wrong with my expression?


Solution 3

Since brace expansion with {} isn't supported by Python's glob(), what you probably want is something like:

import os
import re

...

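# Regex equivalent of {faint,bright*}/{science,calib}/chip?
# ([^/]* and [^/] stop * and ? from matching across a "/")
match_dir = re.compile(r'(^|/)(faint|bright[^/]*)/(science|calib)/chip[^/]$')
for dirpath, dirnames, filenames in os.walk("/your/top/dir"):
    if match_dir.search(dirpath):
        do_whatever_with_files(dirpath, filenames)
        # OR
        do_whatever_with_subdirs(dirpath, dirnames)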
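The snippet above is partial (do_whatever_with_files and do_whatever_with_subdirs are placeholders). As a self-contained sketch of the same walk-and-filter idea, the version below swaps the hand-written regex for fnmatch: it matches one path component per level and prunes dirnames in place, so os.walk never descends into directories that cannot match. LEVELS and walk_matching_dirs are hypothetical names.

```python
import fnmatch
import os

# Hypothetical per-level patterns: {faint,bright*}/{science,calib}/chip?
# split by hand into one list of alternatives per path level.
LEVELS = [["faint", "bright*"], ["science", "calib"], ["chip?"]]


def walk_matching_dirs(top, levels):
    """Return '/'-joined paths, relative to top, of dirs matching levels."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        depth = len(parts)
        if depth == len(levels):
            found.append("/".join(parts))
            dirnames[:] = []  # deep enough: do not descend any further
            continue
        # Keep only subdirectories matching one of this level's patterns;
        # assigning to dirnames[:] makes os.walk skip the others.
        dirnames[:] = [
            d for d in dirnames
            if any(fnmatch.fnmatchcase(d, pat) for pat in levels[depth])
        ]
    return sorted(found)
```

One difference from glob: fnmatch's * also matches names starting with a dot, whereas glob skips hidden entries.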

OTHER TIPS

Combining globbing with brace expansion.

pip install braceexpand

Sample:

from glob import glob
from braceexpand import braceexpand

def braced_glob(path):
    # Expand the braces first, then glob each resulting pattern
    matches = []
    for x in braceexpand(path):
        matches.extend(glob(x))
    return matches
>>> braced_glob('/usr/bin/{x,z}*k')  
['/usr/bin/xclock', '/usr/bin/zipcloak']

{...} is known as brace expansion, and it is a separate step applied before globbing takes place.

It's not part of glob syntax, and it's not supported by Python's glob function.
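A quick way to see this: Python's glob treats {, the comma and } as ordinary characters, so the pattern only matches a name that literally contains them. The sketch below (brace_demo is a hypothetical helper) builds a throwaway directory to show it.

```python
import glob
import os
import tempfile


def brace_demo():
    """Show that glob matches braces literally instead of expanding them."""
    root = tempfile.mkdtemp()
    pattern = os.path.join(root, "{faint,bright*}")
    for name in ("faint", "bright1"):
        os.mkdir(os.path.join(root, name))
    before = glob.glob(pattern)  # nothing: no name starts with "{"
    # A directory whose name literally contains the braces does match,
    # because only "*" in the pattern is a wildcard.
    os.mkdir(os.path.join(root, "{faint,bright1}"))
    after = [os.path.basename(p) for p in glob.glob(pattern)]
    return before, after


print(brace_demo())  # ([], ['{faint,bright1}'])
```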

As that other guy pointed out, Python doesn't support brace expansion directly. But since brace expansion is done before the wildcards are evaluated, you could do that yourself, e.g.,

result = glob('{faint,bright*}/{science,calib}/chip?/')

becomes

result = [
    f 
    for b in ['faint', 'bright*'] 
    for s in ['science', 'calib'] 
    for f in glob('{b}/{s}/chip?/'.format(b=b, s=s))
]
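The same idea generalizes to any number of brace groups with itertools.product. In this sketch, glob_product is a hypothetical helper; like the comprehension above, the template uses str.format fields, and each keyword argument supplies the alternatives for one field.

```python
import glob
import itertools


def glob_product(template, **alternatives):
    """Glob every combination of alternatives formatted into template.

    e.g. glob_product("{b}/{s}/chip?/", b=["faint", "bright*"],
                      s=["science", "calib"])
    """
    names = list(alternatives)
    results = []
    # One pattern per combination, e.g. ("faint", "calib") -> faint/calib/chip?/
    for values in itertools.product(*(alternatives[n] for n in names)):
        pattern = template.format(**dict(zip(names, values)))
        results.extend(glob.glob(pattern))
    return results
```

Like the comprehension, this returns a path twice if two alternatives match it (for example bright* and bright1), and any literal braces in the template would have to be doubled as {{ and }} because of str.format.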

As stated in other answers, brace expansion is a pre-processing step for glob: you expand all the braces, then run glob on each of the results. (Brace expansion turns one string into a list of strings.)

Orwellophile recommends the braceexpand library. This feels to me like too small of a problem to justify a dependency (though it's a common problem that ought to be in the standard library, ideally packaged in the glob module).

So here's a way to do it with a few lines of code.

import itertools
import re

def expand_braces(text, seen=None):
    if seen is None:
        seen = set()

    spans = [m.span() for m in re.finditer(r"\{[^{}]*\}", text)][::-1]
    alts = [text[start + 1 : stop - 1].split(",") for start, stop in spans]

    if len(spans) == 0:
        if text not in seen:
            yield text
        seen.add(text)

    else:
        for combo in itertools.product(*alts):
            replaced = list(text)
            for (start, stop), replacement in zip(spans, combo):
                replaced[start:stop] = replacement

            yield from expand_braces("".join(replaced), seen)

### testing

text_to_expand = "{{pine,}apples,oranges} are {tasty,disgusting} to m{}e }{"

for result in expand_braces(text_to_expand):
    print(result)

prints

pineapples are tasty to me }{
oranges are tasty to me }{
apples are tasty to me }{
pineapples are disgusting to me }{
oranges are disgusting to me }{
apples are disgusting to me }{

What's happening here is:

  1. Nested brackets can produce non-unique results, so we use seen to only yield results that haven't yet been seen.
  2. spans holds the start and stop indexes of all innermost, balanced brackets in the text. The [::-1] slice reverses their order, so the indexes go from highest to lowest (this matters in step 6).
  3. Each element of alts is the corresponding list of comma-delimited alternatives.
  4. If there are no matches (the text contains no balanced brackets), yield the text itself, using seen to make sure it is unique.
  5. Otherwise, use itertools.product to iterate over the Cartesian product of comma-delimited alternatives.
  6. Replace the curly-bracketed text with the current alternative. Since we're replacing data in place, it has to be a mutable sequence (a list rather than a str), and we have to replace the highest indexes first. Each replacement can change the length of the text and shift every index after it, so if we replaced the lowest indexes first, the remaining spans would no longer point at the right characters. This is why we reversed spans when it was first created.
  7. The text might have curly brackets within curly brackets. The regular expression only finds balanced curly brackets that do not contain any other curly brackets, but nested curly brackets are legal. Therefore, we need to recurse until no balanced curly brackets remain (the len(spans) == 0 case). Recursion with Python generators uses yield from to re-yield each result from the recursive call. Note that yield from requires Python 3.3 or later; on the Python 2.7 from the question, it has to be written as a loop (for r in expand_braces("".join(replaced), seen): yield r).
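Step 6 is easy to get wrong, so here is a small illustration (the helper replace_spans is hypothetical) of splicing replacements into a list of characters lowest-first versus highest-first, as expand_braces does.

```python
import re


def replace_spans(text, spans, replacements):
    """Splice each replacement into text over its (start, stop) span, in order."""
    chars = list(text)
    for (start, stop), replacement in zip(spans, replacements):
        chars[start:stop] = replacement
    return "".join(chars)


text = "{a,b}-{c,d}"
spans = [m.span() for m in re.finditer(r"\{[^{}]*\}", text)]  # [(0, 5), (6, 11)]

# Lowest first: the first splice shortens the list, so (6, 11) is now stale.
print(replace_spans(text, spans, ["x", "y"]))        # x-{c,dy
# Highest first: splicing at the end leaves earlier indexes untouched.
print(replace_spans(text, spans[::-1], ["y", "x"]))  # x-y
```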

In the output, {{pine,}apples,oranges} is first expanded to {pineapples,oranges} and {apples,oranges}, and then each of these is expanded. The oranges result would appear twice if we didn't request unique results with seen.

Empty brackets like the ones in m{}e expand to nothing, so this is just me.

Unbalanced brackets, like }{, are left as-is.
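To see what steps 2 and 3 compute on the test string, and why }{ survives untouched, you can run the same regular expression by hand. The outer {{pine,}apples,oranges} group is not found on this pass because it still contains another brace.

```python
import re

text = "{{pine,}apples,oranges} are {tasty,disgusting} to m{}e }{"
# Innermost groups only: "{", then no braces, then "}".
groups = re.findall(r"\{[^{}]*\}", text)
print(groups)        # ['{pine,}', '{tasty,disgusting}', '{}']
# Strip the braces and split on commas, the same way alts is built.
alternatives = [g[1:-1].split(",") for g in groups]
print(alternatives)  # [['pine', ''], ['tasty', 'disgusting'], ['']]
```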

This is not an algorithm to use if high performance for large datasets is required, but it's a general solution for reasonably sized data.
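Because the question is about Python 2.7, where yield from is a syntax error, here is a sketch of the same innermost-first idea written with a work queue instead of a recursive generator, plus a dependency-free braced_glob on top of it. The names expand_braces_list and braced_glob are hypothetical, and the behaviour is only meant to match the examples above, not Bash in every corner case (Bash, for instance, leaves {} and single-item groups like {a} unexpanded).

```python
import collections
import glob
import re

# Innermost brace group: "{", no braces inside, then "}".
_INNER = re.compile(r"\{[^{}]*\}")


def expand_braces_list(text):
    """Return the brace expansions of text as a list without duplicates.

    Expands one innermost group at a time from a queue, so no recursion
    or `yield from` is needed and it also runs on Python 2.7.
    """
    results, seen = [], set()
    queue = collections.deque([text])
    while queue:
        current = queue.popleft()
        match = _INNER.search(current)
        if match is None:
            if current not in seen:  # nested groups can produce repeats
                seen.add(current)
                results.append(current)
            continue
        start, stop = match.span()
        for alternative in current[start + 1:stop - 1].split(","):
            queue.append(current[:start] + alternative + current[stop:])
    return results


def braced_glob(pattern):
    """Expand braces first, then glob each expansion (duplicates removed)."""
    matches = []
    for expanded in expand_braces_list(pattern):
        for path in glob.glob(expanded):
            if path not in matches:
                matches.append(path)
    return matches
```

With the pattern from the question, expand_braces_list returns faint/science/chip?/, faint/calib/chip?/, bright*/science/chip?/ and bright*/calib/chip?/, which braced_glob then globs one by one.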

The wcmatch library has an interface similar to Python's standard glob, with options to enable brace expansion, tilde expansion, and more. Enabling brace expansion, for example:

from wcmatch import glob

glob.glob('{faint,bright*}/{science,calib}/chip?/', flags=glob.BRACE)
Licensed under: CC-BY-SA with attribution