Given this:

['2014\\2014-01 Jan\\2014-01-01',
 '2014\\2014-01 Jan\\2014-01-02',
 '2014\\2014-01 Jan\\2014-01-03',
 '2014\\2014-01 Jan\\2014-01-04',
 '2014\\2014-01 Jan\\2014-01-05',
 '2014\\2014-01 Jan\\2014-01-06',
 '2014\\2014-01 Jan\\2014-01-07',
 '2014\\2014-01 Jan\\2014-01-08',
 '2014\\2014-01 Jan\\2014-01-09',
 '2014\\2014-01 Jan\\2014-01-10',
 '2014\\2014-01 Jan\\2014-01-11',
 '2014\\2014-01 Jan\\2014-01-12',
 '2014\\2014-01 Jan\\2014-01-13',
 '2014\\2014-01 Jan\\2014-01-14',
 '2014\\2014-01 Jan\\2014-01-15',
 '2014\\2014-01 Jan\\2014-01-16',
 '2014\\2014-01 Jan\\2014-01-17',
 '2014\\2014-01 Jan\\2014-01-18',
 '2014\\2014-01 Jan\\2014-01-19',
 '2014\\2014-01 Jan\\2014-01-20',
 '2014\\2014-01 Jan\\2014-01-21',
 '2014\\2014-01 Jan\\2014-01-22',
 '2014\\2014-01 Jan\\2014-01-23',
 '2014\\2014-01 Jan\\2014-01-24',
 '2014\\2014-01 Jan\\2014-01-25',
 '2014\\2014-01 Jan\\2014-01-26',
 '2014\\2014-01 Jan\\2014-01-27',
 '2014\\2014-01 Jan\\2014-01-28',
 '2014\\2014-01 Jan\\2014-01-29',
 '2014\\2014-01 Jan\\2014-01-30',
 '2014\\2014-01 Jan\\2014-01-31',
 '2014\\2014-02 Feb\\2014-02-01',
 '2014\\2014-02 Feb\\2014-02-02',
 '2014\\2014-02 Feb\\2014-02-03',
 '2014\\2014-02 Feb\\2014-02-04',
 '2014\\2014-02 Feb\\2014-02-05',
 '2014\\2014-02 Feb\\2014-02-06',
 '2014\\2014-02 Feb\\2014-02-07',
 '2014\\2014-02 Feb\\2014-02-08',
 '2014\\2014-02 Feb\\2014-02-09',
 '2014\\2014-02 Feb\\2014-02-10',
 '2014\\2014-02 Feb\\2014-02-11',
 '2014\\2014-02 Feb\\2014-02-12',
 '2014\\2014-02 Feb\\2014-02-13',
 '2014\\2014-02 Feb\\2014-02-14',
 '2014\\2014-02 Feb\\2014-02-15',
 '2014\\2014-02 Feb\\2014-02-16',
 '2014\\2014-02 Feb\\2014-02-17',
 '2014\\2014-02 Feb\\2014-02-18',
 '2014\\2014-02 Feb\\2014-02-19']

How do you get something like this? (Solution 1: delimiter-based, with a user-definable delimiter)

['2014\\2014-01 Jan\\2014-01-01',
 '                 \\2014-01-02',
 '                 \\2014-01-03',
 '                 \\2014-01-04',
 '                 \\2014-01-05',
 '                 \\2014-01-06',
 '                 \\2014-01-07',
 '                 \\2014-01-08',
 '                 \\2014-01-09',
 '                 \\2014-01-10',
 '                 \\2014-01-11',
 '                 \\2014-01-12',
 '                 \\2014-01-13',
 '                 \\2014-01-14',
 '                 \\2014-01-15',
 '                 \\2014-01-16',
 '                 \\2014-01-17',
 '                 \\2014-01-18',
 '                 \\2014-01-19',
 '                 \\2014-01-20',
 '                 \\2014-01-21',
 '                 \\2014-01-22',
 '                 \\2014-01-23',
 '                 \\2014-01-24',
 '                 \\2014-01-25',
 '                 \\2014-01-26',
 '                 \\2014-01-27',
 '                 \\2014-01-28',
 '                 \\2014-01-29',
 '                 \\2014-01-30',
 '                 \\2014-01-31',
 '    \\2014-02 Feb\\2014-02-01',
 '                 \\2014-02-02',
 '                 \\2014-02-03',
 '                 \\2014-02-04',
 '                 \\2014-02-05',
 '                 \\2014-02-06',
 '                 \\2014-02-07',
 '                 \\2014-02-08',
 '                 \\2014-02-09',
 '                 \\2014-02-10',
 '                 \\2014-02-11',
 '                 \\2014-02-12',
 '                 \\2014-02-13',
 '                 \\2014-02-14',
 '                 \\2014-02-15',
 '                 \\2014-02-16',
 '                 \\2014-02-17',
 '                 \\2014-02-18',
 '                 \\2014-02-19']

I encounter this situation quite often: I have a list of strings that I want to make easier to scan visually by removing the redundant leading parts that each string shares with the one before it. I know this is what TREE output does for a normal folder traversal, but these are not real folders, just strings in a list.

Ideally the function would accept a hierarchy delimiter, or work character by character (seperator=None).

def printheirarchy(data,seperator=","):

The output for a character-level hierarchy would look like the following (Solution 2: character by character):

['2014\\2014-01 Jan\\2014-01-01',
 '                            2',
 '                            3',
 '                            4',
 '                            5',
 '                            6',
 '                            7',
 '                            8',
 '                            9',
 '                           10',
 '                            1',
 '                            2',
 '                            3',
 '                            4',
 '                            5',
 '                            6',
 '                            7',
 '                            8',
 '                            9',
 '                           20',
 '                            1',
 '                            2',
 '                            3',
 '                            4',
 '                            5',
 '                            6',
 '                            7',
 '                            8',
 '                            9',
 '                           30',
 '                            1',
 '            2 Feb\\2014-02-01',
 '                            2',
 '                            3',
 '                            4',
 '                            5',
 '                            6',
 '                            7',
 '                            8',
 '                            9',
 '                           10',
 '                            1',
 '                            2',
 '                            3',
 '                            4',
 '                            5',
 '                            6',
 '                            7',
 '                            8',
 '                            9']

This seems less useful in this example, but the benefit is very evident when analyzing URLs, logs, etc. Ideally you would grey out the similar parts rather than remove them (or, conversely, bold the differences), but I don't even know how to begin with that. Basically, you compare each element with the previous element, highlighting the differences and suppressing the similarities.

I've searched and found many options that are close to this, but not exactly this. os.path.commonprefix is an example. Maybe difflib?

The value is in reducing visual clutter when examining lists of items.

Solution

Nice question. How about this small solution:

def commonPrefix(a, b):
  i = 0
  while i < len(a) and i < len(b) and a[i] == b[i]:
    i += 1
  return i

def eachWithPrefix(v):
  p = ''
  for x in v:
    yield commonPrefix(p, x), x
    p = x

Now you can choose what you want:

list(eachWithPrefix(v))

will return a list of (count, value) pairs, where the count is the number of leading characters the value shares with the previous line, so

print('\n'.join(' '*p + x[p:] for p, x in eachWithPrefix(v)))

will print the second solution you proposed.
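
To make the (count, value) pairs concrete, here is a small self-contained sketch on a made-up three-item list. The names common_prefix_len and each_with_prefix are just renamed copies of the two functions above, and the sample data is invented for illustration.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two sequences."""
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i

def each_with_prefix(values):
    """Yield (shared_prefix_length, value), comparing each value with the previous one."""
    prev = ''
    for value in values:
        yield common_prefix_len(prev, value), value
        prev = value

sample = ['2014-01-01', '2014-01-02', '2014-02-01']  # made-up sample data
pairs = list(each_with_prefix(sample))
# Blank out the shared characters, keeping the rest of each string in place.
blanked = [' ' * p + x[p:] for p, x in pairs]
print(pairs)
print('\n'.join(blanked))
```

The first item always gets a count of 0 because it is compared with the empty string.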

print('\n'.join('\t' * p + '\\'.join(x[p:]) for p, x in eachWithPrefix(x.split('\\') for x in v)))

on the other hand, does the same for the delimiter \ and replaces the parts to be omitted with tab stops. This is not quite the format you proposed in your first output example, but I guess you get the point.

Try:

print('\n'.join('\\'.join([ s if i >= p else ' '*len(s) for i, s in enumerate(x) ]) for p, x in eachWithPrefix(x.split('\\') for x in v)))

This will replace the equal parts with strings of spaces of the same length. The output will still contain the delimiters, though, but maybe that's even nicer:

2014\2014-01 Jan\2014-01-01
    \           \2014-01-02
    \           \2014-01-03
    \           \2014-01-04
    \           \2014-01-05
...
    \           \2014-01-31
    \2014-02 Feb\2014-02-01
    \           \2014-02-02
    \           \2014-02-03
...

To remove those as well, you can use this approach:

print('\n'.join(' ' * len('\\'.join(x[:p])) + '\\'.join(x)[len('\\'.join(x[:p])):] for p, x in eachWithPrefix(x.split('\\') for x in v)))

But this now contains some duplicated code, so maybe an explicit loop would be nicer here:

for p, x in eachWithPrefix(x.split('\\') for x in v):
  s = '\\'.join(x)
  c = '\\'.join(x[:p])
  print(' '*len(c) + s[len(c):])

Or as an easy-to-use generator:

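def heirarchy(data, separator=","):
  # separator=None (or '') means: compare character by character.
  glue = separator or ''  # avoids calling .join on None
  for p, x in eachWithPrefix(x.split(separator) if separator else list(x) for x in data):
    s = glue.join(x)
    c = glue.join(x[:p])
    yield ' '*len(c) + s[len(c):]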

So now heirarchy(data, separator='\\') creates exactly your expected output.
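
A self-contained sketch of the same generator idea, run on a shortened three-line version of the list from the question. It is an illustration rather than a drop-in replacement: shared_len and blank_shared_prefix are made-up names, and treating separator=None as "compare character by character" is an assumption based on the question's wording.

```python
def shared_len(a, b):
    """Number of leading items that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def blank_shared_prefix(data, separator=None):
    """Yield each string with the part shared with the previous one blanked out.

    With a separator, only whole fields are blanked; with None, single characters.
    """
    glue = separator or ''
    prev = []
    for item in data:
        parts = item.split(separator) if separator else list(item)
        n = shared_len(prev, parts)
        kept = glue.join(parts[:n])  # the text that gets replaced by spaces
        yield ' ' * len(kept) + item[len(kept):]
        prev = parts

data = [
    '2014\\2014-01 Jan\\2014-01-01',
    '2014\\2014-01 Jan\\2014-01-02',
    '2014\\2014-02 Feb\\2014-02-01',
]
by_field = list(blank_shared_prefix(data, separator='\\'))
by_char = list(blank_shared_prefix(data))
print('\n'.join(by_field))
print('\n'.join(by_char))
```

Every output line keeps the length of its input line, so the columns stay aligned when printed.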

Other tips

Seems like you want to reinvent a radix tree: http://en.wikipedia.org/wiki/Radix_tree
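
To illustrate the tree remark, here is a rough sketch that builds a plain nested-dict prefix tree and prints it with indentation. Note that this is a simple trie over whole fields, not a compressed radix tree, and that build_trie, render and the two-space indent are arbitrary choices made for this example.

```python
def build_trie(paths, separator='\\'):
    """Build a nested dict in which each key is one field of a path."""
    root = {}
    for path in paths:
        node = root
        for part in path.split(separator):
            # Reuse the existing child for a shared field, or create a new one.
            node = node.setdefault(part, {})
    return root

def render(node, depth=0):
    """Yield one indented line per trie node, depth first, in insertion order."""
    for name, child in node.items():
        yield '  ' * depth + name
        yield from render(child, depth + 1)

paths = [
    '2014\\2014-01 Jan\\2014-01-01',
    '2014\\2014-01 Jan\\2014-01-02',
    '2014\\2014-02 Feb\\2014-02-01',
]
lines = list(render(build_trie(paths)))
print('\n'.join(lines))
```

Unlike the line-by-line approaches, this groups equal prefixes even when the matching strings are not adjacent in the input.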

Anyhow, here's a simple generator:

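def grouped(iterable):
    # 16 == len('2014\\2014-01 Jan'): the prefix width is hard-coded for this data.
    prefix = None
    for i in iterable:
        pre, suf = i[:16], i[16:]
        if pre != prefix:
            # New group: remember its prefix and emit the line in full.
            prefix = pre
            yield pre + suf
        else:
            # Same group as the previous line: blank out the prefix.
            yield " " * 16 + suf

from difflib import SequenceMatcher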

def remove_redundant_prefixes(it):
    """
    remove_redundant_prefixes(it) -> iterable (generator)

        Iterate through a list of strings, removing successive common prefixes.
    """
    prev_line = ''
    # Sorting puts strings that share a prefix next to each other.
    for line in sorted(it):
        sm = SequenceMatcher(a=prev_line, b=line)
        prev_line = line

        # Each block is a Match(a, b, size) tuple. The first block is only a
        # common prefix if it starts at index 0 in both strings.
        first = sm.get_matching_blocks()[0]
        match_size = first.size if first.a == 0 and first.b == 0 else 0

        # No match == no prefix, don't prune the string.
        if match_size == 0:
            yield line
        else:
            # Prune per the match
            yield ' ' * match_size + line[match_size:]

OK, inspired by the commonprefix answers to this question, I played with it for a bit, and inspiration came when I realized I could pass a list with just two elements each time!

Here's my code. It handles only the character-by-character case, and I'm not sure how good it is (I suspect not very, as a lot of unnecessary copying occurs). But I was able to reproduce the third output from my question. This still leaves the delimiter-based part unresolved.

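import os
from pprint import pprint

def printheirarchy(data,seperator=","):
    # seperator is accepted but not used yet: only the character case is handled.
    if len(data) < 2:
        pprint(data)
        return
    newdata = []
    newdata.append(data[0])
    for i in range(1,len(data)):
        # A two-element slice makes commonprefix compare each item with its predecessor.
        prefix = os.path.commonprefix(data[i-1:i+1])
        newdata.append(data[i].replace(prefix," "*len(prefix),1))
    pprint(newdata)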
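
For the grey-out idea from the question, one possible starting point is to wrap the shared prefix in ANSI escape codes instead of replacing it with spaces. This is only a sketch: it assumes a terminal that honors the SGR "faint" code, dim_shared_prefix is a made-up name, and the URLs are invented sample data.

```python
import os

DIM = '\033[2m'    # SGR "faint"; terminal support varies (assumption)
RESET = '\033[0m'  # SGR reset to normal rendering

def dim_shared_prefix(data):
    """Yield each string with the prefix it shares with the previous one dimmed."""
    prev = ''
    for line in data:
        n = len(os.path.commonprefix([prev, line]))
        if n:
            yield DIM + line[:n] + RESET + line[n:]
        else:
            yield line
        prev = line

urls = [  # made-up sample data
    'http://example.com/a/index.html',
    'http://example.com/a/about.html',
    'http://example.com/b/index.html',
]
shown = list(dim_shared_prefix(urls))
print('\n'.join(shown))
```

Because nothing is removed, each line stays complete and can still be copied, which the blanked-out variants do not allow.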
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow