Question

So I am trying to write a function for finding all alternate concatenations of 2 strings. Say x = '12' and y = 'ab', then the expected result is ['ab12', 'a1b2', 'a12b', '1ab2', '1a2b', '12ab']. I have written the following program:

# [ list of sequences where each seq is a concat of s and t ] 
def g( s, t ) :
    #
    if ( s == "" ) :
        return [ t ]
    elif ( t == "" ) :
        return [ s ]
    else :
        res = []
        for i in xrange( len(s) ) :
            for j in xrange( 1, len(t) + 1 ) :
                res.extend( [ s[:i] + t[:j] + x for x in g( s[i:], t[j:] ) ] )
                #
        res.append( s + t )
        return res

It outputs the correct result, but some sequences have duplicates:

In [22]: r = g( "12", "ab" )
         [ (x, r.count(x)) for x in set( r ) ]
Out[22]: [('ab12', 2), ('12ab', 1), ('1ab2', 2), ('a12b', 1), ('1a2b', 1), ('a1b2', 1)]

How can I avoid the duplicates? (I don't want to check whether an element has already been added; I am interested in a "genuine" way to generate unique sequences)

Was it helpful?

Solution

This is best done using a recursive approach. Strip the first character off s, and find all combinations of the remaining strings, and do the same for t:

def g1( s, t ):
        return [s[0]+x for x in g( s[1:], t )]

def g(s,t):
        if( s=="" ):
                return [t]
        elif t=="":
                return [s]
        else:
                return g1( s, t ) + g1( t, s )

OTHER TIPS

Where are the duplicates coming from?

The code is building substrings of t twice: By looping over values of j and by always choosing the empty prefix of s on each recursion.

For each iteration of j you will append t[:j] and then append a recursive call that will start with i = 0, thereby appending more characters of t. So the same substring of characters from t will be created both by the j loop and by recursion. For example, "12ab" can be built by starting with "1" in the first level of recursion, or by starting with "12" in the first layer of recursion.

(The pattern of duplicates is more obvious with longer strings, say "abc" and "123".)

Fixing the original solution

Let's fix the original solution. We want to build each annexed substrings of t either via the j loop or via recursion. For fun, I'll show a solution for each.

First, let's keep the j loop. This means that we need force each recursive call to g to not start with a character of t. But we also need the special-case first call to g to generate strings that start with characters of t. Here's a semi-hacky way to do it:

def g( s, t, z=0 ) :
    if ( s == "" ) :
        return [ t ]
    elif ( t == "" ) :
        return [ s ]
    else :
        res = []
        for i in xrange( z, len(s) ) :
            for j in xrange( 1, len(t) + 1 ) :
                res.extend( [ s[:i] + t[:j] + x for x in g( s[i:], t[j:], 1) ])
        res.append( s + t )
        return res

Each call to g will iterate over all strings with all prefixes of s followed by all prefixes of t, followed by recursion (which must start with s). The first call to g is a special case, so we explicitly allow the first call to allow t as the first prefix.

Or we can let recursion build the substrings of t.

def g( s, t, ) :
    if ( s == "" ) :
        return [ t ]
    elif ( t == "" ) :
        return [ s ]
    else :
        res = []
        for i in xrange( len(s) ) :
            res.extend( [ s[:i] + t[:1] + x for x in g( s[i:], t[1:] ) ] )
        res.append( s + t )
        return res

In this version, we choose all prefixes of s, add a character from t, then recurse. Since we count "" as a prefix of s, substrings of t greater than 1 will be built.

As an aside, when the recursion works, you can use

for i in xrange( len(s) + 1) :

to eliminate the line

res.append( s + t )

You're selecting len(b) indexes from 0...len(a)+len(b)-1 for the characters of b to appear in (and the other indexes get characters from a). This suggests using itertools.combinations, which gives a solution like this:

import itertools

def concats(a, b):
    for i in map(set, itertools.combinations(xrange(len(a) + len(b)), len(b))):
        its = iter(a), iter(b)
        yield ''.join(next(its[x in i]) for x in xrange(len(a) + len(b)))

print list(concats('abc', '12'))

Incidentally I recently answered this same question (though not for Strings) on Mathematica.SE.
My solution used the algorithm that pentadecagon described:

f[u : {a_, x___}, v : {b_, y___}, c___] := f[{x}, v, c, a] ~Join~ f[u, {y}, c, b]

f[{x___}, {y___}, c___] := {{c, x, y}}

Use:

f[{1, 2}, {a, b}]
{{1, 2, a, b}, {1, a, 2, b}, {1, a, b, 2},
 {a, 1, 2, b}, {a, 1, b, 2}, {a, b, 1, 2}}

This code uses almost exclusively the pattern matching syntax of Mathematica, the only exception being Join.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top