Question

I'm looking for a way to implement a diversified sort. Each cell contains a weight value along with an enum type. I would like to sort the cells so that each weight is adjusted dynamically according to the types of the elements already chosen, giving priority to the types chosen least so far. I would like to control this with a diversity factor: a high value should produce a fully diversified result array, and a low value should produce an almost 'regular' sorted array.
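Stated as an algorithm, this request amounts to a greedy selection loop. The following is a minimal Python sketch under assumptions the question leaves open: higher weights are better, weights are positive, and a type's effective weight is divided by `1 + factor * (number of items of that type already chosen)`. The function name `greedy_diverse` and this penalty shape are illustrative choices only.

```python
def greedy_diverse(items, factor):
    """Order (enum_type, weight) pairs greedily, penalising types already chosen.

    Assumptions: higher weight is better and weights are positive.
    factor = 0 gives a plain descending sort; a large factor favours
    types that have been chosen less often so far.
    """
    remaining = sorted(items, key=lambda it: it[1], reverse=True)
    chosen = {}  # how many items of each type have been picked so far
    result = []
    while remaining:
        # Effective weight shrinks the more often its type was already chosen.
        best = max(remaining, key=lambda it: it[1] / (1 + factor * chosen.get(it[0], 0)))
        remaining.remove(best)
        chosen[best[0]] = chosen.get(best[0], 0) + 1
        result.append(best)
    return result


items = [('a', 10), ('a', 9), ('b', 5)]
print(greedy_diverse(items, 0))   # plain descending order
print(greedy_diverse(items, 10))  # 'b' moves ahead of the second 'a'
```

This simple version is O(n²), because every step scans all remaining items.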

This doesn't sound like a very specific use case, so any references to known algorithms would also be great.

Update: Following Ophir's suggestion, this might be a basic wrapper:

    // hypothetical wrapper name; the three arrays are the inputs, one per type
    function diversifiedSort(array $contentTypeA, array $contentTypeB, array $contentTypeC) {
        // sort each by value
        sort($contentTypeA);
        sort($contentTypeB);
        sort($contentTypeC);

        $limit = 100;
        $amountChosen = 0;
        $diversifiedContent = [];

        // while I haven't chosen enough items and there are still options to choose from
        while ($amountChosen < $limit && (count($contentTypeA) + count($contentTypeB) + count($contentTypeC) > 0)) {
            // no call-time & here: selectBest() already takes the arrays by reference
            $diversifiedContent[] = selectBest($contentTypeA, $contentTypeB, $contentTypeC);
            $amountChosen++;
        }

        return $diversifiedContent;
    }

    function selectBest(&$contentTypeA, &$contentTypeB, &$contentTypeC) {
        static $typeSelected = ['A' => 0, 'B' => 0, 'C' => 0];
        $diversifyFactor = 0.5;

        // current best (first) element of each sorted list, or null if it is empty
        $bestA = $contentTypeA[0] ?? null;
        $bestB = $contentTypeB[0] ?? null;
        $bestC = $contentTypeC[0] ?? null;

        // "?" marks the selection rule this question is asking for
        if (?) {
            $typeSelected['A']++;
            array_shift($contentTypeA);
            return $bestA;
        }
        else if (?) {
            $typeSelected['B']++;
            array_shift($contentTypeB);
            return $bestB;
        }
        else if (?) {
            $typeSelected['C']++;
            array_shift($contentTypeC);
            return $bestC;
        }
    }

Solution

Your definition is in very general terms rather than mathematical ones, so I doubt you will find an existing solution that matches exactly what you want. I can suggest this simple approach:

Sort each type separately. Then merge the lists by repeatedly taking the head of the list with the highest priority, where a list's priority is the product of its best remaining value and a "starvation" factor for its type. The starvation factor combines how many steps have passed since that type was last chosen with the diversity factor. The exact shape of this function depends on your application.
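The merge described above can be sketched in Python as follows. This is a minimal sketch under assumptions the answer does not fix: higher weights are better, weights are positive (so the product is meaningful), and the starvation factor is `1 + diversity * skipped`, where `skipped` counts the steps since the type was last chosen. The function name `starvation_merge` is hypothetical.

```python
from collections import defaultdict


def starvation_merge(items, diversity):
    """Merge (enum_type, weight) pairs, favouring types that were skipped recently.

    Assumptions: higher weight is better, weights are positive,
    and starvation = 1 + diversity * steps_since_type_was_last_chosen.
    """
    # Sort each type separately, best (highest) weight first.
    by_type = defaultdict(list)
    for enum_type, weight in items:
        by_type[enum_type].append(weight)
    for weights in by_type.values():
        weights.sort(reverse=True)

    heads = {t: 0 for t in by_type}    # index of the next unused item per type
    skipped = {t: 0 for t in by_type}  # steps since each type was last chosen
    total = sum(len(w) for w in by_type.values())
    result = []
    while len(result) < total:
        # Pick the non-empty list whose head has the highest priority.
        best_type, best_priority = None, None
        for t, weights in by_type.items():
            if heads[t] < len(weights):
                priority = weights[heads[t]] * (1 + diversity * skipped[t])
                if best_priority is None or priority > best_priority:
                    best_type, best_priority = t, priority
        result.append((best_type, by_type[best_type][heads[best_type]]))
        heads[best_type] += 1
        # The chosen type resets its counter; every other type starves a bit more.
        for t in skipped:
            skipped[t] = 0 if t == best_type else skipped[t] + 1
    return result


items = [('a', 10), ('a', 9), ('a', 8), ('b', 5), ('b', 4)]
print(starvation_merge(items, 0))    # plain descending merge
print(starvation_merge(items, 100))  # alternates between 'a' and 'b'
```

With `diversity = 0` this is an ordinary descending merge; as `diversity` grows, a type that has been skipped even once quickly outranks the type just chosen, so the output alternates between types.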

OTHER TIPS

Here's an idea:

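    class item:
        def __init__(self, enum_type, weight):
            self.enum_type = enum_type
            self.weight = weight
            self.dyn_weight = weight

        def __repr__(self):
            return repr((self.enum_type, self.weight, self.dyn_weight))


    def sort_diverse(lst, factor):
        # reset dynamic weights so repeated calls on the same list don't accumulate
        for obj in lst:
            obj.dyn_weight = obj.weight
        # first sort: group by type, lowest weight first within each type
        by_type = sorted(lst, key=lambda obj: (obj.enum_type, obj.weight))
        cnt = 1
        for i in range(1, len(by_type)):
            current = by_type[i]
            previous = by_type[i - 1]
            if current.enum_type == previous.enum_type:
                # the k-th item of a type (k >= 1) is pushed back by factor * k
                current.dyn_weight += factor * cnt
                cnt += 1
            else:
                cnt = 1
        # second sort: by the penalised weight, ties broken by type
        return sorted(by_type, key=lambda obj: (obj.dyn_weight, obj.enum_type))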

Try this example:

    lst = [item('a', 0) for _ in range(10)] + [item('b', 1) for _ in range(10)] + [item('c', 2) for _ in range(10)]
    print(sort_diverse(lst, 0))    # regular sort
    print(sort_diverse(lst, 1))    # partially diversified
    print(sort_diverse(lst, 100))  # completely diversified

Depending on your needs, you might want to use a more sophisticated weight update function.
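One way to make the update pluggable is to pass it in as a function of an item's rank within its type. In this sketch (the name `sort_diverse_with` and the `(enum_type, weight)` tuple representation are assumptions for illustration), `penalty=lambda rank: factor * rank` reproduces the update used by `sort_diverse`, while something like `lambda rank: factor * rank ** 2` pushes later items of the same type back faster.

```python
from itertools import groupby


def sort_diverse_with(items, penalty):
    """Sort (enum_type, weight) pairs, lower weight first, with a custom penalty.

    penalty(rank) is added to the weight of the rank-th best item of each
    type (rank 0 is the best item of that type).
    """
    # Group by type, lowest (best) weight first inside each group.
    by_type = sorted(items, key=lambda it: (it[0], it[1]))
    scored = []
    for _, group in groupby(by_type, key=lambda it: it[0]):
        for rank, it in enumerate(group):
            scored.append((it[1] + penalty(rank), it[0], it))
    # Sort by penalised weight, ties broken by type, as in sort_diverse.
    scored.sort(key=lambda s: (s[0], s[1]))
    return [it for _, _, it in scored]


items = [('a', 0), ('a', 0), ('b', 1), ('b', 1)]
print(sort_diverse_with(items, lambda rank: 0))           # regular sort
print(sort_diverse_with(items, lambda rank: 100 * rank))  # alternating types
```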

This algorithm runs in O(n log n) time and O(n) extra space, since it performs two sorts and makes two copies of the list.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow