Question

I got some data and need to create a sorting mapping. The actual sorting is done by C code, which takes the integer list flt_neworder from my code. This is my current solution:

# Demo data
data = [
    "Option A",  # 0
    "Option B",  # 1
    "Blabla",    # 2
    "Some text"  # 3
]

class Item:
    def __init__(self, label):
        self.label = label

col = [Item(d) for d in data]

# Create sorting mapping
flt_neworder = [
   x[1] for x in sorted(
       zip(
           [x[0] for x in sorted(enumerate(col), key=lambda x: x[1].label)],
           range(len(col))
       )
   )
]

# Output: [1,2,0,3]
print(flt_neworder)
  • Desired output: [1,2,0,3], not [2,0,1,3]!

  • Position in flt_neworder = original index of item in col

  • Integer values = new positions

What are more efficient, or at least better readable solutions?

I successfully tested this one-liner:

tuple({k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}.values())

But it's still hard to read, and I believe I'm exploiting the fact that dicts are sorted in the CPython implementation...

Edit

Another solution I came up with:

flt_neworder = [None] * len(col)
for j, (_, i) in enumerate(sorted(zip((item.label for item in col), range(len(col))))): flt_neworder[i] = j

And another, but rather slow one:

flt_neworder = list(map(get(0), sorted(enumerate(sorted(enumerate(item.label for item in col), key=get(1))), key=get(1))))


Thanks to Ryan P for providing alternative solutions and a script for testing the timing!

I tested the solutions on a large dataset (1k unique strings, full script) with amazing differences in timing compared to Ryan's small set:

orig: 2.116799074272876
origmod: 2.118176033553482
orignew: 1.1691872433702883
orig3: 1.4400411206224817
orig4: 2.0643228139915664
rewrite: 26.06907118537356
rewriteop: 25.91357442379376
rewriteuniq: 10.783081019086694

The winner is orignew(), and rewriteuniq() turns out to be fast for small datasets, but not great for large ones.

Was it helpful?

Solution

This is faster than your original code, and easier to read IMO:

data = [
    'Option A',
    'Option B',
    'Blabla',
    'Some text'
]
idata = list(enumerate(data))  # add indexes to uniquely identify items
sdata = sorted(idata, key=lambda x: x[1])  # sort the items by label
flt_neworder = [sdata.index(x) for x in idata]  # find the position to move to

Results with timeit:

orig: 12.3757910728
origmod: 7.85222291946
orignew: 6.15745902061
rewrite: 6.31552696228

(origmod is like your original code, but without the Item class as it doesn't seem like you use it; orignew is your one-liner)

Your one-liner is slightly faster, but I think harder to read.


Ok, I'll include my full test code this time. I moved the Item creation out of orig since you're only creating those to mimic the real world data. In addition to orig3 (your new code) and rewriteop (rewrite with operator.itemgetter), I added an extra test rewriteuniq on the off chance that your strings will be unique.

Results:

orig: 7.641715765
origmod: 7.38071417809
orignew: 5.82565498352
orig3: 5.67061495781
rewrite: 5.95284795761
rewriteop: 5.61896586418
rewriteuniq: 1.90719294548

Code:

import operator
from timeit import timeit

data = [
    'Option A',
    'Option B',
    'Blabla',
    'Some text',
]

desired_output = [1, 2, 0, 3]

class Item:
    def __init__(self, label):
        self.label = label

col = [Item(d) for d in data]


def orig():
    flt_neworder = [
        x[1] for x in sorted(
            zip(
                [x[0] for x in sorted(enumerate(col), key=lambda x: x[1].label)],
                range(len(col))
            )
        )
    ]

    assert flt_neworder == desired_output

def origmod():
    flt_neworder = [
        x[1] for x in sorted(
            zip(
                [x[0] for x in sorted(enumerate(data), key=lambda x: x[1])],
                range(len(data))
            )
        )
    ]

    assert flt_neworder == desired_output

def orignew():
    flt_neworder = list({k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}.values())
    assert flt_neworder == desired_output

def orig3():
    flt_neworder = [None] * len(col)
    for j, (_, i) in enumerate(sorted(zip((item.label for item in col), range(len(col))))): flt_neworder[i] = j

    assert flt_neworder == desired_output

def rewrite():
    idata = list(enumerate(data))
    sdata = sorted(idata, key=lambda x: x[1])
    flt_neworder = [sdata.index(x) for x in idata]

    assert flt_neworder == desired_output

def rewriteop():
    idata = list(enumerate(data))
    sdata = sorted(idata, key=operator.itemgetter(1))
    flt_neworder = [sdata.index(x) for x in idata]

    assert flt_neworder == desired_output

def rewriteuniq():
    sdata = sorted(data)
    flt_neworder = [sdata.index(x) for x in data]

    assert flt_neworder == desired_output

for fn in (orig, origmod, orignew, orig3, rewrite, rewriteop, rewriteuniq):
    print fn.__name__ + ':', timeit(fn)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top