Question

I got some data and need to create a sorting mapping. The actual sorting is done by C code, which takes the integer list flt_neworder from my code. This is my current solution:

# Demo data
data = [
    "Option A",  # 0
    "Option B",  # 1
    "Blabla",    # 2
    "Some text"  # 3
]

class Item:
    def __init__(self, label):
        self.label = label

col = [Item(d) for d in data]

# Create sorting mapping
flt_neworder = [
   x[1] for x in sorted(
       zip(
           [x[0] for x in sorted(enumerate(col), key=lambda x: x[1].label)],
           range(len(col))
       )
   )
]

# Output: [1,2,0,3]
print(flt_neworder)
  • Desired output: [1,2,0,3], not [2,0,1,3]!

  • Position in flt_neworder = original index of item in col

  • Integer values = new positions

What are more efficient, or at least better readable solutions?

I successfully tested this one-liner:

tuple({k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}.values())

But it's still hard to read, and I believe I'm exploiting the fact that dicts are sorted in the CPython implementation...

Edit

Another solution I came up with:

flt_neworder = [None] * len(col)
for j, (_, i) in enumerate(sorted(zip((item.label for item in col), range(len(col))))): flt_neworder[i] = j

And another, but rather slow one:

flt_neworder = list(map(get(0), sorted(enumerate(sorted(enumerate(item.label for item in col), key=get(1))), key=get(1))))


Thanks to Ryan P for providing alternative solutions and a script for testing the timing!

I tested the solutions on a large dataset (1k unique strings, full script) with amazing differences in timing compared to Ryan's small set:

orig: 2.116799074272876
origmod: 2.118176033553482
orignew: 1.1691872433702883
orig3: 1.4400411206224817
orig4: 2.0643228139915664
rewrite: 26.06907118537356
rewriteop: 25.91357442379376
rewriteuniq: 10.783081019086694

The winner is orignew(), and rewriteuniq() turns out to be fast for small datasets, but not great for large ones.

Était-ce utile?

La solution

This is faster than your original code, and easier to read IMO:

data = [
    'Option A',
    'Option B',
    'Blabla',
    'Some text'
]
idata = list(enumerate(data))  # add indexes to uniquely identify items
sdata = sorted(idata, key=lambda x: x[1])  # sort the items by label
flt_neworder = [sdata.index(x) for x in idata]  # find the position to move to

Results with timeit:

orig: 12.3757910728
origmod: 7.85222291946
orignew: 6.15745902061
rewrite: 6.31552696228

(origmod is like your original code, but without the Item class as it doesn't seem like you use it; orignew is your one-liner)

Your one-liner is slightly faster, but I think harder to read.


Ok, I'll include my full test code this time. I moved the Item creation out of orig since you're only creating those to mimic the real world data. In addition to orig3 (your new code) and rewriteop (rewrite with operator.itemgetter), I added an extra test rewriteuniq on the off chance that your strings will be unique.

Results:

orig: 7.641715765
origmod: 7.38071417809
orignew: 5.82565498352
orig3: 5.67061495781
rewrite: 5.95284795761
rewriteop: 5.61896586418
rewriteuniq: 1.90719294548

Code:

import operator
from timeit import timeit

data = [
    'Option A',
    'Option B',
    'Blabla',
    'Some text',
]

desired_output = [1, 2, 0, 3]

class Item:
    def __init__(self, label):
        self.label = label

col = [Item(d) for d in data]


def orig():
    flt_neworder = [
        x[1] for x in sorted(
            zip(
                [x[0] for x in sorted(enumerate(col), key=lambda x: x[1].label)],
                range(len(col))
            )
        )
    ]

    assert flt_neworder == desired_output

def origmod():
    flt_neworder = [
        x[1] for x in sorted(
            zip(
                [x[0] for x in sorted(enumerate(data), key=lambda x: x[1])],
                range(len(data))
            )
        )
    ]

    assert flt_neworder == desired_output

def orignew():
    flt_neworder = list({k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}.values())
    assert flt_neworder == desired_output

def orig3():
    flt_neworder = [None] * len(col)
    for j, (_, i) in enumerate(sorted(zip((item.label for item in col), range(len(col))))): flt_neworder[i] = j

    assert flt_neworder == desired_output

def rewrite():
    idata = list(enumerate(data))
    sdata = sorted(idata, key=lambda x: x[1])
    flt_neworder = [sdata.index(x) for x in idata]

    assert flt_neworder == desired_output

def rewriteop():
    idata = list(enumerate(data))
    sdata = sorted(idata, key=operator.itemgetter(1))
    flt_neworder = [sdata.index(x) for x in idata]

    assert flt_neworder == desired_output

def rewriteuniq():
    sdata = sorted(data)
    flt_neworder = [sdata.index(x) for x in data]

    assert flt_neworder == desired_output

for fn in (orig, origmod, orignew, orig3, rewrite, rewriteop, rewriteuniq):
    print fn.__name__ + ':', timeit(fn)
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top