Question

I have a data frame that has a column that contains real values.

I would like to add a column that classifies these values according to their size. For example, I would like to know whether a value belongs to the group of the smallest values or the group of the largest values. I would like these two groups to have the same number of elements.

For example, if I have the following values:

[1,2,3,4,40,50]

I would like to map 1, 2, and 3 to 1, and 4, 40, and 50 to 2. Is there an easy way to do this in a data frame?

In the above example I have used only two groups, but I would like to keep it flexible. For example, for three groups I would like to map 1 and 2 to 1, 3 and 4 to 2, and 40 and 50 to 3.

Solution

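import heapq
import random

# random.shuffle needs a mutable sequence, so build a list
# (in Python 3, range() returns a lazy range object, not a list)
x = list(range(100000))
random.shuffle(x)
print(heapq.nlargest(2, x))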

Gives: [99999, 99998]

Now just do something like:

max_column = heapq.nlargest(len(x) // 2, x)  # integer division: the count must be an int

That gives you the larger half of your list as the "large" pile. Use heapq.nsmallest in the same way to get the "small" pile.
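The heapq approach returns the values in each pile but not a label per row, and it only covers two groups. A minimal sketch of one way to get labels for any number of groups is to rank the values and split the ranks evenly. The function name size_groups and its parameters are hypothetical, made up for this illustration. The resulting list has one label per input value, in the original order, so it could be assigned as the new column.

```python
def size_groups(values, n_groups):
    """Label each value 1..n_groups by rank (hypothetical helper, not a library API)."""
    n = len(values)
    # Positions of the values, ordered from the smallest value to the largest.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # rank * n_groups // n maps ranks 0..n-1 onto groups 0..n_groups-1.
        labels[idx] = rank * n_groups // n + 1
    return labels


values = [1, 2, 3, 4, 40, 50]
print(size_groups(values, 2))  # [1, 1, 1, 2, 2, 2]
print(size_groups(values, 3))  # [1, 1, 2, 2, 3, 3]
```

Because the split is by rank rather than by value, tied values can land in different groups. When the number of values is not divisible by n_groups, the group sizes differ by at most one.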

Licensed under: CC-BY-SA with attribution