Question

How can I create a sublist of class instances satisfying some condition on their attribute values, starting from the complete list of class instances?

For example, I have a list of instances of my class Person(). These persons have many attributes, among which ID, a unique identifier, and HH_ID, the identifier of the household they live in. I want to connect all the persons living in the same household, and therefore having the same HH_ID. By "connect", I mean creating an edge between all the household members, using networkx. In order to do this, I need to identify these persons and put them in a "sublist", in order to be processed by some algorithm to connect them all together. How can I achieve this?

I also need a general tool to do this for other, more complex purposes (e.g. randomly connecting N persons aged 15 to 20 years). In the easiest case of household members, though, I can use the fact that my list is ordered by ID and HH_ID, so I have something like:

ID HH_ID
0  0
1  0
2  0
3  1
4  1
5  2

where the first household is composed of persons [0,1,2], the second of persons [3,4] and so on...

For this household problem I tried using the pairwise iterator recipe (from the itertools documentation) in this way:

from itertools import pairwise  # Python 3.10+; on older versions, copy the recipe from the itertools docs
i = pairwise(personList)
for p in personList:
    toConnectList = [p]
    p1, p2 = next(i)
    while p1.hh_id == p2.hh_id:
        toConnectList.append(p2)
        p1, p2 = next(i)
    # connect all persons in toConnectList

But obviously this doesn't work, as my iterator i goes down until hh_id of the two adjacent persons don't match, and restarts from there for the next person. E.g. for the persons in the above example, my iterator will start to compare persons 2 and 3 when it comes to person 1 in the for loop, while I would need some way to jump directly to person 3 in the for loop and have my iterator start comparing person 3 and 4. I hope this example clarifies a bit, even if it doesn't look very clear...

More generally, I need a way of creating a sublist of persons satisfying some conditions on their attribute values, ideally an efficient one (I have around 150000 persons).

Solution

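from itertools import groupby

def family_key(person):
    # Persons in the same household return the same key.
    return person.HH_ID

# groupby only groups consecutive items, so sort by the same key first.
persons.sort(key=family_key)
for hh_id, family in groupby(persons, key=family_key):
    # family is a one-shot iterator over the members of household hh_id
    for person in family:
        pass  # do your thing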
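
To connect the members of each group, one option is to turn every household into its list of pairs and use those pairs as edges. The sketch below is only an illustration under assumptions: the `Person` class is a hypothetical stand-in for the asker's class (only `ID` and `HH_ID` are taken from the question), and `household_edges` is a name introduced here.

```python
from itertools import combinations, groupby
from operator import attrgetter


class Person:
    """Hypothetical stand-in for the asker's class."""

    def __init__(self, ID, HH_ID):
        self.ID = ID
        self.HH_ID = HH_ID


def household_edges(persons):
    """Yield one (ID, ID) pair for every two persons sharing an HH_ID."""
    by_household = attrgetter("HH_ID")
    # groupby only merges adjacent items, so sort by the same key first.
    for _, family in groupby(sorted(persons, key=by_household), key=by_household):
        members = [p.ID for p in family]
        # combinations(members, 2) gives each unordered pair exactly once.
        for pair in combinations(members, 2):
            yield pair


# The six persons from the table in the question.
persons = [Person(0, 0), Person(1, 0), Person(2, 0),
           Person(3, 1), Person(4, 1), Person(5, 2)]
edges = list(household_edges(persons))
print(edges)  # [(0, 1), (0, 2), (1, 2), (3, 4)]
```

The resulting pairs can be handed to networkx in one call, for example `G.add_edges_from(edges)` on a `networkx.Graph`. A single-person household such as person 5 produces no edge, so it would need to be added as a node separately if it should appear in the graph.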

For more complex purposes, just alter the key function to return the same value for those items you want to group.
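
When the condition selects a single subset rather than partitioning the whole list (as in the "N persons aged 15 to 20" example from the question), a plain list comprehension may be simpler than changing the key function. This is a different technique from groupby, sketched here under assumptions: the `age` attribute, the `select` helper, and the sample ages are all hypothetical.

```python
import random
from itertools import combinations


class Person:
    """Hypothetical stand-in; the age attribute is assumed."""

    def __init__(self, ID, age):
        self.ID = ID
        self.age = age


def select(persons, condition):
    """Return the sublist of persons for which condition(person) is true."""
    return [p for p in persons if condition(p)]


# Made-up ages, for illustration only.
persons = [Person(i, age) for i, age in enumerate([12, 15, 17, 20, 34, 18, 61])]

# One pass over the list; no sorting is needed for a simple filter.
teens = select(persons, lambda p: 15 <= p.age <= 20)
teen_ids = [p.ID for p in teens]
print(teen_ids)  # [1, 2, 3, 5]

# Pick N = 3 of them at random and pair them up as edges.
rng = random.Random(0)  # seeded only so the sketch is repeatable
chosen = rng.sample(teens, 3)
chosen_edges = list(combinations([p.ID for p in chosen], 2))
```

The filter is a single linear pass, so it should scale comfortably to a list of around 150000 persons; `random.sample` raises `ValueError` if fewer than N persons match the condition.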

EDIT: Fixed error: groupby returns a tuple (key, group_iter), not just group_iter.

Licensed under: CC-BY-SA with attribution