Question

I have little experience with Python and I'm having trouble working out how to group items that appear on the same line into a list, then move to the next line and ask, "Does either of these two items already exist in a list?" If one of them does, append the other item on the line to that same list. If both already exist in a list, do nothing. If neither exists in a list, create a new list and add both items. In the end I need every item output on its own line, with the name of the list it was put in as a column next to it. They don't necessarily have to be lists; they just have to be named groups.

Example input:

A    F
G    H
F    J
Y    G
H    G

Example Output:

A    List1
F    List1
J    List1
H    List2
G    List2
Y    List2

I don't know whether to put them in dictionary keys, because I'm not sure how the keys would be named and printed with each value later. There will be several thousand groups or lists of items, so the Python script needs to name each new group automatically, which I'm not sure how to do.

Here is some code I'm trying to write to do this, but it's missing a lot because I'm getting caught up on exactly which method to use.

for line in data:
    values = line.split()
    name1 = values[0]
    name2 = values[1]       

    if name1 in group:
        if name2 not in group:
            append name2

    elif name2 in group:
        if name1 not in group:
            append name1

    elif name1 and name2 not in group:
        append name1 and name2 in new group

    else:
        continue

Solution

This seems easy to do with a single dictionary. Here I'm not actually making any lists, just recording which list each item would go in. If you do need real list objects, you can add them later, or perhaps store them instead of a number.

data_str = '''A    F
              G    H
              F    J
              Y    G
              H    G'''
data = [line.strip().split() for line in data_str.splitlines()]

list_ids = {}
next_id = 0
for a, b in data:
    if a in list_ids and b in list_ids:
        pass # do nothing if they both are already in a list
    elif a in list_ids:
        list_ids[b] = list_ids[a]
    elif b in list_ids:
        list_ids[a] = list_ids[b]
    else:
        list_ids[a] = list_ids[b] = next_id
        next_id += 1

for name, list_id in list_ids.items():  # avoid shadowing the built-in id()
    print(name, list_id)
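
If you want the output in the "item, list name" format from the question, one possible follow-up is to invert the dictionary into real lists and name each group by its number. This is only a sketch: the `list_ids` literal below is a hand-written copy of what the loop above produces for the example input, and the `List1`, `List2` naming scheme is an assumption taken from the example output.

```python
from collections import defaultdict

# Hand-written copy of the item -> group number mapping that the
# loop above builds for the example input (an assumption for this sketch).
list_ids = {"A": 0, "F": 0, "G": 1, "H": 1, "J": 0, "Y": 1}

# Invert the mapping: group number -> list of member items.
groups = defaultdict(list)
for name, group_id in list_ids.items():
    groups[group_id].append(name)

# Name each group "List1", "List2", ... and emit one item per line.
lines = []
for group_id in sorted(groups):
    label = f"List{group_id + 1}"
    for name in groups[group_id]:
        lines.append(f"{name}    {label}")

print("\n".join(lines))
```

Because the names are derived from the counter, this scales to several thousand groups without naming anything by hand.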

Note that if you need to handle pairs whose items are already in different lists by joining those lists (rather than just ignoring pairs that are both already in lists), you're dealing with disjoint sets, which have some pretty nifty algorithms (commonly known as union-find).
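
For the merging case, a minimal disjoint-set sketch might look like the following. The helper names `find` and `union`, the `parent` dictionary, and the two extra pairs `P Q` and `J Y` are all hypothetical, added only to show two existing groups being joined. It uses path compression but no union by rank, so treat it as an illustration rather than a tuned implementation.

```python
def find(parent, x):
    """Return the representative of x's set, registering x if it is new."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path straight at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent, a, b):
    """Merge the sets containing a and b (no-op if already together)."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


# Example input from the question, plus two hypothetical pairs:
# ("P", "Q") starts a third group, ("J", "Y") joins the first two.
pairs = [("A", "F"), ("G", "H"), ("F", "J"), ("Y", "G"), ("H", "G"),
         ("P", "Q"), ("J", "Y")]

parent = {}
for a, b in pairs:
    union(parent, a, b)

# Give each final set a name in order of first appearance.
labels = {}
group_of = {}
for item in list(parent):
    root = find(parent, item)
    if root not in labels:
        labels[root] = f"List{len(labels) + 1}"
    group_of[item] = labels[root]

for item, label in group_of.items():
    print(item, label)
```

Unlike the dictionary approach, the `J Y` pair here pulls all six original items into one group, while `P` and `Q` stay in their own.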

Licensed under: CC-BY-SA with attribution