Question

I have a list called "GO_file":

GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]

I want to convert it to:

A: 12, 13, 14

B: 1, 5

from collections import defaultdict
GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]

GO_dict = defaultdict(list)
for GO_names in GO_file:
   gene_id = GO_names.split("_")[0]
   GO_id = GO_names.split(" ")[1:]
   GO_dict[gene_id] = GO_id
print GO_dict    

However, this code stores only one value per key:

defaultdict(<type 'list'>, {'A': ['12'], 'B': ['5']})

I appreciate any suggestions.


Solution

Your code has a few issues:

  1. There are duplicates among your GO_id values, and you seem to care only about the unique ones, so you need a defaultdict(set) instead of a defaultdict(list).
  2. Your split logic for the value is buggy: GO_names.split(" ")[1:] returns a list such as ['12'] rather than the string '12'.
  3. GO_dict[gene_id] = GO_id replaces whatever was stored under the key, so only the last value per key survives instead of being appended.
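
The three issues above can be reproduced in isolation. This is a minimal sketch, assuming Python 3 (the snippets in this thread are Python 2); the names `overwritten` and `appended` are made up for illustration.

```python
from collections import defaultdict

GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]

# Issue 2: slicing the result of split() yields a list, not a string.
sample = GO_file[0]
print(sample.split(" ")[1:])   # ['12']
print(sample.split(" ")[1])    # 12

# Issue 3: plain assignment replaces whatever was stored under the key.
overwritten = defaultdict(list)
for entry in GO_file:
    gene_id = entry.split("_")[0]
    overwritten[gene_id] = entry.split(" ")[1:]
print(dict(overwritten))       # {'A': ['12'], 'B': ['5']}

# Appending keeps every value, which exposes issue 1: duplicates.
appended = defaultdict(list)
for entry in GO_file:
    gene_id = entry.split("_")[0]
    appended[gene_id].append(entry.split(" ")[1])
print(dict(appended))          # {'A': ['12', '13', '14', '12'], 'B': ['1', '1', '5']}
```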

A possible corrected solution

>>> GO_dict = defaultdict(set)
>>> for GO_names in GO_file:
   gene_id,_,GO_id = GO_names.partition(" ")
   gene_id = gene_id.split("_")[0]
   GO_dict[gene_id].add(GO_id)


>>> print GO_dict
defaultdict(<type 'set'>, {'A': set(['13', '12', '14']), 'B': set(['1', '5'])})

One possible problem with the above code is that the order of the elements is not guaranteed. Unfortunately, the standard library does not provide an OrderedSet, but we can easily use the keys of an OrderedDict to serve our purpose:

>>> from collections import OrderedDict
>>> GO_dict = defaultdict(OrderedDict)
>>> for GO_names in GO_file:
   gene_id,_,GO_id = GO_names.partition(" ")
   gene_id = gene_id.split("_")[0]
   GO_dict[gene_id][GO_id] = None


>>> OrderedDict((key, value.keys()) for key, value in sorted(GO_dict.items()))
OrderedDict([('A', ['12', '13', '14']), ('B', ['1', '5'])])
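
On newer interpreters the same idea needs less machinery. The following is a sketch only, assuming Python 3.7 or later, where plain dicts preserve insertion order; `result` is a name made up for illustration. It also prints the groups in the format asked for in the question.

```python
from collections import defaultdict

GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]

# Assumption: Python 3.7+, where a plain dict keeps insertion order.
GO_dict = defaultdict(dict)
for entry in GO_file:
    name, _, go_id = entry.partition(" ")
    gene_id = name.split("_")[0]
    GO_dict[gene_id][go_id] = None   # inner dict keys act as an ordered set

result = {gene_id: list(ids) for gene_id, ids in GO_dict.items()}
print(result)   # {'A': ['12', '13', '14'], 'B': ['1', '5']}

for gene_id, ids in result.items():
    print("{}: {}".format(gene_id, ", ".join(ids)))
# A: 12, 13, 14
# B: 1, 5
```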

That said, there are cases, and I believe this is one of them, where an itertools solution is more elegant than using defaultdict:

>>> from collections import OrderedDict
>>> from itertools import groupby
>>> from operator import itemgetter
>>> GO_file_kv = [(key.split("_")[0], value) 
                   for key, value in (elem.split(" ") for elem in GO_file)]
>>> {key: OrderedDict.fromkeys([e for _, e in value]).keys()
     for key, value in groupby(sorted(GO_file_kv, key=itemgetter(0)),
                       key=itemgetter(0))
 }
{'A': ['12', '13', '14'], 'B': ['1', '5']} 
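
The sorted() call in the groupby version is not optional: groupby only merges adjacent items that share a key. A small sketch of that behaviour, assuming Python 3.7+ (so dict.fromkeys can stand in for OrderedDict.fromkeys); `group_ids` and `shuffled` are hypothetical names introduced here.

```python
from itertools import groupby

GO_file = ["A_1 12", "A_2 13", "A_3 14", "A_4 12", "B_1 1", "B_2 1", "B_3 5"]

pairs = [(name.split("_")[0], go_id)
         for name, go_id in (entry.split(" ") for entry in GO_file)]

def group_ids(pairs):
    """Group (gene_id, go_id) pairs; only adjacent equal keys are merged."""
    return [(key, list(dict.fromkeys(go_id for _, go_id in group)))
            for key, group in groupby(pairs, key=lambda pair: pair[0])]

# Interleaved input: without sorting, 'A' shows up as two separate groups.
shuffled = [pairs[0], pairs[4], pairs[1]]
print(group_ids(shuffled))
# [('A', ['12']), ('B', ['1']), ('A', ['13'])]

# Sorting by key first (the sort is stable) brings equal keys together.
print(group_ids(sorted(shuffled, key=lambda pair: pair[0])))
# [('A', ['12', '13']), ('B', ['1'])]

# The original list is already grouped by key, so it works as-is.
print(dict(group_ids(pairs)))
# {'A': ['12', '13', '14'], 'B': ['1', '5']}
```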
Licensed under: CC-BY-SA with attribution