Question

I need to find the median of all the integers associated with each key (AA, BB). The basic format my code leads to:

AA - 21
AA - 52
BB - 3
BB - 2

My code:

def scoreData(filename):
    d = dict()
    fin = open(filename)
    contents = fin.readlines()
    for line in contents:
        parts = line.split()
        parts[1] = int(parts[1])
        if parts[0] not in d:
            d[parts[0]] = [parts[1]]
        else:
            d[parts[0]].append(parts[1])
    names = list(d.keys())
    names.sort() #alphabetize the names
    print("Name\tMax\tMin\tMedian")
    for name in names: #makes the table
        print(name, "\t", max(d[name]), "\t", min(d[name]), "\t", median(d[name]))

I'm afraid that following the same approach as "names" and "names.sort" will completely restructure the data. I've thought about "from statistics import median", but again, I do not know how to select only the values associated with each key.

Thanks in advance


Solution

You can do it easily with pandas:

import pandas

and aggregating by the first column:

# ' - ' is a multi-character separator, so the Python parsing engine is used
score = pandas.read_csv(filename, sep=' - ', header=None, engine='python')
print(score.groupby(0).agg(['median', 'min', 'max']))

which returns:

         1
    median  min  max
0
AA    36.5   21   52
BB     2.5    2    3

OTHER TIPS

There are many, many ways you can go about this. But here's a 'naive' implementation that will get the job done.

Assuming your data looks like:

AA  1
BB  5
AA  2
CC  7
BB  1

You can do the following:

import numpy as np
from collections import defaultdict

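def find_averages(input_file):
    # input_file is an open file object; each line holds a key and an integer
    result_dict = defaultdict(list)
    for line in input_file.readlines():
        key, value = line.split()
        result_dict[key].append(int(value))

    # Pair each key with the mean of its values (np.median gives the median instead)
    return [(key, np.mean(value)) for key, value in result_dict.items()]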
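
Since the question asks for the median rather than the mean, here is a minimal sketch of the same grouping idea using only the standard library. The function name score_stats and the sample lines are made up for illustration, and it assumes the key is the first token on each line and the integer is the last, so it accepts both "AA 21" and "AA - 21".

```python
from collections import defaultdict
from statistics import median


def score_stats(lines):
    """Group integer scores by key and return (key, max, min, median) rows."""
    scores = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Assumption: first token is the key, last token is the integer score
        scores[parts[0]].append(int(parts[-1]))
    # Sort by key so the rows come out alphabetically
    return [(key, max(vals), min(vals), median(vals))
            for key, vals in sorted(scores.items())]


# Hypothetical sample data in the format shown in the question
sample = ["AA - 21", "AA - 52", "BB - 3", "BB - 2"]
print("Name", "Max", "Min", "Median", sep="\t")
for name, high, low, mid in score_stats(sample):
    print(name, high, low, mid, sep="\t")
```

Because each key maps to its own list, median only ever sees the values for that key, so nothing about the rest of the data is restructured. To read from a file, pass the open file object instead of the sample list.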
Licensed under: CC-BY-SA with attribution