Python - sumif from csv

Question 1

[Community wiki, because it's a little tangential.]

When you're processing tabular data in Python, you should consider the pandas library. The operation you want to perform is a groupby sum, and once pandas is imported that's easily done in two lines:

import pandas as pd

df = pd.read_csv("factories.csv")
by_factory = df.groupby("Factory")["Cost"].sum()

which produces a Series object you can index into like a dictionary:

>>> by_factory
Factory
Bali       32
Denver      8
Sydney     25
Name: Cost, dtype: int64
>>> by_factory["Bali"]
32

Update, using the updated data: if you also want to handle Cost_Type, you have several options. One is to select only the rows with Cost_Type == 1 and then group as before:

>>> df[df.Cost_Type == 1]
  Factory  Cost  Cost_Type
1  Sydney    21          1
3  Denver     8          1
4    Bali     9          1

[3 rows x 3 columns]
>>> df[df.Cost_Type == 1].groupby("Factory")["Cost"].sum()
Factory
Bali        9
Denver      8
Sydney     21
Name: Cost, dtype: int64

or you can expand the groupby and group on both Factory and Cost_Type simultaneously:

>>> df.groupby(["Cost_Type", "Factory"])["Cost"].sum()
Cost_Type  Factory
0          Bali       23
1          Bali        9
           Denver      8
           Sydney     21
2          Sydney      4
Name: Cost, dtype: int64
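If pandas is not available, the same two-key grouping can be approximated with the standard library by using a (Cost_Type, Factory) tuple as the dictionary key. This is only a sketch: the sample data is reconstructed from the outputs above, and the names SAMPLE and sum_by_type_and_factory are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Sample data reconstructed from the outputs above (an assumption);
# a real script would open the CSV file instead of an in-memory string.
SAMPLE = """Factory,Cost,Cost_Type
Bali,23,0
Sydney,21,1
Sydney,4,2
Denver,8,1
Bali,9,1
"""


def sum_by_type_and_factory(lines):
    """Return {(cost_type, factory): total_cost} for the given CSV lines."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        key = (int(row["Cost_Type"]), row["Factory"])
        totals[key] += int(row["Cost"])
    return dict(totals)


totals = sum_by_type_and_factory(io.StringIO(SAMPLE))
print(totals[(1, "Bali")])  # 9
```

Looking up totals[(1, "Bali")] plays the same role as indexing the grouped Series by Cost_Type and Factory.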

Question 2

The easiest way is to use a dictionary to hold the running total for each factory:

factoriescost = {}
for row in csv.reader(costcsv):
    factory = row[0]
    if factory not in ('Bali', 'Sydney', 'Denver'):
        continue
    factorycost = factoriescost.get(factory, 0)
    factoriescost[factory] = factorycost + float(row[1])
totalcost = sum(factoriescost.values())

Then you can use factoriescost to get the total for a given factory:

>>> print totalcost, factoriescost
65.0 {'Denver': 8.0, 'Sydney': 25.0, 'Bali': 32.0}
>>> print factoriescost['Bali']
32.0

Question 3

You can use a dictionary as shown below. The code uses a try/except block to add each cost to the factory's running total in the dictionary. If the factory is not in the dictionary yet, a KeyError is raised, and the factory is simply added with its first cost.

a = [['Bali', 23],
     ['Sydney', 21],
     ['Sydney', 4],
     ['Denver', 8],
     ['Bali', 9]]

factories = dict()

for factory, cost in a:
    try:
        factories[factory] += cost
    except KeyError:
        factories[factory] = cost

print(factories)
# {'Denver': 8, 'Sydney': 25, 'Bali': 32}

In your example case you would replace the for loop with an appropriate one for csv.reader() along the lines of:

for factory, cost in csv.reader(costcsv):
    try:
        ...

Question 4

Your csv should be:

Factory,Cost
Bali,23
Sydney,21
Sydney,4
Denver,8
Bali,9

And in Python you can do:

import csv

factories= ['Bali', 'Sydney', 'Denver']
totalcost = 0

sums = {}

with open('file.csv', 'rb') as f:
    f.next()                        # Skip the header row
    reader = csv.reader(f)
    for row in reader:
        if row[0] not in sums:
            sums[row[0]] = int(row[1])
        else:
            sums[row[0]] += int(row[1])


for key,value in sums.items():
    totalcost = totalcost  + int(value)

The result looks like:

print sums
>{'Denver': 8, 'Sydney': 25, 'Bali': 32}
print totalcost
>65
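Since the file has a header row, one possible variation (not part of the code above) is csv.DictReader, which consumes the header itself and lets you refer to columns by name. The sketch below is written for Python 3 and uses an in-memory string, csv_text, as a stand-in for file.csv.

```python
import csv
import io

# In-memory stand-in for file.csv, using the header and rows shown above.
csv_text = "Factory,Cost\nBali,23\nSydney,21\nSydney,4\nDenver,8\nBali,9\n"

sums = {}
with io.StringIO(csv_text) as f:
    # DictReader consumes the header row and uses it as the dict keys
    for row in csv.DictReader(f):
        sums[row["Factory"]] = sums.get(row["Factory"], 0) + int(row["Cost"])

# Grand total across all factories
totalcost = sum(sums.values())
print(sums, totalcost)
```

With a real file, the with line would become open('file.csv', newline='') in Python 3.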

Question 5

Rather than having separate variables, consider a dictionary or, easier, collections.defaultdict:

import csv
from collections import defaultdict

costs = defaultdict(float)  # missing factories start at 0.0

for line in csv.reader(costcsv):
    if len(line) == 2:
        factory, cost = line
        costs[factory] += float(cost)

This will give you a dictionary from which you can select any factory (not just the three you currently hard-code) and get its total cost:

costs["Denver"] == 8.0