Question

I have a dataset with two columns: one is a string variable with names of products, and the other holds interval values.

Affiliate_ID          Average "A" Level
store X                      7.0
store Y                      4.3
store Z                      5.6

I am curious whether it is possible in Python to compute and sum all possible pairwise differences, without repeats.

Sum = |7.0 - 4.3| + |4.3 - 5.6| + |7.0 - 5.6|

I don't know what format is best for Python to do such an operation, but I have the data in a CSV file and in an Excel file. I use pandas to get the data into a dataframe. One of the things I've tried is to grab a particular column from the dataframe:

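# DataFrame.from_csv was removed in pandas 1.0; read_csv with these arguments matches its defaults
df = pd.read_csv(infile_path + "mp_viewed_item_AGG_affiliate_item_TOP_10.csv", sep=',', index_col=0, parse_dates=True)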
i = 0
for i in df:
    x = df[i]

But this feels incorrect - like it is going nowhere (not that I'd know!)

Someone suggested that I make use of something called itertools, and provided me with a sample:

sum([args[i] - args[j] for i,j in itertools.permutations(range(len(args)

but I really don't know how to make this work.

If anyone could provide me with some insight into my problem, I would be very grateful. I'm a newbie to Python; I know the basics and have written a couple of very simple programs, but I am not a developer at all.

Solution

import itertools
column = [3, 1, 7, 2, 9, 4]

You can build the list of all unordered pairs like this:

# combinations() already yields each pair of positions only once.
# Using set() instead of list() would also drop pairs with identical values.
list(itertools.combinations(column,2))

Output

[(3, 1), (3, 7), (3, 2), (3, 9), (3, 4),
 (1, 7), (1, 2), (1, 9), (1, 4),
 (7, 2), (7, 9), (7, 4),
 (2, 9), (2, 4),
 (9, 4)]

Then you can get the sum of the absolute differences using a list comprehension:

sum([abs(pair[1] - pair[0]) for pair in itertools.combinations(column,2)])

Output

56
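
Applied to the three values from the question, the same expression can be wrapped in a small helper. This is only a minimal sketch; the function name `sum_pairwise_abs_diff` is made up for illustration. It also shows why `combinations` fits better than the `permutations` call from the question's sample: `permutations` yields every pair in both orders, so each difference is counted twice.

```python
import itertools

def sum_pairwise_abs_diff(values):
    """Sum |a - b| over every unordered pair of values (hypothetical helper)."""
    return sum(abs(a - b) for a, b in itertools.combinations(values, 2))

# The 'Average "A" Level' column from the question
levels = [7.0, 4.3, 5.6]

# |7.0 - 4.3| + |7.0 - 5.6| + |4.3 - 5.6|
total = sum_pairwise_abs_diff(levels)

# permutations() visits both (a, b) and (b, a), so the total is doubled
doubled = sum(abs(a - b) for a, b in itertools.permutations(levels, 2))

print(round(total, 2))    # roughly 5.4, up to floating-point rounding
print(round(doubled, 2))  # roughly twice that
```

Without `abs()`, the `permutations` version would sum to approximately zero, because every difference appears once with each sign.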

Other tips

Use itertools.combinations directly on the dataframe column, like this:

import pandas as pd
import itertools

d = {'Affiliate_ID': pd.Series(['store X', 'store Y', 'store Z']), 'Average "A" Level': pd.Series([7.0, 4.3, 5.6])}
df = pd.DataFrame(d)

# print() with parentheses is required in Python 3
print(sum(abs(x - y) for x, y in itertools.combinations(df['Average "A" Level'], 2)))
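
For long columns the number of pairs grows quadratically. One possible alternative, which is not taken from the answers here, is to sort the values first. In sorted order, the value at 0-based position i is the larger element in i pairs and the smaller element in n - 1 - i pairs, so it contributes x * (2i - n + 1) to the total. The sketch below uses a hypothetical helper name and checks the result against the `itertools.combinations` approach.

```python
import itertools

def sum_pairwise_abs_diff_sorted(values):
    """Sum of |a - b| over all unordered pairs, via sorting (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    # Each value is added once per pair where it is the larger element (i pairs)
    # and subtracted once per pair where it is the smaller one (n - 1 - i pairs).
    return sum(x * (2 * i - n + 1) for i, x in enumerate(ordered))

column = [3, 1, 7, 2, 9, 4]

fast = sum_pairwise_abs_diff_sorted(column)
brute = sum(abs(a - b) for a, b in itertools.combinations(column, 2))

print(fast, brute)  # both are 56 for this column
```

For a handful of rows the difference is negligible, and the `combinations` version is easier to read.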
Licensed under: CC-BY-SA with attribution