Question

I have a data set containing the following information:

  • Workload name
  • Configuration used
  • Measured performance

Here is a toy data set to illustrate my problem. The performance values are not meaningful: I just picked distinct integers to make the example easy to follow. In reality they would be floating-point values coming from performance measurements:

  workload cfg perf
1        a   1    1
2        b   1    2
3        a   2    3
4        b   2    4
5        a   3    5
6        b   3    6
7        a   4    7
8        b   4    8

You can generate it using:

dframe <- data.frame(workload = rep(letters[1:2], 4),
                     cfg = rep(seq_len(4), each = 2),
                     perf = seq_len(8))

I am trying to compute the harmonic speedup for the different configurations. For that a base configuration is needed (cfg = 1 in this example). Then the harmonic speedup is computed as:

                          num_workloads
HS(cfg_i) = num_workloads /   sum     (perf(cfg_base, wl_j) / perf(cfg_i, wl_j))
                              wl_j

For instance, for configuration 2 it would be:

HS(cfg_2) = 2 / [perf(cfg_1, wl_1) / perf(cfg_2, wl_1) +
                 perf(cfg_1, wl_2) / perf(cfg_2, wl_2)]
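
As a quick numeric check of the formula, the toy values for configuration 2 can be plugged in directly. This is only a sketch; the variable names (perf.base, perf.cfg2, hs.cfg2) are made up for illustration:

```r
# Toy values taken from the table above
perf.base <- c(a = 1, b = 2)   # cfg 1 (base configuration)
perf.cfg2 <- c(a = 3, b = 4)   # cfg 2

# num_workloads / sum over workloads of perf(base) / perf(cfg_2)
hs.cfg2 <- length(perf.cfg2) / sum(perf.base / perf.cfg2)
print(hs.cfg2)
```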

I would like to compute the harmonic speedup for every workload pair and configuration. With the example data set, the result would be:

  workload.pair cfg      harmonic.speedup
1      a-b       1    2 / (1/1 + 2/2) = 1 
2      a-b       2    2 / (1/3 + 2/4) = 2.4
3      a-b       3    2 / (1/5 + 2/6) = 3.75
4      a-b       4    2 / (1/7 + 2/8) = 5.09

I have been struggling with aggregate and ddply to find a solution that does not use loops, but I have not been able to come up with one that works. The basic problems I am facing are:

  • how to handle the relationship between workloads and configuration. The results for a given workload pair (A-B), and a given configuration must be handled together (the first two performance measurements in the denominator of the harmonic speedup formula come from workload A, while the other two come from workload B)
  • for each workload pair and configuration, I need to "normalize" the performance values with the values from the base configuration (cfg 1 in the example)

I do not really know how to express that with an R function such as aggregate or ddply (if it is possible at all).

Does anyone know how this can be solved?

EDIT: I was afraid that using 1..8 as perf could lead to some confusion. I did that for the sake of simplicity, but the values do not need to be those (for instance, imagine initializing them like this: dframe$perf <- runif(8)). Both James's and Zach's answers misunderstood that part of my question, so I thought it was better to clarify it here. In any case, I generalized both answers to handle the case where the performance for configuration 1 is not (1, 2).

Solution

Try this:

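library(plyr)
# Baseline performance (cfg 1), one value per workload, in row order
baseline <- dframe[dframe$cfg == 1,]$perf
# Harmonic speedup of one configuration; x must list workloads in the same order as baseline
hspeed <- function(x) length(x) / sum(baseline / x)
ddply(dframe,.(cfg),summarise,workload.pair=paste(workload,collapse="-"),
    harmonic.speedup=hspeed(perf))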
  cfg workload.pair harmonic.speedup
1   1           a-b         1.000000
2   2           a-b         2.400000
3   3           a-b         3.750000
4   4           a-b         5.090909
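
The code above relies on every configuration listing its workloads in the same row order as the baseline rows. If that cannot be guaranteed, one option is to pair each measurement with its baseline by workload name first. The following is a minimal base-R sketch of that idea, not part of the original answer; the names base, base.perf, merged and result are made up for illustration:

```r
dframe <- data.frame(workload = rep(letters[1:2], 4),
                     cfg = rep(1:4, each = 2),
                     perf = seq_len(8))

# Attach to every row the baseline (cfg == 1) measurement of the same workload
base <- dframe[dframe$cfg == 1, c("workload", "perf")]
names(base)[2] <- "base.perf"
merged <- merge(dframe, base, by = "workload")

# Harmonic speedup per configuration: n / sum(base.perf / perf)
hs <- sapply(split(merged, merged$cfg),
             function(d) nrow(d) / sum(d$base.perf / d$perf))
result <- data.frame(cfg = as.integer(names(hs)),
                     harmonic.speedup = unname(hs))
print(result)
```

Because the join is done on the workload column, shuffling the rows of dframe should not change the result.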

OTHER TIPS

For problems like this, I like to "reshape" the dataframe, using the reshape2 package, giving a column for workload a, and a column for workload b. It is then easy to compare the 2 columns using vector operations:

library(reshape2)
# One row per cfg, one column per workload (a and b)
dframe <- dcast(dframe, cfg~workload, value.var='perf')
# Row holding the base configuration (cfg 1)
baseline <- dframe[dframe$cfg == 1, ]
dframe$harmonic.speedup <- 2/((baseline$a/dframe$a)+(baseline$b/dframe$b))
> dframe
  cfg a b harmonic.speedup
1   1 1 2         1.000000
2   2 3 4         2.400000
3   3 5 6         3.750000
4   4 7 8         5.090909
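
The expression above hard-codes two workloads (columns a and b). A possible generalization to any number of workloads is sketched below. To stay self-contained it uses base R's tapply instead of dcast to build the cfg-by-workload matrix, which is a substitution and not what the answer itself uses; it assumes one measurement per (cfg, workload) cell, and the names m, base.row, ratios and hs are made up for illustration:

```r
dframe <- data.frame(workload = rep(letters[1:2], 4),
                     cfg = rep(1:4, each = 2),
                     perf = seq_len(8))

# cfg-by-workload matrix of performance values
# (mean is a no-op when each cell holds a single measurement)
m <- with(dframe, tapply(perf, list(cfg = cfg, workload = workload), mean))

# Baseline row (cfg 1), one value per workload column
base.row <- m["1", ]

# Multiply each column of 1/perf by that workload's baseline: base / perf
ratios <- sweep(1 / m, 2, base.row, "*")

# Harmonic speedup per configuration: num_workloads / row sum of the ratios
hs <- ncol(m) / rowSums(ratios)
print(hs)
```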
Licensed under: CC-BY-SA with attribution