Question

In Excel, I have a log of web requests that I need to analyze for bandwidth usage. I have parsed the log into a number of fields that I will group by in different ways for different reports. Each page load fetches multiple resources, each logged as a separate line. The data structure:

 RequestID | SIZE | IsImage | IsStatic | Language
 A         | 100  | TRUE    | TRUE     | EN
 A         | 110  | TRUE    | FALSE    | EN
 A         | 90   | FALSE   | FALSE    | EN
 ...

Report 1: I need the AVERAGE request size: AVERAGE( SELECT SUM(SIZE) GROUPBY RequestID ). I do not need to see the size of each individual request.

Report 2: More elaborate pivot table reports showing average request size broken down by IsStatic / IsImage / Language / etc. This way I can check "average total images per request per language".

Is there a way to define a field/item "SUM(SIZE) GROUPBY RequestID" ?


Solution

As far as I know, this is not possible in a single pivot table, because you need to apply two separate aggregations to the same set of numbers based on a condition (RequestID): a sum per request, then an average of those sums. It is possible to get what you are looking for using two pivot tables. I would not recommend it, but this is how you would do it.

Create the first pivot table on your base table, adding RequestID to the rows and Size to the values. This gives you an intermediate table with the sum of size per RequestID. Then build a second pivot table, this time using the first pivot table as the source. In this one, add only the 'Sum of Size' value and take its average. See the example below.

enter image description here

Again, I would not recommend this approach for anything but the simplest analysis.
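For readers who want to check the numbers outside Excel, the two-stage aggregation described above (sum per RequestID, then average of those sums) can be sketched in plain Python. This is only an illustration, not part of the pivot table setup: the rows for request A come from the question, request B is hypothetical data added so that the average is not trivial, and the function names are made up for this sketch.

```python
from collections import defaultdict
from statistics import mean

# (RequestID, SIZE) pairs. Request A is taken from the question;
# request B is hypothetical, added only to make the average non-trivial.
rows = [
    ("A", 100), ("A", 110), ("A", 90),
    ("B", 200), ("B", 50),
]


def sum_size_per_request(rows):
    """Stage 1: the intermediate table, i.e. SUM(SIZE) grouped by RequestID."""
    totals = defaultdict(int)
    for request_id, size in rows:
        totals[request_id] += size
    return dict(totals)


def average_request_size(rows):
    """Stage 2: the average of the per-request sums from stage 1."""
    return mean(sum_size_per_request(rows).values())


totals = sum_size_per_request(rows)    # per-request sums
average = average_request_size(rows)   # average of those sums
row_average = mean(size for _, size in rows)  # plain per-row average, for contrast
```

The per-row average (`row_average`) is what a single pivot table would report for "Average of SIZE", which is why the intermediate per-request table is needed.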

A better way to do this is to use Power Pivot, a separate but related technology to the pivot tables that you have used. You will need to import the table (I have assumed the name [Logs], with columns [RequestId] and [Size]) and then add a calculation:

AverageSizeOfRequests:=AVERAGEX(SUMMARIZE(Logs;Logs[RequestId];"SumOfSize";CALCULATE(SUM(Logs[Size])));[SumOfSize])

This will give you the following result

enter image description here

The first is the straight sum, which you already have; the second is the average, which will be the same per RequestID but will aggregate differently.
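To illustrate how such a measure "aggregates differently" depending on what is filtered, here is a rough Python analogue. It is a sketch under assumptions, not a description of how Power Pivot evaluates the formula: the `keep` predicate stands in for whatever filters the pivot table applies, the rows for requests B and C are hypothetical, and `average_size_of_requests` is a name invented for this example.

```python
from collections import defaultdict
from statistics import mean

# (RequestID, SIZE, IsImage, IsStatic, Language)
LOGS = [
    # Rows from the question
    ("A", 100, True, True, "EN"),
    ("A", 110, True, False, "EN"),
    ("A", 90, False, False, "EN"),
    # Hypothetical rows, added for illustration only
    ("B", 200, True, True, "EN"),
    ("B", 50, False, True, "EN"),
    ("C", 40, True, False, "FR"),
    ("C", 60, False, False, "FR"),
]


def average_size_of_requests(rows, keep=lambda row: True):
    """Average of per-request SIZE sums over the rows accepted by `keep`."""
    totals = defaultdict(int)
    for row in rows:
        if keep(row):
            totals[row[0]] += row[1]
    # A request with no matching rows does not appear in `totals` at all.
    return mean(totals.values()) if totals else None


overall = average_size_of_requests(LOGS)
images_en = average_size_of_requests(LOGS, lambda r: r[2] and r[4] == "EN")
images_fr = average_size_of_requests(LOGS, lambda r: r[2] and r[4] == "FR")
```

Here `images_en` and `images_fr` correspond to the "average total images per request per language" report from the question. Note that in this sketch a request without any image rows is left out of the average rather than counted as zero.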

Other tips

I guess I am not understanding your question, because I expect the grouping by RequestID to be automatic (unavoidable in a PT with that as a Row label). Perhaps pick holes in the following and I might understand what I have misunderstood:

SO21820137 example

I have added i and s to your data just so it is clearer which column is which. It is possible it would be better to convert TRUE and FALSE into 1 and 0 so the PT might count or average these as well.

This seems vaguely along the right lines, so let's try a different PT layout. If RequestID is of little or no relevance for the required analysis, don't include it in the PT or, as here, park it as a Report Filter:

SO21820137 example

in which case however many millions of rows of data of the kind in the OP there are, the PT will always in effect be a 2x2 matrix at most (assuming Language is suited to Report Filter also). There is only one value per record (SIZE) and only two, boolean, variables. Language could make a difference but worst case is one such PT per Language (and bearing in mind only one such is shown in the example!...)
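The point about the result being at most a 2x2 matrix can be sketched as follows. This is an illustrative Python analogue of that pivot table layout, using only the three rows shown in the question; the function name `size_by_flags` is made up for this sketch.

```python
from collections import defaultdict

# (RequestID, SIZE, IsImage, IsStatic, Language), rows from the question
LOGS = [
    ("A", 100, True, True, "EN"),
    ("A", 110, True, False, "EN"),
    ("A", 90, False, False, "EN"),
]


def size_by_flags(rows):
    """Sum of SIZE per (IsImage, IsStatic) cell, ignoring RequestID."""
    cells = defaultdict(int)
    for _request_id, size, is_image, is_static, _language in rows:
        cells[(is_image, is_static)] += size
    return dict(cells)


cells = size_by_flags(LOGS)
```

With two boolean fields there can never be more than four keys in `cells`, regardless of how many rows the log contains.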

Licensed under: CC-BY-SA with attribution