Question

I have a long Excel sheet called Data looking something like this:

    A       B
1   Name    Usage
2   Peter   1
3   Johnny  3
4   Johnny  1
5   Peter   1
6   Jack    2
...
20000   Johnny  1

Never mind column B ("Usage") for the moment. I know I can count the number of times Johnny is listed in column A using COUNTIF, and I can sum Johnny's usage using SUMIF.

However, what I'm trying to do is get the sum of Johnny's first 100, 200, etc. occurrences (the sum of his first 2 occurrences, above, would be 4). To get this, I first need to count how many names into the list (column A) Johnny has been mentioned 100, 200, etc. times. (In the example, his second mention is in row 4, so if we deduct the header row, the desired result is 'Johnny's first 2 occurrences are reached after 3 names', 'Peter's first 2 occurrences are reached after 4 names'.)
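To make the target numbers concrete, here is a minimal Python sketch of the computation on the five sample rows. It is only an illustration of the desired result, not an Excel solution; the function name `first_n_occurrences` and the list layout (header row left out) are assumptions made for this example.

```python
def first_n_occurrences(names, usage, name, n):
    """Return (names_seen, usage_sum) at the n-th occurrence of name.

    names_seen counts data rows only (the header row is not included).
    Returns None if name occurs fewer than n times.
    """
    count = 0
    total = 0
    for position, (current, amount) in enumerate(zip(names, usage), start=1):
        if current == name:
            count += 1
            total += amount
            if count == n:
                return position, total
    return None


# Sample data from rows 2-6 of the Data sheet
names = ["Peter", "Johnny", "Johnny", "Peter", "Jack"]
usage = [1, 3, 1, 1, 2]

print(first_n_occurrences(names, usage, "Johnny", 2))  # (3, 4)
print(first_n_occurrences(names, usage, "Peter", 2))   # (4, 2)
```

These pairs correspond to row 2 of the Results sheet (n = 2).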

Essentially, I want a second sheet, called Results, that looks like this:

    A       B       C       D       E
1   n       Peter   P.sum   Johnny  J.sum
2   2       4       2       3       4
3   100     215     220     312     312
4   200     812     480     462     715
5   500     9850    1421    5425    3212

My question is what formula to put into columns B and D of the 'Results'-sheet. I cannot use VBA, I cannot use a helper-column, and I cannot sort the 'Data'-sheet differently.

Now I actually do have a working solution, but its consumption of CPU-power is extremely excessive, to the point where the sheet takes over an hour just to compute results for one name. I've used the following array-formula in B2:

{=MATCH($A2, COUNTIF(INDIRECT("'Data'!$A$2"&":"&ADDRESS(ROW($A$2:$A$20000),COLUMN($A$2))),B$1),0)}

This works, but there must be an easier solution, easier on the CPU I mean. Please help me to find it.

The problem is that, in code, each cell essentially computes a function looking roughly like this:

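function (n=2 /*or 100, 200, etc*/, name="Peter") {
  int counter1=0;
  for (int i1=2; i1<=20000; i1++) {
    // recount every row from 2 up to i1, for each i1
    int counter2=0;
    for (int i2=2; i2<=i1; i2++) {
      if (Data!A[i2]==name) counter2++;
    }
    // keep only the first i1 at which the count reaches n
    if (counter2==n && counter1==0) counter1=i1;
  }
  return counter1;
}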

In other words, if we had only five rows, it would (for just one result) loop through row 1, then rows 1-2, then rows 1-3, then rows 1-4, then rows 1-5. With 20000 rows, that makes for a staggering number of calculations. In code, there would be a far easier solution, something like this:

function (n=2 /*or 100, 200, etc*/, name="Peter") {
  int counter1=0;
  for (int i1=2; i1<=20000; i1++) {
    if (Data!A[i1]==name) counter1++;
    if (counter1==n) return(i1);  // stop at the n-th occurrence
  }
  return 0;  // name occurs fewer than n times
}

This would make for a maximum of 20000 computations per cell, and I am looking for an Excel equivalent, probably an array formula.
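To put rough numbers on the two loops above, the following Python sketch counts name comparisons for each approach. It assumes one comparison per visited cell and 19,999 data rows (A2:A20000); the function names are invented for this illustration, and real Excel recalculation costs will differ.

```python
def comparisons_nested(rows: int) -> int:
    # The nested version rescans a prefix of length k for every k = 1..rows,
    # so the total is 1 + 2 + ... + rows = rows * (rows + 1) / 2.
    return rows * (rows + 1) // 2


def comparisons_single_pass(rows: int) -> int:
    # The single-pass version visits each row at most once.
    return rows


rows = 19_999  # Data!A2:A20000
print(comparisons_nested(rows))       # 199990000
print(comparisons_single_pass(rows))  # 19999
```

Under these assumptions the nested version does roughly 10,000 times more work per result cell than the single pass.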


Solution

=COUNTIF(INDIRECT("A1:A"&SMALL(IF(A1:A9=C1,ROW(A1:A9)),D1)),C1)
=SUMIF(INDIRECT("A1:A"&SMALL(IF(A1:A9=C1,ROW(A1:A9)),D1)),C1,B1:B9)

Both are array formulas (enter them with Ctrl + Shift + Enter),

where

A1 = start range (names)
A9 = end range (names)
C1 = name to match (Johnny, Jack, Peter)
D1 = number of occurrences to count up to (n)
B1 = sum range start
B9 = sum range end
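
As a rough cross-check of what the two formulas compute, here is a Python sketch that mirrors their parts step by step. It is an interpretation, not Excel itself: the helper names `nth_match_row` and `sumif_up_to` are made up, and the lists are assumed to start at sheet row 1 (header included), as the ranges A1:A9 and B1:B9 do.

```python
def nth_match_row(names, name, n):
    # IF(A1:A9=C1, ROW(A1:A9)) -> 1-based row numbers of the matching cells
    rows = [row for row, value in enumerate(names, start=1) if value == name]
    # SMALL(..., D1) -> n-th smallest row number
    # (raises IndexError if there are fewer than n matches)
    return sorted(rows)[n - 1]


def sumif_up_to(names, usage, name, last_row):
    # SUMIF over rows 1..last_row: add usage wherever the name matches
    return sum(
        amount
        for value, amount in zip(names[:last_row], usage[:last_row])
        if value == name
    )


# Sheet rows 1-6, header included
names = ["Name", "Peter", "Johnny", "Johnny", "Peter", "Jack"]
usage = ["Usage", 1, 3, 1, 1, 2]

row = nth_match_row(names, "Johnny", 2)
print(row)                                     # 4
print(sumif_up_to(names, usage, "Johnny", row))  # 4
```

The inner SMALL(IF(...)) part is what yields the row of the n-th occurrence; subtracting 1 for the header row gives the "after 3 names" figure from the question.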
Licensed under: CC-BY-SA with attribution