I have a long Excel sheet called 'Data' that looks something like this:
A B
1 Name Usage
2 Peter 1
3 Johnny 3
4 Johnny 1
5 Peter 1
6 Jack 2
...
20000 Johnny 1
Never mind column B ("Usage") for the moment. I know I can count the number of times Johnny is listed in column A using COUNTIF, and I can sum Johnny's usage using SUMIF.
However, what I'm trying to do is get the sum of Johnny's first 100, 200, etc. occurrences. (The sum of his first 2 occurrences, above, would be 4.)
To get this, I first need to count how many names into the list (column A) Johnny has been mentioned 100, 200, etc. times. (In the example, his second mention is in row 4, so after deducting the header row the desired results are 'Johnny's first 2 occurrences are reached after 3 names' and 'Peter's first 2 occurrences are reached after 4 names'.)
Essentially I want a second sheet, called 'Results', that looks like this:
A B C D E
1 n Peter P.sum Johnny J.sum
2 2 4 2 3 4
3 100 215 220 312 312
4 200 812 480 462 715
5 500 9850 1421 5425 3212
My question is what formula to put into columns B and D of the 'Results'-sheet. I cannot use VBA, I cannot use a helper-column, and I cannot sort the 'Data'-sheet differently.
Now I actually do have a working solution, but it is extremely CPU-intensive, to the point where the sheet takes over an hour just to compute the results for one name. I've used the following array formula in B2:
{=MATCH($A2, COUNTIF(INDIRECT("'Data'!$A$2"&":"&ADDRESS(ROW($A$2:$A$20000),COLUMN($A$2))),B$1),0)}
This works, but there must be an easier solution, easier on the CPU I mean. Please help me to find it.
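The problem is that each cell essentially computes a function looking roughly like this:
function (n=2 /*or 100, 200, etc*/, name="Peter") {
  int counter1=0;                        // result: first row where the count equals n
  for (int i1=2; i1<=20000; i1++) {      // one COUNTIF per growing range A2:A[i1]
    int counter2=0;
    for (int i2=2; i2<=i1; i2++) {       // recount the whole range from the top
      if (Data!A[i2]==name) counter2++;
    }
    if (counter2==n && counter1==0) counter1=i1;  // MATCH keeps the first hit
  }
  return counter1;
}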
In other words, if we had only five rows, it would (for just one result) loop through row 1, then rows 1-2, then rows 1-3, then rows 1-4, then rows 1-5. With 20000 rows, that adds up to roughly 200 million comparisons per cell.
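To put a number on that cost, here is a minimal sketch in Python (not Excel) that counts the comparisons made when every range from the first data row down to row k is rescanned from the top. The function name prefix_scan_comparisons is my own, and the count assumes exactly one comparison per cell visited.

```python
def prefix_scan_comparisons(row_count):
    """Count name comparisons when each prefix of 1..row_count rows is rescanned."""
    # A prefix of length k costs k comparisons, so the total is 1 + 2 + ... + row_count.
    return sum(range(1, row_count + 1))


# Five data rows, as in the small example.
print(prefix_scan_comparisons(5))      # 15
# Data!A2:A20000 holds 19999 data rows.
print(prefix_scan_comparisons(19999))  # 199990000
```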
In code, there would be a far easier solution, something like this:
function (n=2 /*or 100, 200, etc*/, name="Peter") {
  int counter1=0;
  for (int i1=2; i1<=20000; i1++) {
    if (Data!A[i1]==name) counter1++;
    if (counter1==n) return(i1);   // stop at the n-th occurrence
  }
  return -1;                       // name occurs fewer than n times
}
This would make for at most about 20000 comparisons per cell (one per data row), and I am looking for an Excel equivalent, probably an array formula.
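To make the target behaviour concrete, here is the single pass as a runnable sketch in Python (again, not the Excel formula I am after). It assumes the five sample rows from the top of the question as stand-in data; the names data and nth_occurrence are hypothetical. It returns both values I want per name: the number of names read when the n-th occurrence is reached, and the usage sum up to that point.

```python
# Stand-in for the first five data rows of the 'Data' sheet (rows 2-6).
data = [("Peter", 1), ("Johnny", 3), ("Johnny", 1), ("Peter", 1), ("Jack", 2)]


def nth_occurrence(rows, name, n):
    """Return (names_read, usage_sum) at the n-th occurrence of name, or None."""
    count = 0
    total = 0
    # position 1 is the first data row, i.e. the header row is already deducted.
    for position, (row_name, usage) in enumerate(rows, start=1):
        if row_name == name:
            count += 1
            total += usage
            if count == n:
                return position, total  # stop as soon as the n-th hit is found
    return None  # name occurs fewer than n times


print(nth_occurrence(data, "Peter", 2))   # (4, 2)
print(nth_occurrence(data, "Johnny", 2))  # (3, 4)
```

These two results match the n = 2 row of the 'Results' sheet above (Peter 4 and 2, Johnny 3 and 4).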