Question

In this answer the following formula is given.

{=PERCENTILE(IF(COLUMN(INDIRECT("1:"&MAX(Data!N3:N15)))<=Data!N3:N15,Data!M3:M15,""),0.25)}

It is supposed to calculate the first quartile of data with weights.

I've been trying to understand what this formula says, but I can't.

First of all, what are the curly brackets? (Probably a silly question.)

Second, what does the <= operator do when it's given two data ranges as operands?

Third, how can this possibly give the right answer if, regardless of what the condition in the IF does, the returned data is just the value column? It seems to me that what the formula does is

  • if a weird condition is satisfied, calculate the first quartile of the value column.
  • if it's not, calculate the first quartile of "".

This seems completely wrong...

Also, where can I find a complete manual of Excel functions and operators? The online help file says nothing about comparing two ranges.

Solution

Personally I wouldn't use the quoted formula - I doubt if it does exactly what the author thinks it does and in some circumstances it will give incorrect results. You can see the problem because this part

=COLUMN(INDIRECT("1:"&MAX(Data!N3:N15)))

returns exactly the same thing if MAX(Data!N3:N15) is 2 or 200. If MAX(Data!N3:N15)=2 you get

=COLUMN(INDIRECT("1:2"))

and I assume the author's intent is to return an array like {1,2}, but it doesn't do that. INDIRECT("1:2") gives you 1:2, which is interpreted as the whole of rows 1 and 2, so =COLUMN(1:2) gives you the numbers of all columns (either 1 to 256 or 1 to 16384, depending on which version of Excel you are using and whether you are in compatibility mode).

You can test that by using this formula in a cell

=COUNT(COLUMN(INDIRECT("1:"&MAX(Data!N3:N15))))

confirmed with CTRL+SHIFT+ENTER

You'll either get 256 or 16384 but that doesn't depend on the values in column N at all.

The formula may well give you the correct result but it might not work correctly if any values in Data!N3:N15 are > 256 (or 16384 depending on version).

This version of the formula should do as intended in all cases:

=PERCENTILE(IF(TRANSPOSE(ROW(INDIRECT("1:"&MAX(Data!N3:N15)))) <=Data!N3:N15,Data!M3:M15,""),0.25)

...but to explain how it works, let's look at a cut-down version with only 4 rows, i.e.

=PERCENTILE(IF(TRANSPOSE(ROW(INDIRECT("1:"&MAX(Data!N3:N6))))<=Data!N3:N6,Data!M3:M6,""),0.25)

and assume that M3:M6 contains these values: 10, 75, 15, 23, and that N3:N6 contains these values: 1, 2, 3, 4.

Now MAX(Data!N3:N6) = 4, so INDIRECT turns "1:4" into a reference to rows 1 to 4, and that is passed to the ROW function, so you get an array like {1;2;3;4} (a "column" of values). TRANSPOSE converts that to {1,2,3,4} (a "row" of values). The conversion is needed because when a column is compared to a row, or vice versa, every value in one array is compared with every value in the other, which is what is needed here (giving a 4x4 matrix of values as a result).

Now when we do that, every value in {1,2,3,4} is <= 4 (N6 value), but only 3 are <= 3 (N5 value), only 2 are <= 2 (N4 value), etc. So the array that is passed to the PERCENTILE function correctly contains 1 value of 10, 2 values of 75, 3 values of 15 and 4 values of 23 (the other values are "" blanks, which PERCENTILE ignores).
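
The 4-row walk-through above can be sketched outside Excel. This is only an illustration in Python, not what Excel runs internally: the names `counters`, `matrix` and `percentile_inc` are made up for the example, and `percentile_inc` assumes PERCENTILE's usual linear interpolation between sorted values.

```python
# Model of the trimmed-down example: M3:M6 holds the values, N3:N6 the weights.
values = [10, 75, 15, 23]
weights = [1, 2, 3, 4]

# Stand-in for TRANSPOSE(ROW(INDIRECT("1:4"))): the row of counters {1,2,3,4}.
counters = list(range(1, max(weights) + 1))

# Comparing the row of counters with the column of weights gives one matrix
# row per data row: keep the value where counter <= weight, else a blank "".
matrix = [[v if c <= w else "" for c in counters]
          for v, w in zip(values, weights)]


def percentile_inc(data, k):
    """Percentile with linear interpolation at position k * (n - 1)."""
    xs = sorted(data)
    pos = k * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


# The blanks are skipped, so only the repeated numbers remain.
expanded = [x for row in matrix for x in row if x != ""]
first_quartile = percentile_inc(expanded, 0.25)

for row in matrix:
    print(row)
print(first_quartile)  # 15.0
```

Each value ends up repeated as many times as its weight, which is how the plain first quartile of the expanded list becomes a weighted quartile.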

The result of my example would be 15, and the original formula also gives 15. But as explained above, the original formula might give incorrect results with larger numbers: e.g. I'm testing in compatibility mode in Excel 2007, and if I change N4 to 4000 and N6 to 1000 I expect the result to be 75 (which is what my formula gives), but the original formula gives 23.
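
The discrepancy with N4 = 4000 and N6 = 1000 can be reproduced with a rough Python model. It rests on one assumption: the original formula's counter array stops at 256 columns (compatibility mode), so no value can be repeated more than 256 times. `weighted_quartile`, `max_counter` and `percentile_inc` are hypothetical names for this sketch only.

```python
def percentile_inc(data, k):
    """Percentile with linear interpolation at position k * (n - 1)."""
    xs = sorted(data)
    pos = k * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def weighted_quartile(values, weights, max_counter=None):
    """First quartile after repeating each value `weight` times.

    max_counter models a counter array that cannot grow past a fixed
    length (assumed 256 here); None means the counters reach MAX(weights).
    """
    top = max(weights) if max_counter is None else max_counter
    expanded = []
    for v, w in zip(values, weights):
        expanded.extend([v] * min(w, top))
    return percentile_inc(expanded, 0.25)


values = [10, 75, 15, 23]
weights = [1, 4000, 3, 1000]  # N4 changed to 4000, N6 changed to 1000

corrected = weighted_quartile(values, weights)                  # ROW-based counters
original = weighted_quartile(values, weights, max_counter=256)  # COLUMN-based, capped

print(corrected)  # 75.0
print(original)   # 23.0
```

With the cap in place, 75 and 23 are both counted 256 times, so their relative weights are lost and the quartile shifts.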

OTHER TIPS

  1. The innermost operation is MAX(Data!N3:N15), which looks for the highest value in the range N3:N15. Let's say that the maximum is 10.
  2. The second operation is INDIRECT("1:"&10), which INDIRECT turns into the range 1:10.
  3. COLUMN returns an array containing the column numbers of the range 1:10. (If it was the range A1:B1, you would get the array {1,2}. If it was B1:D1, the array would be {2,3,4}.) Thus COLUMN(1:10) returns {1, 2, 3, etc...}.
  4. Data!N3:N15 should be easy to understand. The range itself is an array. Let's say it contains the values: {2, 3, , 5, etc...}
  5. Comparison: {1, 2, 3, etc...} <= {2, 3, , 5, etc...}. If the comparison is satisfied, return the values in range Data!M3:M15. Let's say this one contains {15, 14, 13, 12, 11, etc... }.

    • 1st comparison: 1 <= 2 (1st elements of both arrays) is true, hence, return 15.
    • 2nd comparison: 2 <= 3 (2nd elements of both arrays) is true, hence, return 14.
    • 3rd comparison: 3 <= "" (3rd elements of both arrays) is false, hence, return "" as per IF other result.
    • etc...

    Resulting array is thus {15, 14, "", 12, etc... }.

  6. Get the percentile of this array.

Since this is an array formula, it has to be entered with Ctrl+Shift+Enter. Whenever you do this, the braces wrap around the formula automatically.

Addendum I didn't get to add earlier (this little bit is what I would have put before seeing barry houdini's answer):

Whatever the maximum value in the range N3:N15 (except for those above 256 or 16384, depending on the Excel version), COLUMN(INDIRECT("1:"&MAX(Data!N3:N15))) will return the same array, which I believed made the formula wrong somehow.

For a better explanation though, see barry houdini's answer.

To answer the first part of the question: the curly brackets indicate an array formula. Here the IF inside PERCENTILE is evaluated over whole arrays rather than single cells. To enter the formula you need to use CTRL+SHIFT+ENTER; plain ENTER is not enough.

You can read more about it here:

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow