Question

I just discovered join_collapse_limit has been preventing the PostgreSQL planner from finding a much better join order. In my case, increasing the limit to 10 (from the default of 8) allowed the planner to improve search time from ~30 secs to ~1 ms, which is much more acceptable.

The documentation suggests that setting this "too high" could result in long planning times, but does not give even a rule of thumb for how long planning might take at various values. I understand the general problem is exponential in time, but I cannot find a way to determine the actual planning time unless it is simply the time it takes to run EXPLAIN SELECT .... If that is the case, I believe the default of 8 is quite low for modern computers, since I can detect no difference in planning speed between 8 and 10.

Questions:

1) How can one measure planning time?

2) Approximately, how high can join_collapse_limit be and still expect planning to take less than a couple hundred milliseconds?


Solution

1) How can one measure planning time?

PostgreSQL 9.4 (not yet released at the time of this writing) will add planning time to the output of EXPLAIN and EXPLAIN ANALYZE, so you'll be able to use those directly.

For older versions, your assumption is right: the best way to determine planning time is to execute a plain EXPLAIN (no ANALYZE) and check how long it took. In psql you can do that by enabling \timing (I generally do that in ~/.psqlrc).
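
For illustration, a minimal psql session on a pre-9.4 server could look like this (the table and column names are placeholders, not from the original question):

-- Enable per-statement timing in psql.
\timing on

-- A plain EXPLAIN plans the query but does not execute it, so the
-- "Time: ... ms" line psql prints afterwards is essentially planning
-- time plus a small client/server round-trip overhead.
EXPLAIN
SELECT *
FROM   orders o
JOIN   order_items i ON i.order_id = o.id
JOIN   customers   c ON c.id       = o.customer_id;

-- On 9.4 and later the EXPLAIN output itself is expected to include an
-- explicit "Planning time" line as well.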

2) Approximately, how high can join_collapse_limit be and still expect planning to take less than a couple hundred milliseconds?

The PostgreSQL hackers have already discussed raising it to larger values, but it seems they could not guarantee that it would be good for all cases.

The problem is that exhaustively planning the best join order for N tables is O(N!) (factorial), so the numbers grow very quickly as the limit is raised. You can see that with the following query:

SELECT i, (i)! AS num_comparisons FROM generate_series(8, 20) i;
 i  |   num_comparisons   
----+---------------------
  8 |               40320
  9 |              362880
 10 |             3628800
 11 |            39916800
 12 |           479001600
 13 |          6227020800
 14 |         87178291200
 15 |       1307674368000
 16 |      20922789888000
 17 |     355687428096000
 18 |    6402373705728000
 19 |  121645100408832000
 20 | 2432902008176640000
(13 rows)

As you can see, at the default of 8 the planner does at most about 40K comparisons; the 10 you proposed takes that to about 3.6M, which is still not very much for modern computers. But the next values grow far too fast, and 20 is just insane (21! doesn't even fit in a 64-bit integer).

Of course, you can sometimes set it to a larger value like 16, which could (in theory) mean up to about 21 trillion comparisons, and still get very good planning times. That is because PostgreSQL prunes some paths while planning and doesn't always need to check every order. But assuming that will always be the case and making such high values the default doesn't look like a good approach to me: some unexpected query in the future might force the planner to check all the orders, and then a single query could bring your server down.

In my experience, I treat 10 as the default value on any installation running on good hardware; on some of them I even use 12. I recommend setting it to 10 and, if you like, occasionally trying something higher (I wouldn't go beyond 12) while monitoring closely to see how it behaves.
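
If you want to experiment without changing the server-wide default, the parameter can also be adjusted per session or per transaction; a small sketch (the values are just the ones discussed above):

SHOW join_collapse_limit;            -- 8 by default
SET join_collapse_limit = 10;        -- affects only the current session

BEGIN;
SET LOCAL join_collapse_limit = 12;  -- affects only this transaction
-- run the problematic query here
COMMIT;

-- To change the default for the whole cluster, set join_collapse_limit
-- (and usually from_collapse_limit too) in postgresql.conf and reload.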

OTHER TIPS

This combination has been performing well for me with 30+ joins generated by NHibernate hierarchies, producing much better plans than the default settings.

SET geqo = on;
SET geqo_threshold = 12;

SET from_collapse_limit = 40;
SET join_collapse_limit = 40;

In such cases suboptimal query plans are often created unless the collapse limits are set higher than the maximum expected number of joins. The trick is that when using such high values for from_collapse_limit and join_collapse_limit, you also have to use geqo with a low enough geqo_threshold to avoid the O(N!) problem.

This seems to be the very opposite of what the documentation recommends:

Setting this value to geqo_threshold or more may trigger use of the GEQO planner, resulting in non-optimal plans.

But without geqo, or with its threshold set too high, even a value as low as 16 can cause the planning phase to take 10+ seconds and still fail to pick a good plan. In that case the 'non-optimal' plan produced by geqo in ~100 ms is more desirable than an optimal plan that takes hours of CPU time to compute.

The same settings can also end up not using geqo and getting stuck in seemingly endless planning - your mileage may vary.
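
If you'd rather not apply these settings globally, one option (a sketch, assuming you can wrap the big generated query in its own transaction) is to scope them with SET LOCAL:

BEGIN;
SET LOCAL geqo = on;
SET LOCAL geqo_threshold = 12;
SET LOCAL from_collapse_limit = 40;
SET LOCAL join_collapse_limit = 40;
-- run the 30+ join query generated by NHibernate here
COMMIT;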

Licensed under: CC-BY-SA with attribution