Question

I know what DISTINCT means and what generate_series() does. But when I execute this query, question marks are flying around my head.

select distinct generate_series(0,8)

The result is very weird.

[Screenshot of the query output]

Can somebody please explain what is happening?

Solution

A SELECT query with no ORDER BY clause has no defined order; it will simply return the relevant rows in whatever order happens to be convenient to the executing DBMS.

In the case of a "real" table, this might be in order of PRIMARY KEY, in the order they were inserted into the table, or in the order of a particular index that was used in the execution plan.

In this example, the "table" created by generate_series() obviously starts off in the order 0, 1, 2, 3, etc. However, to enforce the DISTINCT you put on the query, Postgres has to check whether any value appears more than once. (There is no way for it to know that the generate_series() function will always provide distinct values.)

An efficient way of doing this (in general) is to build a "hash map" of the values you want to check for uniqueness. Rather than checking each new value against every existing value, you calculate which "hash bucket" it would fall into; if the bucket is empty, the value has not been seen before; if not, you need only compare it against the other values in that bucket.
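To make the bucket idea concrete, here is a minimal Python sketch of a hash-based uniqueness check. It is only an illustration: `toy_hash`, `is_new_value`, and the bucket count of 8 are made up for this example and say nothing about the hash function or data structures Postgres actually uses.

```python
def toy_hash(value, num_buckets):
    # Hypothetical hash function, chosen only for illustration.
    return (value * 5 + 3) % num_buckets


def is_new_value(buckets, value):
    """Return True if value has not been seen before, recording it if so."""
    bucket = buckets[toy_hash(value, len(buckets))]
    # Only the values already in this one bucket need to be compared.
    if value in bucket:
        return False
    bucket.append(value)
    return True


buckets = [[] for _ in range(8)]
flags = [is_new_value(buckets, v) for v in [0, 1, 0, 8, 1]]
print(flags)  # [True, True, False, True, False]
```

Note that 0 and 8 land in the same bucket here, so 8 is compared against 0 and still recognised as a new value; the repeated 0 and 1 are rejected without looking at any other bucket.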

Running EXPLAIN select distinct generate_series(0,8) will show you the query plan Postgres has selected; for me (and presumably for you) it looks like this:

HashAggregate  (cost=0.02..0.03 rows=1 width=0)
  ->  Result  (cost=0.00..0.01 rows=1 width=0)

As expected, there's a HashAggregate operation there, running over the result of generate_series() in order to check it for uniqueness. (Exactly how that operation works I don't know, and it isn't important, but the name strongly suggests it's using a hash map to do the work.)

At the end of the hashing operation, Postgres can simply read out the values from the hash map, rather than going back to the original list, so it does so. As a result, they are no longer in the original order, but ordered according to the "hash buckets" they fell into.
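The effect on ordering can be imitated with a small Python sketch. This is a toy model under stated assumptions, not Postgres's implementation: the `toy_hash` function, the `hash_distinct` helper, and the choice of 8 buckets are all hypothetical, picked only so that the bucket order visibly differs from the input order.

```python
def toy_hash(value, num_buckets):
    # Hypothetical hash function, chosen only for illustration.
    return (value * 5 + 3) % num_buckets


def hash_distinct(values, num_buckets=8):
    """Deduplicate values via hash buckets, then read the buckets out in order."""
    buckets = [[] for _ in range(num_buckets)]
    for value in values:
        bucket = buckets[toy_hash(value, num_buckets)]
        if value not in bucket:
            bucket.append(value)
    # Read out bucket by bucket instead of going back to the original list.
    return [value for bucket in buckets for value in bucket]


rows = hash_distinct(range(9))
print(rows)          # [1, 6, 3, 0, 8, 5, 2, 7, 4]
print(sorted(rows))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The input arrives as 0 through 8, but the output follows the bucket layout. Sorting explicitly at the end, which is what an ORDER BY clause asks the database to do, is the only thing that restores a predictable order.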

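The moral of the story is: if you need the rows in a particular order, always use an ORDER BY clause! For example: select distinct generate_series(0,8) order by 1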

Licensed under: CC-BY-SA with attribution