Question

I've got a Postgres ORDER BY issue with the following table:

em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC

To insert a new record into the table, I:

  1. Select the last record with SELECT * FROM employees ORDER BY em_code DESC
  2. Strip the letters from em_code using a regular expression and store them in ec_alpha
  3. Cast the remaining part to an integer, ec_num
  4. Increment it by one: ec_num++
  5. Pad it with enough zeros and prefix ec_alpha again
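Steps 2 to 5 can be sketched in Python as follows. This is a minimal illustration, not the asker's code: the helper name `next_em_code` is hypothetical, and it assumes every code is a run of letters followed by a run of digits, as in the sample table.

```python
import re


def next_em_code(last_code: str) -> str:
    """Hypothetical helper mirroring steps 2-5 of the question.

    Assumes codes look like letters followed by digits, e.g. "EM001".
    """
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", last_code)
    if match is None:
        raise ValueError(f"unexpected code format: {last_code!r}")
    ec_alpha, digits = match.groups()   # step 2: split the prefix from the number
    ec_num = int(digits)                # step 3: cast to integer
    ec_num += 1                         # step 4: increment
    # Step 5: re-pad to the previous width (zfill never truncates) and re-prefix.
    return ec_alpha + str(ec_num).zfill(len(digits))


print(next_em_code("EM001"))  # EM002
print(next_em_code("EM999"))  # EM1000
```

The increment itself behaves correctly; the duplicate comes from step 1 handing it "EM999" again after "EM1000" already exists.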

When em_code reaches EM1000, the above algorithm fails.

The first step returns EM999 instead of EM1000, so the algorithm generates EM1000 again as the new em_code, violating the unique key constraint.

Any idea how to select EM1000?


Solution

The reason is that strings sort character by character (alphabetically) instead of numerically as you want, and '1' sorts before '9'. You could solve it like this:

SELECT * FROM employees ORDER BY substring(em_code, 3)::int DESC
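To see the difference, here is a small Python model of both orderings. It is only an illustration: Python compares strings by code point, whereas Postgres uses the column's collation, but for these plain ASCII codes both put "EM999" above "EM1000". The numeric key mirrors `substring(em_code, 3)::int`, assuming every code has exactly the two-letter prefix "EM".

```python
codes = ["EM001", "EM999", "EM1000"]

# Plain string ordering compares character by character, so "EM999" > "EM1000"
# because "9" > "1" at the third character.
lexicographic_desc = sorted(codes, reverse=True)

# Equivalent of ORDER BY substring(em_code, 3)::int DESC: SQL substring() is
# 1-based, so position 3 is Python index 2 (everything after "EM").
numeric_desc = sorted(codes, key=lambda code: int(code[2:]), reverse=True)

print(lexicographic_desc)  # ['EM999', 'EM1000', 'EM001']
print(numeric_desc)        # ['EM1000', 'EM999', 'EM001']
```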

It would be more efficient to drop the redundant 'EM' from em_code (if you can) and store an integer to begin with.


Additional answer to question in comment

To strip any and all non-digits from a string:

SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees

\D is the regular-expression class shorthand for "non-digit".
'g' as the 4th parameter is the "global" flag: it applies the replacement to every occurrence in the string, not just the first.

So every non-digit is replaced with the empty string, leaving only the digits.
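The effect of the 'g' flag can be illustrated with Python's `re.sub`, where `count=1` plays the role of leaving 'g' off. The sample value "EM-0042x" is made up for the example.

```python
import re

code = "EM-0042x"

# Like regexp_replace(em_code, E'\\D', '', 'g'): remove every non-digit.
all_removed = re.sub(r"\D", "", code)

# Without the 'g' flag only the first match is replaced; count=1 mimics that.
first_removed = re.sub(r"\D", "", code, count=1)

print(all_removed)    # 0042
print(first_removed)  # M-0042x
```

Note that leading zeros survive, so cast the result to an integer if you need its numeric value.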

OTHER TIPS

One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.

create or replace function naturalsort(text)
    returns bytea language sql immutable strict as $f$
    select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
    from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;

Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql

To use it simply call the function in your order by:

SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
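To show why this key works, here is a rough Python port of the same idea. It is an illustration, not RhodiumToad's code: the name `naturalsort_key` is hypothetical, and it encodes text as UTF-8 where the SQL uses `convert_to(..., 'SQL_ASCII')`. Each digit run is prefixed with its length (and the length of that length), so a longer number always compares as larger in a plain byte comparison.

```python
import re

# Same token pattern as the SQL: a run of digits (leading zeros dropped by 0*)
# or a run of non-digits.
TOKEN = re.compile(r"0*([0-9]+)|([^0-9]+)")


def naturalsort_key(value: str) -> bytes:
    parts = []
    for m in TOKEN.finditer(value):
        digits, text = m.group(1), m.group(2)
        if text is not None:
            # Non-digit runs are kept as-is (r[2] in the SQL).
            parts.append(text)
        else:
            # Digit runs become: len(len(digits)) || len(digits) || digits.
            n = str(len(digits))
            parts.append(str(len(n)) + n + digits)
    # string_agg(..., '\x00') joins the byte strings with a zero byte.
    return b"\x00".join(p.encode("utf-8") for p in parts)


codes = ["EM001", "EM999", "EM1000"]
print(sorted(codes, key=naturalsort_key, reverse=True))
# ['EM1000', 'EM999', 'EM001']
```

For example, "EM999" becomes b"EM\x0013999" and "EM1000" becomes b"EM\x00141000"; the '3' versus '4' length digit decides the order before the number itself is compared.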

This comes up constantly, both in questions and in my own development, and I grew tired of tricky workarounds. I finally broke down and implemented it as a PostgreSQL extension:

https://github.com/Bjond/pg_natural_sort_order

It's free to use, MIT license.

Basically, it normalizes the numbers within strings by zero-padding them, so you can create an indexed column for full-speed natural sorting. The README explains the details.
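The zero-padding idea can be sketched in a few lines of Python. This is not the extension's code or API; `pad_numbers` and the fixed width of 10 are hypothetical choices for illustration. Once every digit run has the same width, a plain string comparison (and therefore an ordinary index) yields natural order.

```python
import re

PAD_WIDTH = 10  # hypothetical; must be at least the length of the longest number stored


def pad_numbers(value: str, width: int = PAD_WIDTH) -> str:
    """Left-pad every digit run with zeros so plain string order becomes natural order.

    Digit runs longer than `width` are left as-is and would break the ordering.
    """
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), value)


codes = ["EM001", "EM999", "EM1000"]
print(pad_numbers("EM999"))                          # EM0000000999
print(sorted(codes, key=pad_numbers, reverse=True))  # ['EM1000', 'EM999', 'EM001']
```

The trade-off is storage: the padded copy is stored alongside the original, in exchange for sorts that need no per-row computation at query time.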

The advantage is that a trigger does the work instead of your application code. It is calculated at machine speed on the PostgreSQL server, and migrations that add columns become simple and fast.

You can use just this line:

ORDER BY length(substring(em_code FROM '[0-9]+')), em_code
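This works because a number with more digits is always larger, and among numbers with the same digit count, plain string order is numeric order. A Python model of the same two-part sort key (the name `length_then_text` is hypothetical, and it assumes every code contains at least one digit):

```python
import re

codes = ["EM1000", "EM001", "EM999"]


def length_then_text(code: str) -> tuple[int, str]:
    # Mirrors ORDER BY length(substring(em_code FROM '[0-9]+')), em_code.
    # Assumes every code contains at least one digit (in SQL, a code with no
    # digits would get a NULL length instead of raising).
    digits = re.search(r"[0-9]+", code).group()
    return (len(digits), code)


print(sorted(codes, key=length_then_text))
# ['EM001', 'EM999', 'EM1000']
```

To fetch the last record, as in the question, both keys need DESC: ORDER BY length(substring(em_code FROM '[0-9]+')) DESC, em_code DESC. Leading zeros count toward the length, so this relies on the codes being padded consistently.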

I wrote about this in detail in this related question:

Humanized or natural number sorting of mixed word-and-number strings

(I'm posting this answer as a useful cross-reference only, so it's community wiki).

I came up with something slightly different.

The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is INT32_MAX (the largest 32-bit signed integer), used so that strings are sorted after numbers.

  ORDER BY ARRAY(
    SELECT ROW(
      CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
      match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
    AS match
  )
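Here is a Python approximation of that tuple key, for illustration only. Two deliberate differences from the SQL: Python's `re` takes the first alternative that matches even when it is empty, so this port uses `+` instead of `*` to avoid empty tokens; and numeric tokens get "" instead of NULL as their text part, because Python cannot compare None with a string. The name `tuple_key` is hypothetical.

```python
import re

INT32_MAX = 2147483647  # sentinel from the SQL: text tokens sort after any number

# + instead of * so that no empty tokens are produced (see note above).
TOKEN = re.compile(r"(\d+)|(\D+)")


def tuple_key(value: str) -> list[tuple[int, str]]:
    key = []
    for m in TOKEN.finditer(value):
        number, text = m.group(1), m.group(2)
        if number is not None:
            key.append((int(number), ""))  # numeric token: compare by value
        else:
            key.append((INT32_MAX, text))  # text token: after all numbers, then by text
    return key


print(sorted(["EM001", "EM999", "EM1000"], key=tuple_key, reverse=True))
# ['EM1000', 'EM999', 'EM001']
print(sorted(["a", "10"], key=tuple_key))
# ['10', 'a']
```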

I thought of another way of doing this that uses less database storage than padding and is faster than calculating the sort key on the fly.

https://stackoverflow.com/a/47522040/935122

I've also put it on GitHub:

https://github.com/ccsalway/dbNaturalSort

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow