Question

On PostgreSQL 10.4 on Ubuntu (package version 10.4-2.pgdg16.04+1):

select to_tsvector('simple', 'БОЛЬШИЕ БУКВЫ');

returns 'большие':1 'буквы':2

But on PostgreSQL 10.3 (installed with Homebrew) on macOS High Sierra 10.13.3, it returns 'БОЛЬШИЕ':1 'БУКВЫ':2, so non-English letters are not lowercased.

How can I fix this on macOS?

Solution

The ability to identify the case of characters depends on the LC_CTYPE of your database, which by default comes from the environment in which the PostgreSQL instance was created (with initdb).

For instance, on Ubuntu with PostgreSQL 11:

tstc=# show lc_ctype;
 lc_ctype 
----------
 C

tstc=# select to_tsvector('simple', 'БОЛЬШИЕ БУКВЫ');
      to_tsvector      
-----------------------
 'БОЛЬШИЕ':1 'БУКВЫ':2
(1 row)

That's the result you got on your database on macOS.

But when I'm connected to a different database, this time with a UTF-8 locale:

postgres=# show lc_ctype;
  lc_ctype   
-------------
 fr_FR.UTF-8
(1 row)

postgres=# select to_tsvector('simple', 'БОЛЬШИЕ БУКВЫ');
      to_tsvector      
-----------------------
 'большие':1 'буквы':2
(1 row)

Now the letters are converted to lower case.
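
To compare databases without connecting to each one in turn, the same settings can be read from the pg_database catalog. This is only a minimal sketch: it assumes a server version whose pg_database has the datcollate and datctype columns, which is the case for the 10.x and 11 releases discussed here.

```sql
-- List every database with its encoding and locale settings.
-- datcollate and datctype are fixed at database creation time.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
ORDER BY datname;
```

A database showing C (or POSIX) in datctype would be expected to behave like the first example above, leaving non-ASCII letters in their original case.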

The fix is to create the database with the correct LC_CTYPE; it cannot be changed afterwards. By default, this setting is copied from template1, but it can be overridden by choosing template0 if template1 does not suit you. For instance:

CREATE DATABASE newDB lc_ctype='C.UTF-8' template template0;

The locale specified with lc_ctype must also be supported by the operating system. Check which locales are available with locale -a (or an equivalent command, if that one is not available on macOS).
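
As a rough sketch of the whole procedure on macOS, the statements below create a database and then repeat the test from the question. The database name tsdemo is hypothetical, and en_US.UTF-8 is only an assumption: substitute any UTF-8 locale that locale -a actually lists on your machine.

```sql
-- Hypothetical database name; 'en_US.UTF-8' is an assumed locale and
-- must be one of the locales reported by `locale -a` on the server host.
CREATE DATABASE tsdemo
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_CTYPE 'en_US.UTF-8';

-- After connecting to the new database (in psql: \c tsdemo),
-- confirm the setting and rerun the query from the question.
SHOW lc_ctype;
SELECT to_tsvector('simple', 'БОЛЬШИЕ БУКВЫ');
```

If the chosen locale classifies Cyrillic letters, the last query should return lowercased lexemes, as in the fr_FR.UTF-8 example above. Existing data has to be moved into the new database, for example with a dump and restore, because LC_CTYPE cannot be altered in place.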

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange