Question

By default, SQLite3 sorts text by byte value, so only ASCII letters come out in the expected order. I searched Google, but all I found was information about collations, and SQLite3 ships with only the BINARY, NOCASE and RTRIM collations. How can I add support for a specific locale? (I'm using it in a Rails application.)

Solution

SQLite supports integration with ICU. According to the README file (sqlite/ext/icu/README.txt), the sqlite/ext/icu/ directory contains source code for the SQLite "ICU" extension, an integration of the "International Components for Unicode" library with SQLite.

1. Features

    1.1  SQL Scalars upper() and lower()
    1.2  Unicode Aware LIKE Operator
    1.3  ICU Collation Sequences
    1.4  SQL REGEXP Operator

OTHER TIPS

I accepted Doug Currie's answer, but I want to add a step-by-step "algorithm" for how to do it, because the sqlite3 documentation is very strange (at least for me).

OK, assuming we have a working sqlite3, the steps are:

  1. Download the ICU extension for SQLite (the icu.c source file)

  2. Compile it:

    gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so
    

    This command is for Linux. I also needed to install the ICU development package:

    sudo apt-get install libicu-dev
    

    I'm working on a 64-bit architecture and got an error mentioning "relocation R_X86_64_32S" (whatever that means :). GCC suggested adding -fPIC to the compile options, and that fixed it.

  3. Run sqlite3. We can load the extension with the command:

    .load './libSqliteIcu.so'
    

    This assumes the library is in the current directory; we can also specify the full path.

  4. Create a new collation:

    SELECT icu_load_collation('pl_PL', 'POLISH');
    

    The first parameter is the desired locale and the second is the collation's name (it can be anything).

  5. Now we can sort data with our new locale:

    SELECT * FROM some_table ORDER BY name COLLATE POLISH;
    

    And the ordering is no longer split by case: upper- and lowercase letters sort together!
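
Steps 3 to 5 can also be done from application code instead of the sqlite3 shell. Below is a minimal sketch using Python's standard sqlite3 module. The helper name `register_icu_collation` is made up for illustration, and the sketch assumes the library built in step 2 can be loaded without naming an entry point, just as the `.load` command in step 3 does. It also assumes a Python build that allows extension loading.

```python
import sqlite3


def register_icu_collation(conn, extension_path, locale, collation_name):
    """Load the ICU extension into conn and register one named collation.

    Hypothetical helper: it mirrors steps 3 and 4 of the list above.
    """
    # Extension loading is disabled by default; enable it only while loading.
    conn.enable_load_extension(True)
    try:
        conn.load_extension(extension_path)
    finally:
        conn.enable_load_extension(False)
    # Same SQL call as step 4; both arguments are bound as ordinary parameters.
    conn.execute("SELECT icu_load_collation(?, ?)", (locale, collation_name))
```

After `register_icu_collation(conn, "./libSqliteIcu.so", "pl_PL", "POLISH")`, the query from step 5 should work on that connection. Collations are registered per connection, so the call has to be repeated for every new connection the application opens.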

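If you can't afford to compile the ICU extension, you can have a user-defined function (UDF) do the same. In PHP/PDO:

// Requires the PHP intl extension, which provides the Collator class.
$pdo->sqliteCreateFunction('locale',
    function ($data, $locale = 'root')
    {
        // Cache one Collator per locale across calls.
        static $collators = array();

        if (isset($collators[$locale]) !== true)
        {
            $collators[$locale] = new \Collator($locale);
        }

        // Sort keys compare correctly byte by byte, so ORDER BY can use them directly.
        return $collators[$locale]->getSortKey($data);
    }
);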

Example usage:

SELECT * FROM "table" ORDER BY locale("column", 'pt_PT');

I don't expect this approach to be as efficient as the native extension but it is surely more portable.
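
The same sort-key idea can be sketched in Python with the standard sqlite3 module. This is only an illustration of the technique, not a replacement for ICU: the hand-written `POLISH_ALPHABET` table stands in for real collation data, and the names `polish_sort_key`, `locale_key` and `names_in_polish_order` are made up for this example.

```python
import sqlite3

# Assumption: this hand-written table stands in for ICU's Polish rules.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
RANK = {ch: i + 1 for i, ch in enumerate(POLISH_ALPHABET)}


def polish_sort_key(text):
    """Return bytes whose byte-wise order follows POLISH_ALPHABET."""
    if text is None:
        return None
    key = bytearray()
    for ch in text.lower():
        if ch in RANK:
            key.append(RANK[ch])
        else:
            # Anything outside the table sorts after all listed letters.
            key += b"\xff" + ch.encode("utf-8")
    return bytes(key)


def names_in_polish_order(names):
    """Sort names through SQLite using the sort-key function."""
    conn = sqlite3.connect(":memory:")
    # Register the UDF; SQLite compares the returned BLOBs byte by byte.
    conn.create_function("locale_key", 1, polish_sort_key)
    conn.execute("CREATE TABLE people (name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?)", [(n,) for n in names])
    rows = conn.execute("SELECT name FROM people ORDER BY locale_key(name)")
    result = [row[0] for row in rows]
    conn.close()
    return result
```

As with the PHP version, the function runs once per row being sorted, so it trades speed for not needing a compiled extension.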

For those who are not able to build the extension themselves, I've made compiled versions available for macOS and Linux here: http://files.tempel.org/Various/Sqlite3ICUExtention

The Linux versions, for both Intel 32 and 64 bit, were built on Ubuntu 16, if that matters.

Generally, you should not trust compiled code supplied by others, but I'm quite a public person, meaning I'd run quite a risk if I provided a "bad" version. To guard against a man-in-the-middle attack or a hack of my server, here are the MD5 hashes for the 3 files:

libSqliteIcu-i386.so = 6decd73f27d9c61243128e798304508f
libSqliteIcu-x86_64.so = b127c8a1f65503c91c61a21732eb11be
sqlite3_icu_extension.dylib = a29d59f6b74e7ef234691729b82da660
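
One way to check a downloaded file against these hashes is a short script. The sketch below uses Python's hashlib; the helper names `md5_of_file` and `matches_published_md5` are illustrative, and the digests are copied from the list above.

```python
import hashlib

# Digests as published in the list above.
PUBLISHED_MD5 = {
    "libSqliteIcu-i386.so": "6decd73f27d9c61243128e798304508f",
    "libSqliteIcu-x86_64.so": "b127c8a1f65503c91c61a21732eb11be",
    "sqlite3_icu_extension.dylib": "a29d59f6b74e7ef234691729b82da660",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, name):
    """Compare a local file against the published digest for name."""
    return md5_of_file(path) == PUBLISHED_MD5[name]
```

For example, `matches_published_md5("./libSqliteIcu-x86_64.so", "libSqliteIcu-x86_64.so")` should return True only if the downloaded file is byte-for-byte the one the hashes were computed from.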
Licensed under: CC-BY-SA with attribution