Domanda

I'm really at my wits end, with this Problem, and I really hope someone could help me. I am using a Postgresql 9.3. My Database contains mostly german texts but not only, so it's encoded in utf-8. I want to establish a fulltextsearch wich supports german language, nothing special so far. But the search is behaving really strange,, and I can't find out what I am doing wrong.

So, given the following table given as example

select * from test;
      a      
-------------
 ein Baum
 viele Bäume
 Überleben
 Tisch
 Tische
 Café

\d test
   Tabelle »public.test«
Spalte | Typ  | Attribute 
--------+------+-----------
a      | text | 

sintext=# \d
                Liste der Relationen
 Schema |        Name         |   Typ   | Eigentümer 
--------+---------------------+---------+------------
 (...)
 public | test                | Tabelle | paf

Now, lets have a look at some textsearch examples:

select * from test where to_tsvector('german', a) @@ plainto_tsquery('Baum');
      a      
-------------
 ein Baum
 viele Bäume

select * from test where to_tsvector('german', a) @@ plainto_tsquery('Bäume');
--> No Hits

select * from test where to_tsvector('german', a) @@ plainto_tsquery('Überleben');
--> No Hits

select * from test where to_tsvector('german', a) @@ plainto_tsquery('Tisch');
   a    
--------
 Tisch
 Tische

Whereas Tische is Plural of Tisch (table) and Bäume is plural of Baum (tree). So, Obviously Umlauts does not work while textsearch perfoms well.

But what really confuses me is, that a) non-german special characters are matching

select * from test where to_tsvector('german', a) @@ plainto_tsquery('Café');
   a   
 ------
  Café

and b) if I don't use the german dictionary, there is no Problem with umlauts (but of course no real textsearch as well)

select * from test where to_tsvector(a) @@ plainto_tsquery('Bäume');
     a      
-------------
 viele Bäume

So, if I use the german dictionary for Text-Search, just the german special characters do not work? Seriously? What the hell is wrong here? I Really can't figure it out, please help!

È stato utile?

Soluzione

You're explicitly using the German dictionary for the to_tsvector calls, but not for the to_tsquery or plainto_tsquery calls. Presumably your default dictionary isn't set to german; check with SHOW default_text_search_config.

Compare:

regress=> select plainto_tsquery('simple', 'Bäume'),
                 plainto_tsquery('english','Bäume'), 
                 plainto_tsquery('german', 'Bäume');
 plainto_tsquery | plainto_tsquery | plainto_tsquery 
-----------------+-----------------+-----------------
 'bäume'         | 'bäume'         | 'baum'
(1 row)

The language setting affects word simplification and root extraction, so a vector from one language won't necessarily match a query from another:

regress=> SELECT to_tsvector('german', 'viele Bäume'), plainto_tsquery('Bäume'),
          to_tsvector('german', 'viele Bäume') @@ plainto_tsquery('Bäume');
    to_tsvector    | plainto_tsquery | ?column? 
-------------------+-----------------+----------
 'baum':2 'viel':1 | 'bäume'         | f
(1 row)

If you use a consistent language setting, all is well:

regress=> SELECT to_tsvector('german', 'viele Bäume'), plainto_tsquery('german', 'Bäume'),
                 to_tsvector('german', 'viele Bäume') @@ plainto_tsquery('german', 'Bäume');
    to_tsvector    | plainto_tsquery | ?column? 
-------------------+-----------------+----------
 'baum':2 'viel':1 | 'baum'          | t
(1 row)
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top