Question

I'm using Oracle Text on Oracle 10g. I have created a multicolumn CONTEXT index as below:

My table is

CREATE TABLE WEB_RES
(   
    "ID" NUMBER(10,0), 
    "TITLE" VARCHAR2(256 BYTE), 
    "DESCRIPTION" VARCHAR2(1024 BYTE), 
    "CONTENT" BLOB, 
    "CATEGORY" VARCHAR2(64 BYTE), 
    ...
    CONSTRAINT "PK_WEB_RES" PRIMARY KEY ("ID")
}

My preference and index are:

execute ctx_ddl.create_preference('my_multi', 'MULTI_COLUMN_DATASTORE');
execute ctx_ddl.set_attribute('my_multi', 'columns', 'title, description, category');
execute ctx_ddl.create_preference( 'my_lexer', 'BASIC_LEXER' );
execute ctx_ddl.set_attribute('my_lexer', 'index_stems', '1');

create index myTitleIndex on web_res(title)
indextype is ctxsys.context
parameters ('DATASTORE my_multi lexer my_lexer SYNC(ON COMMIT)');

I have a row in my table as following:

ID            Title                                Description      Category
---        -----------                           --------------   ------------
1       Superannuation Contributions Splitting      Test            Test

English is not my native language but shouldn't this query return a result?

SELECT * FROM web_res WHERE CONTAINS(title, '$contribute', 1) > 0;

I expect to find a title with 'Contribution' when searching for '$Contribute' or find a title with 'approval' when searching for '$approve'. Isn't this what stemming is about?

Edit: I also tried the Basic_Wordlist as below, but still got no rows back for my query:

execute ctx_ddl.create_preference( 'my_wordlist', 'BASIC_WORDLIST' );
execute ctx_ddl.set_attribute('my_wordlist', 'stemmer', 'ENGLISH');

create index myTitleIndex on web_res(title)
indextype is ctxsys.context
parameters ('DATASTORE my_multi wordlist my_wordlist SYNC(ON COMMIT)');
Was it helpful?

Solution

can you please try to create a BASIC_WORDLIST with a stemmer preference set to english?

http://docs.oracle.com/cd/B19306_01/text.102/b14218/cdatadic.htm#i1008929


OK, I have researched it in more depth for you, and I don't think you'll like he reuslts... I used the following schema:

execute ctxsys.ctx_ddl.create_preference('my_multi','MULTI_COLUMN_DATASTORE');
execute ctxsys.ctx_ddl.set_attribute('my_multi','columns','a,b,c');
execute ctxsys.ctx_ddl.create_preference('my_lexer','BASIC_LEXER');
execute ctxsys.ctx_ddl.set_attribute('my_lexer','index_stems','1');
execute ctxsys.ctx_ddl.create_preference('my_wordlist','BASIC_WORDLIST');
execute ctxsys.ctx_ddl.set_attribute('my_wordlist','stemmer','ENGLISH');

create index tt on test(a) indextype is ctxsys.context parameters('DATASTORE my_multi WORDLIST my_wordlist LEXER my_lexer SYNC(on commit)');

insert into test values ('Superannuation Contributions Splitting','test','test');
insert into test values ('commit','Superannuation Contributions Splitting','test');
insert into test values ('commit','test','Superannuation Contributions Splitting');
commit;

So after creating the schema - I went to the DR$INDEX_NAME$I table, and looked at the data there. it contains : B,C,commit,contribution, contributions,split,splitting,superannuation,test. It means that these are the list of "roots" you can search for.

Since "contribute" is not here, you cannot "get to it".

But, if you'll search for '$contribution' you will find it, or even for "$committing" since the root of it, commit, is in the list.

Bottom line - Oracle does not know that contribute and contribution are related. This is either a bug in the engine or a linguistic property (you can never know).

BUT - I can assure you the stemming works - since the search of '$contribution' and '$split' will return the results...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top