Question

I have an app storing XML in an Oracle table as XMLType. I want to do full text searches on that data. The Oracle documentation, in Full-Text Search Over XML Data, recommends to use the contains SQL function, which requires the data to be indexed with a context index. The trouble is that it appears that context indexes are asynchronous, which doesn't fit the use case I have where I need to be able to search through data right after it was added.

Can I make that index somehow synchronous? If not, what other technique should I use to do full text searches on an XMLType?

Was it helpful?

Solution

It can't be made transactional (i.e. it won't update the index so that the change is visible to a subsequent statement within the transaction). The best you can do is make it update on commit (SYNC ON COMMIT), as in:

create index your_table_x
    on your_table(your_column)
    indextype is ctxsys.context
    parameters ('sync (on commit)');

Text indexes are complex things and I'd be surprised if you could achieve a transactional / ACID compliant text index (that is, transaction A inserting documents and have those visible in the index for that transaction and not visible to transaction B until commit).

OTHER TIPS

  1. You could update the index at a regular interval, in a cron-like kind of way. At worse, you can update the index after every update to the table, with sync_index on which the index is built. For instance: EXEC CTX_DDL.SYNC_INDEX('your_index'); I am not a big fan of this technique because of the complexity it introduces. In addition to the cron-like aspect, you have to deal with index fragmentation, which might require you to do full updates from time to time. Update: instead of updating the index at a regular interval, you can update it on commit, as suggested by Gary, which is really what you're looking for.

  2. You can do a simple text search on the XML document, as if you were doing a ctrl-f with the XML in a text editor. In many cases, this doesn't give you the expected result as users don't care if the string they are searching for happens to be used in an element name, attribute name, or namespace. But, if this method works for you, go for it: it is simple and fairly fast. For instance:

    select count(*) from your_table d
    where lower(d.your_column.getClobVal()) like '%gaga%';
    
  3. Using existsNode() in a where clause, as in the example below. There are two potential issues with this. First, without proper indexes, this is slower then the method #2, by a factor of about 2 in my testing, and I am not sure how to create an index on unstructured data that would be used by this query. Second, you'll be doing a case-sensitive search, which is often not what you want. And you can't just call XPath's lower-case(), as Oracle only supports XPath 1.0.

    select * from your_table 
    where existsNode(your_column, '//text()[contains(., "gaga")]') = 1;
    
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top