Question

I am currently using Sphinx for indexing a MySQL query with 20+ million records.

I am using a delta index to update the main index and add all new records.

Unfortunately allot of changes to the tables are deleted.

I understand that I can use sql_query_killlist to get all document ID's that need to be deleted or updated. Unfortunately I don't understand how this actually works and the documentation from Sphinx does not have a good enough example for me to understand.

If I use the following example, how could I implement the killlist?

in MySQL

CREATE TABLE sph_counter
(
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);

in sphinx.conf

source main
{
    # ...
    sql_query_pre = SET NAMES utf8
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query = SELECT id, title, body FROM documents \
        WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

source delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, body FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

index main

{
    source = main
    path = /path/to/main
    # ... all the other settings
}

note how all other settings are copied from main, but source and path are overridden (they MUST be) index delta : main

{
    source = delta
    path = /path/to/delta
}
Was it helpful?

Solution

The specifics depend a lot on how you mark deleted documents. But would just add something like

 sql_query_killist = SELECT id FROM documents 
                     WHERE status='deleted' 
                           AND id<=( SELECT max_doc_id FROM sph_counter 
                                     WHERE counter_id=1 )

to the delta index. That would capture ids of deleted record that are in the main index, and add them to the killlist so they would never appear in search results.

If want to capture updated records, need to arrange for the new rows to be included in the main sql_query of the delta, AND their ids to be in the kill-list.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top