Question

This seems to be a common problem, but I have never had trouble with it before, and the usual fix does not work. It is probably something silly, but I cannot find it.

I want to index a Yammer site because the Yammer API is not fast enough for my purpose. The problem is that when I update my index using updateDocument, the old documents are not deleted, even though I have a stored unique key that is not analyzed.

Here is the relevant code:

Document newdoc = new Document();
newdoc.add(new Field(YammerMessageFields.URL, resultUrl, Field.Store.YES, Field.Index.NOT_ANALYZED));
newdoc.add(new Field(YammerMessageFields.THREAD_ID, threadID.toString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
newdoc.add(new Field(YammerMessageFields.AUTHOR, senderName, Field.Store.YES, Field.Index.ANALYZED));
newdoc.add(new Field(YammerMessageFields.CONTENTS, resultText, Field.Store.YES, Field.Index.ANALYZED));
Term key = new Term(YammerMessageFields.THREAD_ID, newdoc.getFieldable(YammerMessageFields.THREAD_ID).toString());
logger.debug("updating document with key: " + key);
try {
    IndexWriter writer = getIndexWriter();
    writer.updateDocument(key, newdoc);
    writer.close();
} catch (IOException e) {
}

What I see in my log is:

2012-05-11 12:02:29,816 DEBUG [http-8088-2] LuceneIndex - https://www.yammer.com/api/v1/messages/?newer_than=0
2012-05-11 12:02:38,594 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173285202>
2012-05-11 12:02:45,167 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173033239>
2012-05-11 12:02:51,686 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173014568>
2012-05-11 12:02:51,871 DEBUG [http-8088-2] LuceneIndex - new items:3

2012-05-11 12:03:27,393 DEBUG [http-8088-2] YammerResource - return all documents
2012-05-11 12:03:27,405 DEBUG [http-8088-2] YammerResource - nr docs:3
2012-05-11 12:03:27,405 DEBUG [http-8088-2] YammerResource - nr dels:0

...
next update
...

2012-05-11 12:03:35,802 DEBUG [http-8088-2] LuceneIndex - https://www.yammer.com/api/v1/messages/?newer_than=0
2012-05-11 12:03:43,933 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173322760>
2012-05-11 12:03:50,467 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173285202>
2012-05-11 12:03:56,982 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173056406>
2012-05-11 12:04:03,533 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173033239>
2012-05-11 12:04:10,097 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173030769>
2012-05-11 12:04:16,629 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173014568>
2012-05-11 12:04:23,169 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173003570>
2012-05-11 12:04:23,341 DEBUG [http-8088-2] LuceneIndex - new items:7

2012-05-11 12:05:09,694 DEBUG [http-8088-1] YammerResource - return all documents
2012-05-11 12:05:09,696 DEBUG [http-8088-1] YammerResource - nr docs:10
2012-05-11 12:05:09,696 DEBUG [http-8088-1] YammerResource - nr dels:0

So the same keys recur (plus 4 new ones), but after the second update there are 10 documents in my store instead of 7 live documents (and 3 deleted ones).

Edit: here is how I count the items. I also display them, and I inspected the index with Luke.

IndexReader r = IndexReader.open(searchIndex.getIndex());
List<Document> docList = new ArrayList<Document>();
List<Document> delList = new ArrayList<Document>();

int num = r.numDocs();
num += r.numDeletedDocs();
for (int i = 0; i < num && i < max; i++) {
    if (!r.isDeleted(i))
        docList.add(r.document(i));
    else
        delList.add(r.document(i));
}
r.close();
logger.debug("nr docs:" + docList.size());
logger.debug("nr dels:" + delList.size());

Solution

I'm not sure without running some test code, but this looks wrong to me:

Term key = new Term(YammerMessageFields.THREAD_ID, 
   newdoc.getFieldable(YammerMessageFields.THREAD_ID).toString());

Are you sure it shouldn't be:

Term key = new Term(YammerMessageFields.THREAD_ID, 
   newdoc.getFieldable(YammerMessageFields.THREAD_ID).stringValue());

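You then use that key to update any matching existing document. updateDocument deletes every document that contains the term and then adds the new one, so if the term matches nothing, the new document is simply added and the old one stays; no error is raised. Your log suggests this is what happens: the key prints as threadid:stored,indexed<threadid:173285202>, so toString() on the field appears to return a debug description of the field rather than the bare value 173285202, and a term built from it will never match what was indexed.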
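To make the mismatch concrete, here is a minimal sketch in plain Java that does not use Lucene at all. ToyField and ToyIndex are hypothetical stand-ins I made up for illustration: the toString() format is copied from the log in the question, and updateDocument is modelled as "delete exact matches, then add", which is my simplified reading of the behaviour described above, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a stored, not-analyzed field (hypothetical, not a Lucene class). */
class ToyField {
    final String name;
    final String value;

    ToyField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    /** The raw value, i.e. what ends up in the index as the term text. */
    String stringValue() {
        return value;
    }

    /** Debug description, shaped like the one visible in the question's log. */
    @Override
    public String toString() {
        return "stored,indexed<" + name + ":" + value + ">";
    }
}

/** Toy in-memory index with assumed delete-then-add update semantics. */
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Removes every document whose field exactly equals text, then adds doc. */
    void updateDocument(String field, String text, Map<String, String> doc) {
        docs.removeIf(d -> text.equals(d.get(field)));
        docs.add(doc);
    }

    int numDocs() {
        return docs.size();
    }

    /** Indexes the same thread id twice and returns the resulting document count. */
    static int countAfterTwoPasses(boolean useToString) {
        ToyIndex index = new ToyIndex();
        for (int pass = 0; pass < 2; pass++) {
            ToyField f = new ToyField("threadid", "173285202");
            Map<String, String> doc = new HashMap<>();
            doc.put(f.name, f.stringValue());
            // The only difference between the two variants is how the key is built.
            String key = useToString ? f.toString() : f.stringValue();
            index.updateDocument(f.name, key, doc);
        }
        return index.numDocs();
    }
}
```

In this toy model, ToyIndex.countAfterTwoPasses(true) leaves two documents because the decorated key never equals the stored value, while ToyIndex.countAfterTwoPasses(false) leaves one. That mirrors the growing document count and the zero deletions in your log.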

Calling toString() for anything other than logging or debugging (i.e. anything with logic in it) is usually a mistake.

Licensed under: CC-BY-SA with attribution