Question

Background: I am using Solr 4.0 to index an Oracle 11g database. I launch an import using wget and the HTTP data-import command from a Solaris/UNIX crontab (http://wiki.apache.org/solr/DataImportHandler). Rather than using delta-import, the queries in my data config use the delta query via full import approach explained here (http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport). This approach distinguishes between a full import and a delta query using clean=true and clean=false. This is necessary because of the application that produces the data I'm indexing.

The problem: The queries, the full import, and the delta query via full import all work fine when I issue the HTTP GETs from a browser (Firefox). However, when I use crontab / wget to issue the EXACT same URL, the index ends up with 0 (zero!) documents. This only happens with the delta query via full import; the full import works fine. Since a full import can take longer than 24 hours, it is essential that I can make delta updates to the index. How is it possible for an import to clear the whole index? It makes no sense that the delta query via full import would work fine from Firefox and then delete the entire index when cron / wget makes the GET request.

Ideas?


Solution

Are you using clean=false? With command=full-import, clean=true is the default, which means Solr runs a delete query at the beginning of the import.

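Also make sure the complete import URL, including its parameters, is enclosed in quotes in your crontab. Otherwise the shell treats each & as a command separator (it runs whatever precedes it in the background), so only the first parameter reaches Solr. If clean=false is dropped that way, the default clean=true applies and the index is cleared, which would match what you are seeing.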
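To illustrate the quoting issue, here is a minimal sketch. The host, port, handler path, and cron schedule are assumptions for illustration only; substitute the URL you actually use. printf stands in for wget so the effect can be seen without touching Solr.

```shell
# Hypothetical URL: host, port and handler path are assumptions.
URL='http://localhost:8983/solr/dataimport?command=full-import&clean=false'

# Quoted: the command receives the whole URL, including clean=false.
quoted=$(printf '%s\n' "$URL")

# Unquoted: the shell splits the line at '&'. The first part runs in the
# background with only command=full-import; 'clean=false' becomes a
# separate shell variable assignment and never reaches the command.
unquoted=$(sh -c 'printf "%s\n" http://localhost:8983/solr/dataimport?command=full-import&clean=false')

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"

# A crontab entry would therefore quote the URL (schedule is hypothetical):
# 0 * * * * wget -q -O /dev/null "http://localhost:8983/solr/dataimport?command=full-import&clean=false"
```

In the unquoted case the request carries only command=full-import, so Solr falls back to clean=true.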

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow