Question

I'm using the SimplePostTool from the Solr 4.6 example to import documents from the filesystem into Solr. Everything works, but the last_modified field is filled only when the original document has metadata for it. If the metadata is not present, the Solr extractor leaves the field blank.

I tried to modify SimplePostTool to set this field from the file system modification date, but then I get this error when I import files that already have a last_modified value from their metadata:

430584 [qtp1214238505-16] ERROR org.apache.solr.core.SolrCore  – 
  org.apache.solr.common.SolrException: ERROR: 
  [doc=4861976] multiple values encountered for non multiValued field 
  last_modified: [2013-12-22T14:03:10.000Z, 2013-07-02T11:29:20.000Z]

I'm thinking about using a custom field for the file system date, but in my case the metadata date is preferable when it is available. Is there any way to merge them at import time?

Thanks!


Solution

I finally solved the issue by creating a custom Update Request Processor, as explained here: http://wiki.apache.org/solr/UpdateRequestProcessor

My processor is as follows:

package com.mycompany.solr;

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class LastModifiedMergeProcessorFactory 
   extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, 
       SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new LastModifiedMergeProcessor(next);
  }
} 

class LastModifiedMergeProcessor extends UpdateRequestProcessor {

  public LastModifiedMergeProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();

    Object metaDate = doc.getFieldValue("last_modified");
    Object fileDate = doc.getFieldValue("file_date");
    // Fall back to the file system date only when no metadata date was extracted
    if (metaDate == null && fileDate != null) {
      doc.addField("last_modified", fileDate);
    }

    // pass it up the chain
    super.processAdd(cmd);
  }
}

Where file_date is a field I set with the file modification date at import time.
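
The answer does not show how the file_date value is produced. As a rough sketch only, one way to turn a file's modification time into an ISO 8601 UTC timestamp (the form seen in the error message above) might look like the following. It assumes Java 8 or later for java.time, which is newer than what a Solr 4.6 setup would typically have used, and the names FileDateSketch, toSolrDate and fileDate are hypothetical, not part of SimplePostTool.

```java
import java.io.File;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Solr or SimplePostTool.
public class FileDateSketch {

  // Formats epoch milliseconds as an ISO 8601 UTC timestamp,
  // truncated to whole seconds (for example 2013-12-22T14:03:10Z).
  static String toSolrDate(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis)
        .truncatedTo(ChronoUnit.SECONDS)
        .toString();
  }

  // Candidate value for the file_date field, read from the file system.
  // File.lastModified() returns 0 if the file does not exist.
  static String fileDate(File file) {
    return toSolrDate(file.lastModified());
  }

  public static void main(String[] args) {
    File file = new File(args.length > 0 ? args[0] : ".");
    System.out.println("file_date=" + fileDate(file));
  }
}
```

The resulting string would then be sent as the file_date value alongside the document, so that the processor above can copy it into last_modified when no metadata date exists.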

Other tips

You can set a default value in your schema. Something like this should work:

<field name="my_date" type="date" indexed="true" stored="true" multiValued="false" default="NOW" />

Field Type Definition:

<fieldType name="date"     class="solr.TrieDateField" sortMissingLast="true" omitNorms="true"/>

When creating a document, Solr takes all input as text and then validates it against the field's declared type, so any value in a date format that Solr accepts will work. For the current time, use NOW as the default value.

regards

Rajat

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow