Question

I'm using the SimplePostTool from the Solr 4.6 example to import documents from the filesystem into Solr. Everything works, but the last_modified field is filled only when the original document has metadata for it. If that metadata is not present, the Solr extractor leaves the field blank.

I tried to modify SimplePostTool to set this field using the file system modification date, but then I get this error when I try to import files that already have a last_modified field from the metadata:

430584 [qtp1214238505-16] ERROR org.apache.solr.core.SolrCore  – 
  org.apache.solr.common.SolrException: ERROR: 
  [doc=4861976] multiple values encountered for non multiValued field 
  last_modified: [2013-12-22T14:03:10.000Z, 2013-07-02T11:29:20.000Z]

I'm thinking about using a custom field for the file system date, but in my case the metadata date is preferable when it is available. Is there any way to merge them at import time?

Thanks!

Solution 3

I finally solved the issue by creating a custom Update Request Processor, as explained here: http://wiki.apache.org/solr/UpdateRequestProcessor

My processor is as follows:

package com.mycompany.solr;

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class LastModifiedMergeProcessorFactory 
   extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, 
       SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new LastModifiedMergeProcessor(next);
  }
} 

class LastModifiedMergeProcessor extends UpdateRequestProcessor {

  public LastModifiedMergeProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();

    // Date extracted from the document metadata, if any
    Object metaDate = doc.getFieldValue("last_modified");
    // File system modification date, set at import time
    Object fileDate = doc.getFieldValue("file_date");

    // Fall back to the file system date only when the metadata has none
    if (metaDate == null && fileDate != null) {
      doc.addField("last_modified", fileDate);
    }

    // pass it up the chain
    super.processAdd(cmd);
  }
}

Where file_date is a field I set with the file modification date at import time.
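
For file_date to be accepted by a Solr date field, the value has to arrive in Solr's canonical UTC form (for example 2013-12-22T14:03:10Z). Below is a minimal sketch of how the import side might format a file's modification time, assuming Java 8 or later. The class and method names (FileDateFormatter, toSolrDate, fileDate) are hypothetical and are not part of Solr or SimplePostTool.

```java
import java.io.File;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Solr or SimplePostTool.
public class FileDateFormatter {

  // Converts epoch milliseconds to an ISO-8601 UTC string,
  // truncated to whole seconds (e.g. 1970-01-01T00:00:00Z).
  public static String toSolrDate(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis)
        .truncatedTo(ChronoUnit.SECONDS)
        .toString();
  }

  // Formats the file system modification time of the given file.
  public static String fileDate(File file) {
    return toSolrDate(file.lastModified());
  }

  public static void main(String[] args) {
    // Uses the first argument as the path, or the current directory.
    File f = new File(args.length > 0 ? args[0] : ".");
    System.out.println("file_date=" + fileDate(f));
  }
}
```

One way to send the resulting string, assuming the documents go through the extracting request handler and the schema defines a file_date field, is as a literal.file_date request parameter.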

Other tips

You can set a default value in your schema. Something like this should work:

<field name="my_date" type="date" indexed="true" stored="true" multiValued="false" default="NOW" />

Field Type Definition:

<fieldType name="date"     class="solr.TrieDateField" sortMissingLast="true" omitNorms="true"/>

When creating a document, Solr takes all input as text and then validates it against the field's data type, so any value in a date format that Solr accepts will work. For the current time, use NOW as the default value.

regards

Rajat

Licensed under: CC-BY-SA with attribution