Question

Consider a Solr index with the following fields:

<fields>
    <field name="id" type="uuid" indexed="true" stored="true" default="0"/>
    <field name="user" stored="true" type="string" multiValued="false" indexed="true"/>
    <field name="text" stored="true" type="textmulti" multiValued="false" indexed="true"/>
    <field name="media" stored="true" type="string" multiValued="false" indexed="true"/>
</fields>

I would consider a newly indexed document to be a dupe (and therefore to be rejected) if an existing document has identical user and text fields, regardless of the contents of the id or media fields. A match on only user or only text is not enough for a document to be considered a dupe; both user and text must match.

I have read through Document Duplication Detection and XML Messages for Updating a Solr Index on the Solr wiki but I still do not see how to configure this. Any ideas? I am using the wonderful solr-php-client to connect to Solr via PHP.

Thanks.

Solution

You probably have some reason not to do so, but you could use the concatenation of user and text as the id. You would then not need Duplicate Detection at all, because Solr already keeps only one document per unique id, provided you leave its default overwrite behavior in place.
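
As a rough illustration of the idea, here is a minimal Python sketch of deriving the id from user and text on the client side before the document is sent to Solr. The function name `composite_id` and the namespace constant are hypothetical, not part of Solr or solr-php-client. The sketch assumes you want to keep the id field's uuid type, so it produces a deterministic version 5 UUID instead of storing the raw concatenation; the same approach should translate to PHP.

```python
import uuid

# Hypothetical fixed namespace for this index (an assumption, not a Solr setting).
# Any UUID works, as long as it never changes once documents are indexed.
DEDUPE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def composite_id(user: str, text: str) -> str:
    """Derive a deterministic id from user and text only; media is ignored."""
    # Length-prefix the user so that ("ab", "c") and ("a", "bc")
    # do not collapse into the same concatenated key.
    key = f"{len(user)}:{user}{text}"
    return str(uuid.uuid5(DEDUPE_NAMESPACE, key))


first = composite_id("alice", "hello world")
second = composite_id("alice", "hello world")
print(first == second)  # True: same user and text always give the same id
```

Two documents with the same user and text then share an id, so Solr will not keep both. Note that with Solr's default settings the later add replaces the earlier document instead of being rejected, so if the first copy must win you would still need to check for the id before adding.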

Licensed under: CC-BY-SA with attribution