Question

There is a table of n products, and a table of features of these products. Each product has many features. Given a Solr DataImportHandler configuration:

<document name="products">
  <entity name="item" query="select id, name from item">
    <field column="ID" name="id" />
    <field column="NAME" name="name" />
    <entity name="feature" 
      query="select feature_name, description from feature where item_id='${item.ID}'">
      <field name="feature_name" column="feature_name" />
      <field name="description" column="description" />
    </entity>
  </entity>
</document>

Solr will run n + 1 queries to fetch this data: one for the main query, plus one per item (n in total) to fetch its features. This is inefficient for large numbers of items. Is it possible to configure Solr so that it runs the two queries separately and joins the results in memory instead? All rows from both tables will be fetched.


Solution

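This can be done using CachedSqlEntityProcessor. It runs the feature query once, caches the rows keyed by cacheKey, and looks up the matching rows for each item through cacheLookup instead of issuing a new query per item: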

<document name="products">
  <entity name="item" query="select id, name from item">
    <field column="ID" name="id" />
    <field column="NAME" name="name" />
    <entity name="feature" 
        query="select item_id, feature_name, description from feature"
        cacheKey="item_id"
        cacheLookup="item.ID" 
        processor="CachedSqlEntityProcessor">
      <field name="feature_name" column="feature_name" />
      <field name="description" column="description" />
    </entity>
  </entity>
</document>

Since Solr's index is 'flat', feature_name and description are not connected in any way; each product will have multi-valued fields for each of these.
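
For illustration, the two fields could be declared as multi-valued in schema.xml along these lines. This is only a sketch: the field types (string, text_general) and the indexed/stored settings are assumptions, not something stated in the question.

```xml
<!-- Fragment of schema.xml; field types here are assumed, adjust to your schema -->
<field name="feature_name" type="string" indexed="true" stored="true" multiValued="true" />
<field name="description" type="text_general" indexed="true" stored="true" multiValued="true" />
```

Each feature row found for an item then adds one value to each of these fields on that item's document.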

OTHER TIPS

I am not sure if Solr can do this, but the database can. Assuming that you are using MySQL, use JOIN and GROUP_CONCAT to convert this into a single query. The query should look something like this:

SELECT item.id, item.name, GROUP_CONCAT(feature.description) AS `desc` FROM item INNER JOIN feature ON (feature.item_id = item.id) GROUP BY item.id, item.name

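Don't forget to use the RegexTransformer on desc (its splitBy attribute, for example) to separate out the multiple values. Note that desc is a reserved word in MySQL, so the alias has to be quoted with backticks, and that GROUP_CONCAT truncates its result at group_concat_max_len (1024 by default), so you may need to raise that limit for items with many features.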
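
As a rough sketch of how this might look in the DataImportHandler configuration, assuming MySQL, that column names come back in lowercase, and that a description never contains a comma (the default GROUP_CONCAT separator):

```xml
<document name="products">
  <!-- Single query: the join and aggregation happen in the database.
       `desc` is backtick-quoted because DESC is a reserved word in MySQL. -->
  <entity name="item" transformer="RegexTransformer"
      query="SELECT item.id, item.name, GROUP_CONCAT(feature.description) AS `desc`
             FROM item INNER JOIN feature ON (feature.item_id = item.id)
             GROUP BY item.id, item.name">
    <field column="id" name="id" />
    <field column="name" name="name" />
    <!-- splitBy breaks the comma-joined string into multiple values;
         the result is stored in the multi-valued "description" field -->
    <field column="description" sourceColName="desc" splitBy="," />
  </entity>
</document>
```

Because of the INNER JOIN, items without any features are skipped entirely; a LEFT JOIN would keep them. If descriptions can contain commas, GROUP_CONCAT's SEPARATOR clause can set a different delimiter, with splitBy changed to match (splitBy is a regular expression, so special characters need escaping).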

Licensed under: CC-BY-SA with attribution