Question

I have a database table containing about 30 GB of data, which I am indexing with DIH. Indexing takes only 1 hour 15 minutes, but searching is very slow: a query takes around 1 minute, which doesn't seem right. Please help if you have faced the same issue.

Here are the contents of my configuration files.

data-config.xml

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://Battrdbtest20/test_results"
              batchSize="-1"
              user="results"
              password="resultsloader"/>
  <document>
    <entity name="Syndrome"
            pk="test_file_result_id"
            query="SELECT * FROM Syndrome">
      <field column="test_file_result_id" name="test_file_result_id"/>
      <field column="syndrome" name="syndrome"/>
    </entity>
  </document>
</dataConfig>

schema.xml (I changed only the fields to suit my data):

 <fields>

     <field name="test_file_result_id" type="slong" indexed="true" stored="true" required="true" omitNorms="true" multiValued="false" />
     <field name="syndrome" type="string" indexed="true" stored="true" required="true" omitNorms="false" multiValued="false" />

 </fields>

 <uniqueKey>test_file_result_id</uniqueKey>

 <defaultSearchField>syndrome</defaultSearchField>

No changes were made to solrconfig.xml.

test_file_result_id is a 10-digit ID, and the syndrome field stores a BLOB containing a large amount of data (log-file-like content).

I should mention that when I search by test_file_result_id, results come back within a second, but a search on syndrome takes more than a minute.

Thanks in advance!!


Solution

I am assuming that the string type is defined as solr.StrField in your schema.xml.
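For reference, the example schema that ships with Solr declares this type roughly as follows; your own schema.xml may differ, so check it rather than relying on this sketch.

```xml
<!-- solr.StrField: the value is indexed as-is, with no tokenization -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
```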

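A StrField indexes the entire value as a single, untokenized term. Finding a word somewhere inside the log content therefore usually means a wildcard query such as *error*, which has to scan every indexed term and is slow on large values. Since you are storing a blob of text, you would be better served by a field type with a suitable tokenizer and filters.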

For example, StandardTokenizerFactory splits the text into individual words, so each word becomes its own term in the index and can be looked up directly.

An example of the fieldtype definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <!-- split the text into individual word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- drop common words listed in stopwords.txt -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- lowercase every token so matching ignores case -->
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

You could try something along these lines (you will need to reindex after changing the field type), and it should make a noticeable difference to the response time.
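As a minimal sketch, assuming the text_general type above has been added to schema.xml, the existing syndrome field could be switched over like this. Only the type changes; omitNorms is dropped from the field so that the type-level setting applies instead of being overridden.

```xml
<!-- sketch: syndrome now uses the tokenized text_general type instead of string -->
<field name="syndrome" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
```

Because defaultSearchField is still syndrome, a plain query for a single word (for example q=timeout, a hypothetical term) should then match documents whose log text contains that word, without a wildcard.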

Licensed under: CC-BY-SA with attribution