I'm using Solr's DataImportHandler to index data from a database. However, the database table schema uses CHAR fields, which have a fixed width and are therefore padded with trailing spaces.
I'm trying to remove these trailing spaces (trim them) using solr.TrimFilterFactory.
In my Solr schema.xml I'm using the following field type to index the data:
<fieldType name="string" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.TrimFilterFactory" updateOffsets="true" />
  </analyzer>
</fieldType>
So now I'm adding a document like:
<add>
  <doc>
    <field name="test">Test </field>
  </doc>
</add>
I expect the trailing spaces in the test field to be removed, but when I query for test:Test* I get:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="test">Test </str>
    </doc>
  </result>
</response>
As you can see, the trailing spaces are not removed. I must be doing something wrong, or I have misunderstood the concept of filters. My expectation was that the query would return:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="test">Test</str>
    </doc>
  </result>
</response>
So my question is: how can I make sure that all trailing spaces are removed when these documents are indexed?
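For context, analysis filters such as solr.TrimFilterFactory operate on the indexed tokens, while the stored value returned in query results is the original input. One possible direction, then, is to trim the value before it reaches Solr, for example in the SQL query of the DataImportHandler entity. The following is only a sketch under assumptions: the entity name `item`, the table name `my_table`, and the columns `id` and `test` are hypothetical, and `RTRIM` is assumed to be available in the database in use.

```xml
<!-- data-config.xml fragment (sketch): entity, table, and column names are hypothetical -->
<document>
  <entity name="item"
          query="SELECT id, RTRIM(test) AS test FROM my_table">
    <!-- RTRIM strips the CHAR padding in the database, so Solr receives "Test" rather than "Test " -->
    <field column="id" name="id" />
    <field column="test" name="test" />
  </entity>
</document>
```

With this approach the stored value is already trimmed when the document is added, so it does not depend on the analyzer chain of the field type.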