ElasticSearch not returning any results for common query strings (works with less common strings)

StackOverflow https://stackoverflow.com/questions/13704939

  •  04-12-2021

Question

I am doing some testing with ElasticSearch, and I am finding that it does not return results for extremely common terms. I assume this may be because it's timing out, running out of memory, or something related, but I'm confused as to why I'm not getting any sort of error feedback.

This is the code snippet:

// client & index ----------------------------------------------
$eC = new Elastica_Client();
$eI = $eC->getIndex('test_index');


// query string ---------------------------------------
$eQqs = new Elastica_Query_QueryString();
$eQqs->setDefaultOperator('AND');
$eQqs->setQuery('the'); ### <--- example of a common keyword,
// --- note that if I were to use something less common like "zoo"
// that it would return an expected result set


// search object --------------------------------------
$eQ = new Elastica_Query();
$eQ->setQuery( $eQqs );
$eQ->setFrom(1); // zero-based offset: 1 skips the first hit, use 0 to start at the top
$eQ->setLimit(3);


// get result set -------------------------------------
$eRS = $eI->search( $eQ );


// output results ----------------------------------------
echo "total time: " . $eRS->getTotalTime() . "\n";
echo "total results: " . $eRS->getTotalHits() . "\n\n";

foreach( $eRS->getResults() as $result ) {
    print_r( $result->getData() );
}

As mentioned in the comment, if I search for a less common string, then it works fine, and I'll get something like this:

total time: 292
total results: 21

Array
(
    [id] => 1234
    [name] => A day at the Zoo
...

However, if I search for something very common, like "the", $eRS->getResults() returns nothing, and I get this instead:

total time: 2
total results: 0

Note that I have confirmed there are numerous instances of "the" in the index. What is going on here? Am I doing this incorrectly? If not, how can I get it to report a meaningful error instead of this unexpected empty result set?

Solution

Common words like "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with" are so-called stop words. They occur so frequently that they are usually considered to add no value to full-text search, so by default they are not indexed and are ignored at search time. A query that consists only of a stop word, such as "the", is therefore reduced to nothing by the analyzer and matches no documents. That is a valid empty result rather than an error, which is why you see no error feedback. You can change the list of stop words, or disable them completely, by using a non-default analyzer for your index.
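
As a rough illustration of the non-default analyzer approach, the sketch below recreates the index with a standard analyzer whose stop word list is empty. It assumes the same legacy Elastica client used in the question, that `Elastica_Index::create()` accepts a settings array and a recreate flag in your version, and that your Elasticsearch version understands the `_none_` stop word value. None of these details come from the original answer, so check them against your installed versions before relying on this.

```php
<?php
// Sketch only: assumes the legacy Elastica classes from the question are
// already loaded, and that 'test_index' may safely be dropped and rebuilt.
$eC = new Elastica_Client();
$eI = $eC->getIndex('test_index');

// Index settings: replace the index's default analyzer with a standard
// analyzer that has no stop words, so "the", "and", etc. are indexed.
$settings = array(
    'analysis' => array(
        'analyzer' => array(
            'default' => array(
                'type'      => 'standard',
                'stopwords' => '_none_', // assumption: supported by your Elasticsearch version
            ),
        ),
    ),
);

// Passing true as the second argument deletes the existing index and
// creates it again with the new settings. This removes all indexed data.
$eI->create($settings, true);
```

Analyzer settings only affect documents indexed after the change, so all documents have to be reindexed once the index has been recreated. Only then will a search for "the" return hits.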

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow