Question

I am building a search engine for a list of articles I have. Several people advised me to use Elasticsearch for full-text search, so I wrote the following code. It works, but I have a few issues.

1) If the same article is added twice (that is, indexdoc is run twice for the same article), Elasticsearch accepts it and stores the article twice. Is there a way to have a "unique key" in the search index?

2) How can I change the scoring / ranking function? I want to give more importance to the title.

3) Is this the correct way to do it anyway?

4) How do I show related results when the query contains a spelling mistake?

from elasticsearch import Elasticsearch
from crsq.models import ArticleInfo

es = Elasticsearch()

def indexdoc(articledict):

        doc = {
                'text': articledict['articlecontent'],
                'title' : articledict['articletitle'],
                'url': articledict['url']
        }

        res = es.index(index="article-index", doc_type='article', body=doc)


def searchdoc(keywordstr):
        res = es.search(index="article-index", body={"query": {"query_string": {"query": keywordstr}}})
        print("Got %d Hits:" % res['hits']['total'])
        for hit in res['hits']['hits']:
            print("%(url)s: %(text)s" % hit["_source"])

def indexurl(url):

        # .values() returns a queryset of dicts, so pass the first match to indexdoc
        articledicts = ArticleInfo.objects.filter(url=url).values()
        if len(articledicts):
                indexdoc(articledicts[0])
        return

Solution

1) You have to specify an id for your document by passing the id parameter when you index it. Indexing again with the same id overwrites the existing document instead of adding a duplicate.

res = es.index(index="article-index", doc_type='article', body=doc, id="some_unique_id")
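
As a minimal sketch of one way to choose that id, the snippet below derives it from the article URL, assuming each article has exactly one URL. The helper name article_id is hypothetical and not part of the original code; any stable unique value (for example a database primary key) would work equally well.

```python
import hashlib

def article_id(url):
    """Return a stable id for an article, derived from its URL.

    Assumption: the URL uniquely identifies the article, so indexing
    the same URL twice yields the same id and overwrites the document.
    """
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

# Possible use inside indexdoc from the question (not executed here):
# es.index(index="article-index", doc_type='article', body=doc,
#          id=article_id(doc['url']))
```

Hashing keeps the id short and free of characters such as slashes that are awkward in a document path.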

2) There is more than one way to do this. For example, you can boost the title field by changing your query slightly:

{"query": {"query_string": {"query": keywordstr, "fields" : ["text", "title^2"]}}}

With this change, a match in title counts twice as much as a match in text.
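
The boosted query could be wrapped in a small helper so the weight is easy to tune. This is only a sketch: the function name boosted_query and the title_boost parameter are assumptions for illustration, while the field names come from the question.

```python
def boosted_query(keywordstr, title_boost=2):
    """Build a query_string search body that weights title over text.

    title_boost is a hypothetical tuning knob; "title^2" means a match
    in title contributes twice the score of a match in text.
    """
    return {
        "query": {
            "query_string": {
                "query": keywordstr,
                "fields": ["text", "title^%d" % title_boost],
            }
        }
    }

# Possible use inside searchdoc from the question (not executed here):
# res = es.search(index="article-index", body=boosted_query(keywordstr))
```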

3) As a proof of concept, it is not bad.

4) This is a big topic. I think you should start with the Elasticsearch documentation on suggesters.
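
As a starting point, here is a rough sketch of a term suggester request and of how its response might be turned into a corrected query. The helper names and the suggestion name "spelling" are made up for illustration, and the response layout assumed here (a list of entries, each with "text" and "options") should be checked against the documentation for your Elasticsearch version.

```python
def suggest_body(keywordstr, field="title"):
    """Build a search body asking the term suggester for corrections.

    "spelling" is an arbitrary name chosen for this suggestion.
    """
    return {
        "suggest": {
            "spelling": {"text": keywordstr, "term": {"field": field}}
        }
    }

def corrected_query(response):
    """Rebuild the query using the first suggested option for each token.

    Tokens without any suggestion are kept unchanged.
    """
    words = []
    for entry in response["suggest"]["spelling"]:
        options = entry["options"]
        words.append(options[0]["text"] if options else entry["text"])
    return " ".join(words)

# Possible use (not executed here):
# res = es.search(index="article-index", body=suggest_body(keywordstr))
# print("Did you mean: %s" % corrected_query(res))
```

If you only need the search itself to tolerate typos, rather than showing a "did you mean" hint, a match query with the fuzziness option is another thing to look at.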

Licensed under: CC-BY-SA with attribution