Question

I am using PySolr to run my searches. I want to index an RSS feed and was wondering whether this is possible with PySolr and, if so, how to do it.

I have found instructions on how to do this in Solr at http://wiki.apache.org/solr/DataImportHandler#HttpDataSource_Example but can't find anything on how to do the equivalent in PySolr.

Thanks

Solution

You probably don't need to do the equivalent in PySolr. If you already have Solr indexing the feed, as per the example, then you just use PySolr to query that index. Something like:

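from pysolr import Solr

# Point PySolr at the core that the DataImportHandler populates ('rss' here).
solr = Solr('http://localhost:8983/solr/rss/')
response = solr.search('some query string')
print(response.hits)  # total number of matching documents
for result in response.docs:
    do_stuff_with(result)  # placeholder for your own handling of each document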

If you really want to do it from the Python side, then you'll need to fetch and parse the RSS there (using other libraries, e.g. Universal Feed Parser). PySolr only wraps the interaction with Solr; it doesn't “do” data sources.
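A rough sketch of that Python-side route, in case it helps. To stay dependency-free it parses the feed with the standard library's xml.etree rather than Universal Feed Parser, and it only handles plain RSS 2.0 items. The helper names (rss_to_docs, index_feed) and the field names (id, title, link, description) are made up for illustration; the fields would have to match whatever your Solr schema actually defines.

```python
import xml.etree.ElementTree as ET


def rss_to_docs(rss_xml):
    """Turn RSS 2.0 XML (str or bytes) into a list of dicts for Solr.

    Field names are assumptions; adjust them to your schema.
    """
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        # Use the guid as the unique key, falling back to the link.
        guid = item.findtext("guid", default="").strip()
        docs.append({
            "id": guid or link,
            "title": item.findtext("title", default="").strip(),
            "link": link,
            "description": item.findtext("description", default="").strip(),
        })
    return docs


def index_feed(feed_url, solr_url):
    """Fetch a feed, convert its items, and send them to Solr via PySolr."""
    # Imported here so rss_to_docs can be used without PySolr installed.
    from urllib.request import urlopen
    import pysolr

    with urlopen(feed_url) as resp:
        docs = rss_to_docs(resp.read())
    solr = pysolr.Solr(solr_url)
    solr.add(docs)
    solr.commit()  # make the new documents searchable
    return len(docs)
```

You would then call something like index_feed('http://example.com/feed.xml', 'http://localhost:8983/solr/rss/'), where both URLs are placeholders for your own feed and core.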

You may want to check out Haystack, which uses PySolr (and can use other engines) and neatly abstracts the job of creating search index entries and shipping them off to Solr for indexing.

Licensed under: CC-BY-SA with attribution