Question

I'm using Haystack with Whoosh as the backend for a Django app.

Is there any way to view the contents of the indexes generated by Whoosh in an easy-to-read format? I'd like to see what data was indexed, and how, so I can better understand how it works.

Solution

You can do this pretty easily from Python's interactive console:

>>> from whoosh.index import open_dir
>>> ix = open_dir('whoosh_index')
>>> ix.schema
<<< <Schema: ['author', 'author_exact', 'content', 'django_ct', 'django_id', 'id', 'lexer', 'lexer_exact', 'published', 'published_exact']>
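The schema repr only lists field names. To see how each field is configured, a small helper along these lines can report the field type and its stored/indexed flags. This is a sketch: `describe_schema` is a made-up name, and it assumes each Whoosh field object exposes `stored` and `indexed` attributes (it falls back to "?" when one is missing).

```python
def describe_schema(schema):
    """Return one summary line per field in a Whoosh schema.

    Only relies on schema.items() yielding (name, field) pairs.
    """
    lines = []
    for name, field in sorted(schema.items(), key=lambda pair: pair[0]):
        lines.append("%s: %s stored=%s indexed=%s" % (
            name,
            type(field).__name__,
            # Assumed attributes; "?" is shown if a field lacks them.
            getattr(field, "stored", "?"),
            getattr(field, "indexed", "?"),
        ))
    return lines
```

From the console, print("\n".join(describe_schema(ix.schema))) would show the summary, and ix.doc_count() reports how many documents the index holds.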

You can run search queries directly against the index. To get every document, search with an Every query and pass limit=None, since search() returns only the top 10 hits by default:

>>> from whoosh.query import Every
>>> results = ix.searcher().search(Every('content'), limit=None)

If you want to print it all out for viewing, a short Python script will do:

for result in results:
    print("Rank: %s Id: %s Author: %s" % (result.rank, result['id'], result['author']))
    print("Content:")
    print(result['content'])
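The loop above hardcodes the field names from this particular schema. A more generic variant might print every stored field of each hit instead. The following is a sketch under a few assumptions: `format_hit` and `dump_index` are hypothetical names, the index directory is passed in by the caller, and only fields marked as stored in the schema will appear (indexed-only fields have no retrievable value).

```python
def format_hit(rank, fields):
    """Format one document: its rank, then every stored field as 'name: value'."""
    lines = ["Rank: %s" % rank]
    for name in sorted(fields):
        lines.append("  %s: %s" % (name, fields[name]))
    return "\n".join(lines)


def dump_index(index_dir):
    """Print every document in the Whoosh index at index_dir."""
    # Imported here so format_hit stays usable without Whoosh installed.
    from whoosh.index import open_dir
    from whoosh.query import Every

    ix = open_dir(index_dir)
    # The searcher holds open files; the with-block closes it afterwards.
    with ix.searcher() as searcher:
        # limit=None lifts the default cap of 10 results.
        for hit in searcher.search(Every(), limit=None):
            # hit.fields() returns the stored fields as a dict.
            print(format_hit(hit.rank, hit.fields()))
```

Calling dump_index('whoosh_index') would then print the whole index, one block per document.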

You could also return the documents directly from Whoosh in a Django view, perhaps using Django's template system for nicer formatting. Refer to the Whoosh documentation for more info: http://packages.python.org/Whoosh/index.html.
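A Django view along those lines might look roughly like this. It is only a sketch: the view name `index_contents`, the helper `build_context`, the template name "index_contents.html", and the cap of 50 documents are all assumptions made for illustration, not part of Haystack or Whoosh.

```python
def build_context(documents, limit=50):
    """Turn an iterable of stored-field dicts into a template context."""
    rows = []
    for fields in documents:
        if len(rows) >= limit:
            break
        # Sorted (name, value) pairs are easy to loop over in a template.
        rows.append(sorted(fields.items()))
    return {"documents": rows}


def index_contents(request):
    """Hypothetical Django view listing the first documents in the index."""
    # Imported lazily so build_context can be used and tested on its own.
    from django.shortcuts import render
    from whoosh.index import open_dir

    ix = open_dir("whoosh_index")
    with ix.searcher() as searcher:
        # Build the context while the searcher is still open.
        context = build_context(searcher.documents())
    return render(request, "index_contents.html", context)
```

The template would loop over documents and, within each one, over its (name, value) pairs. A view like this exposes raw index data, so it is probably best kept behind a staff-only check.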

OTHER TIPS

from whoosh.index import open_dir
ix = open_dir('whoosh_index')
# documents() yields the stored fields of every document as a dict;
# list() collects them so the console displays them all.
list(ix.searcher().documents())
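documents() shows the stored values, not the terms Whoosh actually indexed. To see how a text field was tokenized, the index reader's lexicon can be sampled. This is a sketch: `decode_terms` and `sample_terms` are hypothetical names, and it assumes a Whoosh version whose reader provides lexicon(fieldname) and yields terms as UTF-8 byte strings (older releases may yield text directly, which the helper also accepts). Terms from numeric or date fields are stored in an encoded form and will not read as plain text.

```python
def decode_terms(terms, limit=20):
    """Return up to `limit` terms as text, decoding byte strings as UTF-8."""
    decoded = []
    for term in terms:
        if len(decoded) >= limit:
            break
        if isinstance(term, bytes):
            term = term.decode("utf-8", "replace")
        decoded.append(term)
    return decoded


def sample_terms(index_dir, fieldname="content", limit=20):
    """Return the first terms Whoosh indexed for one field."""
    # Imported lazily so decode_terms works without Whoosh installed.
    from whoosh.index import open_dir

    ix = open_dir(index_dir)
    with ix.searcher() as searcher:
        # lexicon(fieldname) iterates over the field's indexed terms.
        return decode_terms(searcher.reader().lexicon(fieldname), limit)
```

For example, sample_terms('whoosh_index', 'content') would return the first 20 terms of the content field after analysis (lowercasing, stop-word removal, and so on, depending on the field's analyzer).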
Licensed under: CC-BY-SA with attribution