Question

I don't think the original question was clear enough, so here is an updated, straight-to-the-point version:

What are the common architectures used in building a meta-search engine, and are there any libraries available for building that type of search engine?

I'm looking at building an "enterprise" type of search engine where the indexed data could come from proprietary search engines (like Autonomy or a Google Box) or public ones (like Google Web or Yahoo Web).

Solution

If you look at Garlic (pdf), you'll notice that its architecture is generic enough and can be adapted to a meta-search engine.

UPDATE:

The rough architectural sketch is something like this:

   +---------------------------+
   |                           |
   |    Meta-Search Engine     |         +---------------+
   |                           |         |               |
   |   +-------------------+   |---------| Configuration |
   |   | Query Processor   |   |         |               |
   |   |                   |   |         +---------------+
   |   +-------------------+   |
   +-------------+-------------+
                 |
      +----------+---------------+
   +--+----------+-------------+ |
   |             |             | |
   |     +-------+-------+     | |
   |     |    Wrapper    |     | |
   |     |               |     | |
   |     +-------+-------+     | |
   |             |             | |
   |             |             | |
   |     +-------+--------+    | |
   |     |                |    | |
   |     | Search Engine  |    | |
   |     |                |    +-+
   |     +----------------+    |
   +---------------------------+

The parts depicted are:

  • Meta-Search Engine - the engine, orchestrates the whole thing.
  • Query Processor - part of the engine, resolves capabilities, sends requests to the specific search engines (through the wrappers) and aggregates their results.
  • Wrapper - bridges the meta-search engine API to specific search engines. Each wrapper works with one specific search engine: it exposes the external search engine's capabilities to the meta-search engine, and accepts and responds to search requests.
  • Search engine - external search engines to query, they're exposed to the meta-search engine through the wrappers.
  • Configuration - data that configures the meta-search engine, e.g., which wrappers to use, where to find more wrappers, etc. Can also configure the wrappers.
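
To make the parts above a bit more concrete, here is a minimal Java sketch of how the Query Processor and the wrappers could fit together. None of these names come from Garlic or any real library: Hit, Wrapper, InMemoryWrapper and QueryProcessor are hypothetical, capabilities are modelled as plain strings, and the wrapper searches an in-memory list rather than a real engine. Treat it as one possible shape, not the definitive design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A hit already normalized by a wrapper (hypothetical type).
final class Hit {
    final String source;
    final String title;

    Hit(String source, String title) {
        this.source = source;
        this.title = title;
    }
}

// Wrapper: bridges the meta-search engine API to one specific search engine.
interface Wrapper {
    String name();
    Set<String> capabilities();   // what the engine supports, e.g. "keyword"
    List<Hit> search(String query);
}

// Stand-in wrapper (assumption): searches an in-memory list, not a real engine.
final class InMemoryWrapper implements Wrapper {
    private final String name;
    private final Set<String> capabilities;
    private final List<String> documents;

    InMemoryWrapper(String name, Set<String> capabilities, List<String> documents) {
        this.name = name;
        this.capabilities = capabilities;
        this.documents = documents;
    }

    @Override public String name() { return name; }
    @Override public Set<String> capabilities() { return capabilities; }

    @Override public List<Hit> search(String query) {
        List<Hit> hits = new ArrayList<>();
        for (String doc : documents) {
            // Case-insensitive substring match stands in for a real query.
            if (doc.toLowerCase().contains(query.toLowerCase())) {
                hits.add(new Hit(name, doc));
            }
        }
        return hits;
    }
}

// Query Processor: resolves capabilities, sends requests, aggregates results.
final class QueryProcessor {
    private final List<Wrapper> wrappers; // in a real system, chosen by Configuration

    QueryProcessor(List<Wrapper> wrappers) {
        this.wrappers = wrappers;
    }

    List<Hit> search(String query, String requiredCapability) {
        List<Hit> all = new ArrayList<>();
        for (Wrapper w : wrappers) {
            // Only ask engines whose wrapper advertises the needed capability.
            if (w.capabilities().contains(requiredCapability)) {
                all.addAll(w.search(query));
            }
        }
        return all;
    }
}

final class MetaSearchDemo {
    public static void main(String[] args) {
        Wrapper intranet = new InMemoryWrapper("intranet",
                new HashSet<>(Arrays.asList("keyword", "phrase")),
                Arrays.asList("Meta search design", "Holiday schedule"));
        Wrapper web = new InMemoryWrapper("web",
                new HashSet<>(Arrays.asList("keyword")),
                Arrays.asList("Meta search engines explained"));
        QueryProcessor processor = new QueryProcessor(Arrays.asList(intranet, web));
        for (Hit h : processor.search("meta search", "keyword")) {
            System.out.println(h.source + ": " + h.title);
        }
    }
}
```

In a real engine the wrappers would probably be queried in parallel with timeouts, and the list of wrappers would be loaded from the configuration instead of being hard-coded.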

OTHER TIPS

Have a look at Lucene.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

It's not exactly what you are looking for, but I'd still suggest checking out Compass; it might give you some ideas. Hibernate Search may also be worth a look.

Update: To clarify, Compass is not an ORM (and neither is Hibernate Search); it's a search-oriented API. Because it tries to abstract the underlying search engine (Lucene), I was suggesting you have a look at some of the structures it uses: Analyzers, Analyzer Filters, Query Parser, etc.

Building on top of Lucene, Compass simplifies common usage patterns of Lucene such as google-style search (...)

See also:

This page seems to list a few:

http://java-source.net/open-source/search-engines

I'd imagine the APIs will all be similar in that they take a query string and some options, and return a collection of results. However, the exact types of the options and results are likely to be different, so I'd have thought that you'd need some sort of Adapter approach (for example) to unify access to the different backends.
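
A rough Java sketch of that Adapter idea follows. Everything here is made up for illustration: ApplianceClient and WebApiClient are hypothetical stand-ins for two backends with incompatible signatures and result types, and SearchBackend / SearchResult are the assumed common interface and result type. No real vendor API is being reproduced.

```java
import java.util.ArrayList;
import java.util.List;

// Common result type the meta-search engine works with (hypothetical).
final class SearchResult {
    final String title;
    final String url;

    SearchResult(String title, String url) {
        this.title = title;
        this.url = url;
    }
}

// Common interface every backend gets adapted to (hypothetical).
interface SearchBackend {
    List<SearchResult> search(String query, int maxResults);
}

// Hypothetical appliance client: returns rows of {title, url}.
final class ApplianceClient {
    String[][] query(String q, int limit) {
        String[][] rows = new String[limit][];
        for (int i = 0; i < limit; i++) {
            rows[i] = new String[] {
                "appliance hit " + i + " for " + q,
                "http://intranet.example/" + i
            };
        }
        return rows;
    }
}

// Hypothetical web API client: its own result type and no limit parameter.
final class WebApiClient {
    static final class Doc {
        final String headline;
        final String link;

        Doc(String headline, String link) {
            this.headline = headline;
            this.link = link;
        }
    }

    List<Doc> find(String terms) {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            docs.add(new Doc("web hit " + i + " for " + terms, "http://web.example/" + i));
        }
        return docs;
    }
}

// Adapter: converts the appliance's row format into SearchResult objects.
final class ApplianceAdapter implements SearchBackend {
    private final ApplianceClient client;

    ApplianceAdapter(ApplianceClient client) {
        this.client = client;
    }

    @Override public List<SearchResult> search(String query, int maxResults) {
        List<SearchResult> out = new ArrayList<>();
        for (String[] row : client.query(query, maxResults)) {
            out.add(new SearchResult(row[0], row[1]));
        }
        return out;
    }
}

// Adapter: converts Doc objects and enforces the limit the client lacks.
final class WebApiAdapter implements SearchBackend {
    private final WebApiClient client;

    WebApiAdapter(WebApiClient client) {
        this.client = client;
    }

    @Override public List<SearchResult> search(String query, int maxResults) {
        List<SearchResult> out = new ArrayList<>();
        for (WebApiClient.Doc d : client.find(query)) {
            if (out.size() == maxResults) {
                break;
            }
            out.add(new SearchResult(d.headline, d.link));
        }
        return out;
    }
}
```

The calling code only ever sees SearchBackend, so adding another backend should just mean writing one more adapter class.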

If you can read Objective-C and want to see a working example of something like a "meta-search engine", you might want to take a look at the source code for Google's Vermilion framework. It is the engine that backs the very popular Google Quick Search Box utility for OS X (which in turn is a lot like QuickSilver).

The framework provides the capability to add plugin backends for the search process and deals with merge sorting the results from a number of sources etc. I would imagine the design for a federated search engine of any sort would follow a similar design.
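
As an illustration of the merging step (this is not Vermilion's code, which is Objective-C and not reproduced here), a k-way merge over per-source result lists could look roughly like the Java sketch below. ScoredHit and ResultMerger are hypothetical names, and the sketch assumes each source returns its hits already sorted by descending score and that scores are comparable across sources, which in practice usually needs some normalization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// A hit with a relevance score (hypothetical type).
final class ScoredHit {
    final String source;
    final String title;
    final double score;

    ScoredHit(String source, String title, double score) {
        this.source = source;
        this.title = title;
        this.score = score;
    }
}

final class ResultMerger {
    // Assumption: every input list is already sorted by descending score.
    static List<ScoredHit> merge(final List<List<ScoredHit>> sources, int limit) {
        // Heap entries are {sourceIndex, positionInSource}; the entry that
        // points at the highest-scoring hit comes out first.
        Comparator<int[]> byScoreDesc = new Comparator<int[]>() {
            @Override public int compare(int[] a, int[] b) {
                double scoreA = sources.get(a[0]).get(a[1]).score;
                double scoreB = sources.get(b[0]).get(b[1]).score;
                return Double.compare(scoreB, scoreA);
            }
        };
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>(byScoreDesc);
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }

        List<ScoredHit> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < limit) {
            int[] top = heap.poll();
            List<ScoredHit> source = sources.get(top[0]);
            merged.add(source.get(top[1]));
            // Advance within the same source, if it has more hits.
            if (top[1] + 1 < source.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}

final class MergeDemo {
    public static void main(String[] args) {
        List<ScoredHit> a = Arrays.asList(
                new ScoredHit("a", "A1", 0.9), new ScoredHit("a", "A2", 0.4));
        List<ScoredHit> b = Arrays.asList(
                new ScoredHit("b", "B1", 0.7), new ScoredHit("b", "B2", 0.5));
        for (ScoredHit h : ResultMerger.merge(Arrays.asList(a, b), 3)) {
            System.out.println(h.source + " " + h.title + " " + h.score);
        }
    }
}
```

Because only the head of each source sits in the heap at any time, the merge can stop as soon as the requested number of results has been collected.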

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow