Question

Does anyone know whether ArangoDB supports faceted search and how performance compares to other products that support it well (e.g., Solr, MarkLogic) or those that don't (e.g., Mongo)?

After searching the site, reading the docs, and searching the Google group, I don't see it discussed anywhere.

Thanks

Solution

ArangoDB has a query language (AQL) that supports group-by style queries, which lets you implement a faceted search. To make sure we have the same understanding of faceted search, let me explain what I think is meant by it. Suppose you have a list of products. Each product has some attributes (e.g. name, model) and some categories (e.g. manufacturer). You can then search for a name, or for a name containing a word. The result lists all matching products plus an indication of how many products fall into each category. Is that what you meant?

For example, assume you have documents with three attributes (name, attribute1, attribute2) and two categories (category1, category2):

> for (i = 0; i < 10000; i++) db.products.save({category1: i % 5, category2: i % 7, attribute1: i % 13, attribute2: i % 17, name: "Lore Ipsum " + i, productId: i})

A typical document then looks like this:

> db.products.any()
{
  "_id" : "products/8788564659",
  "_rev" : "8788564659",
  "_key" : "8788564659",
  "productId" : 9291,
  "category1" : 1,
  "category2" : 2,
  "attribute1" : 9,
  "attribute2" : 9,
  "name" : "Lore Ipsum 9291"
}

If you want to search for all documents that have attribute1 between 2 and 3 (inclusive), you could use

> db._query("FOR p IN products FILTER p.attribute1 >= 2 && p.attribute1 <= 3 SORT p.name LIMIT 3 RETURN p").toArray();
[
  {
    "_id" : "products/7159077555",
    "_rev" : "7159077555",
    "_key" : "7159077555",
    "productId" : 1003,
    "category1" : 3,
    "category2" : 2,
    "attribute1" : 2,
    "attribute2" : 0,
    "name" : "Lore Ipsum 1003"
  },
  {
    "_id" : "products/7159274163",
    "_rev" : "7159274163",
    "_key" : "7159274163",
    "productId" : 1004,
    "category1" : 4,
    "category2" : 3,
    "attribute1" : 3,
    "attribute2" : 1,
    "name" : "Lore Ipsum 1004"
  },
  {
    "_id" : "products/7161633459",
    "_rev" : "7161633459",
    "_key" : "7161633459",
    "productId" : 1016,
    "category1" : 1,
    "category2" : 1,
    "attribute1" : 2,
    "attribute2" : 13,
    "name" : "Lore Ipsum 1016"
  }
]

or, if you are only interested in the product identifiers:

> db._query("FOR p IN products FILTER p.attribute1 >= 2 && p.attribute1 <= 3 SORT p.name LIMIT 3 RETURN p.productId").toArray();
[
  1003,
  1004,
  1016
]

Now, to get the facets, say for category1:

>  db._query("LET l = (FOR p IN products FILTER p.attribute1 >= 2 && p.attribute1 <= 3 SORT p.name RETURN p) return [ slice(l,@skip,@count), (FOR p in l collect c1 = p.category1 INTO g return { category1: c1, count: length(g[*].p)}) ]", { skip: 0, count: 3 }).toArray()
[
  [
    [
      {
        "_id" : "products/7159077555",
        "_rev" : "7159077555",
        "_key" : "7159077555",
        "productId" : 1003,
        "category1" : 3,
        "category2" : 2,
        "attribute1" : 2,
        "attribute2" : 0,
        "name" : "Lore Ipsum 1003"
      },
      {
        "_id" : "products/7159274163",
        "_rev" : "7159274163",
        "_key" : "7159274163",
        "productId" : 1004,
        "category1" : 4,
        "category2" : 3,
        "attribute1" : 3,
        "attribute2" : 1,
        "name" : "Lore Ipsum 1004"
      },
      {
        "_id" : "products/7161633459",
        "_rev" : "7161633459",
        "_key" : "7161633459",
        "productId" : 1016,
        "category1" : 1,
        "category2" : 1,
        "attribute1" : 2,
        "attribute2" : 13,
        "name" : "Lore Ipsum 1016"
      }
    ],
    [
      {
        "category1" : 0,
        "count" : 307
      },
      {
        "category1" : 1,
        "count" : 308
      },
      {
        "category1" : 2,
        "count" : 308
      },
      {
        "category1" : 3,
        "count" : 308
      },
      {
        "category1" : 4,
        "count" : 308
      }
    ]
  ]
]
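To make clear what the query above computes, here is a minimal in-memory sketch in plain JavaScript that needs no database. It rebuilds the same synthetic data, applies the same filter, and returns one page of hits plus the counts per category1 value. The function name facetedSearch is hypothetical and not part of ArangoDB, and plain JavaScript string comparison is assumed to order these names the same way the AQL SORT does, which holds for this data but may not hold for arbitrary strings.

```javascript
// Rebuild the same synthetic data as the products collection above.
const products = [];
for (let i = 0; i < 10000; i++) {
  products.push({
    category1: i % 5, category2: i % 7,
    attribute1: i % 13, attribute2: i % 17,
    name: "Lore Ipsum " + i, productId: i
  });
}

// Hypothetical helper (not an ArangoDB API): returns [page of hits, facet counts],
// mirroring the [ slice(l, skip, count), (collect ... ) ] shape of the AQL query.
function facetedSearch(docs, predicate, facetField, skip, count) {
  // Corresponds to: FOR p IN products FILTER ... SORT p.name RETURN p
  const hits = docs
    .filter(predicate)
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));

  // Corresponds to: COLLECT c = p.<facetField> INTO g ... length(g[*].p)
  const counts = new Map();
  for (const doc of hits) {
    const key = doc[facetField];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const facets = [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([value, n]) => ({ [facetField]: value, count: n }));

  return [hits.slice(skip, skip + count), facets];
}

const [page, facets] = facetedSearch(
  products,
  p => p.attribute1 >= 2 && p.attribute1 <= 3,
  "category1", 0, 3
);
console.log(page.map(p => p.productId), facets);
```

The facet counts are computed over all hits, not just the returned page, which is why the filtered list is built once and then used for both the slice and the grouping.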

To drill down into category1 == 1 and get the facets for category2:

>  db._query("LET l = (FOR p IN products FILTER p.attribute1 >= 2 && p.attribute1 <= 3 && p.category1 == 1 SORT p.name RETURN p) return [ slice(l,@skip,@count), (FOR p in l collect c2 = p.category2 INTO g return { category2: c2, count: length(g[*].p)}) ]", { skip: 0, count: 3 }).toArray()
[
  [
    [
      {
        "_id" : "products/7161633459",
        "_rev" : "7161633459",
        "_key" : "7161633459",
        "productId" : 1016,
        "category1" : 1,
        "category2" : 1,
        "attribute1" : 2,
        "attribute2" : 13,
        "name" : "Lore Ipsum 1016"
      },
      {
        "_id" : "products/7169497779",
        "_rev" : "7169497779",
        "_key" : "7169497779",
        "productId" : 1056,
        "category1" : 1,
        "category2" : 6,
        "attribute1" : 3,
        "attribute2" : 2,
        "name" : "Lore Ipsum 1056"
      },
      {
        "_id" : "products/6982720179",
        "_rev" : "6982720179",
        "_key" : "6982720179",
        "productId" : 106,
        "category1" : 1,
        "category2" : 1,
        "attribute1" : 2,
        "attribute2" : 4,
        "name" : "Lore Ipsum 106"
      }
    ],
    [
      {
        "category2" : 0,
        "count" : 44
      },
      {
        "category2" : 1,
        "count" : 44
      },
      {
        "category2" : 2,
        "count" : 44
      },
      {
        "category2" : 3,
        "count" : 44
      },
      {
        "category2" : 4,
        "count" : 44
      },
      {
        "category2" : 5,
        "count" : 44
      },
      {
        "category2" : 6,
        "count" : 44
      }
    ]
  ]
]

To make that query string more user friendly, it would be necessary to write some small helper functions in JavaScript. I think the support group https://groups.google.com/forum/#!forum/arangodb would be the right place to discuss your requirements.
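As a rough idea of what such a helper could look like, here is a minimal sketch that assembles the query string and bind parameters used in the examples above. The names checkName and buildFacetQuery and the shape of the ranges and equals arguments are assumptions made for illustration, not an existing ArangoDB API. Values go through bind parameters, while attribute and collection names are spliced into the text only after a simple identifier check.

```javascript
// Only plain identifiers may be spliced into the query text.
function checkName(name) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error("invalid identifier: " + name);
  }
  return name;
}

// Hypothetical helper (not part of ArangoDB).
// ranges: { attribute1: [2, 3] } becomes p.attribute1 >= @v0 && p.attribute1 <= @v1
// equals: { category1: 1 }       becomes p.category1 == @v2
function buildFacetQuery(collection, ranges, equals, facetField, skip, count) {
  const conditions = [];
  const bindVars = { skip: skip, count: count };
  let n = 0;

  for (const attr of Object.keys(ranges)) {
    const lo = "v" + n++;
    const hi = "v" + n++;
    conditions.push("p." + checkName(attr) + " >= @" + lo);
    conditions.push("p." + checkName(attr) + " <= @" + hi);
    bindVars[lo] = ranges[attr][0];
    bindVars[hi] = ranges[attr][1];
  }
  for (const attr of Object.keys(equals)) {
    const v = "v" + n++;
    conditions.push("p." + checkName(attr) + " == @" + v);
    bindVars[v] = equals[attr];
  }

  const filter = conditions.length ? " FILTER " + conditions.join(" && ") : "";
  const facet = checkName(facetField);
  const query =
    "LET l = (FOR p IN " + checkName(collection) + filter +
    " SORT p.name RETURN p) " +
    "RETURN [ SLICE(l, @skip, @count), " +
    "(FOR p IN l COLLECT f = p." + facet + " INTO g " +
    "RETURN { " + facet + ": f, count: LENGTH(g[*].p) }) ]";

  return { query: query, bindVars: bindVars };
}

// Drill down into category1 == 1 and ask for the category2 facets.
const q = buildFacetQuery(
  "products", { attribute1: [2, 3] }, { category1: 1 }, "category2", 0, 3
);
console.log(q.query);
// In arangosh the result would be passed on as: db._query(q.query, q.bindVars).toArray()
```

The generated text follows the same LET / SLICE / COLLECT pattern as the hand-written queries, so drilling down only means adding another entry to equals and choosing a different facet field.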

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow