Question

I was wondering which of the following methods is faster and/or more efficient in terms of resource usage, given the following scenario:

You have a document with the following fields:

  1. Title (text)
  2. Description (text)
  3. Image (text: URL of the image source, or alternatively an HTML field)

The Search API should search over the Title and Description fields, but NOT over the Image field. The Image field is only there so that the template has an image source to render on the search results page.

So the questions are:

  1. Is this approach correct?
  2. Does adding fields that are not really used for searching add overhead and consume extra resources?
  3. Is there a way of telling the Search API NOT to search over a field?
  4. Would it be faster to use the Search API to retrieve only the doc_ids and then fetch the data from the Datastore using those doc_ids?

Thanks!


Solution

1) You populate a document with some fields, then search over those fields. The approach is correct. Having a field with a URL linking to an image is something I also do.

2) Yes, in the sense that they have to be stored, and you pay per byte of storage. But if you need them because they are part of the data you want to serve, that cost is unavoidable.
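To get a feel for the size of that overhead, here is a back-of-the-envelope sketch that counts only the raw UTF-8 bytes an image-URL field adds across a set of documents. The URL and the document count are made-up illustrative values, and real index storage also carries per-field and indexing overhead that this deliberately ignores.

```python
def raw_field_bytes(value: str, num_docs: int) -> int:
    """Raw UTF-8 bytes one field value contributes across num_docs documents.

    Ignores any per-field or indexing overhead added by the search service.
    """
    return len(value.encode("utf-8")) * num_docs


# Hypothetical example values, not measurements.
example_url = "https://example.com/images/products/piano-123.jpg"
total = raw_field_bytes(example_url, 100_000)
print(f"{len(example_url)} bytes per doc, ~{total / 1_000_000:.1f} MB for 100,000 docs")
```

Multiplying the per-document size by your real document count gives a lower bound you can compare against the pricing page.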

3) Yes, you can search only over specified fields if you like. For example:

```python
query_string = "product: piano"
```

That query searches only the field "product". The query syntax is documented here: https://developers.google.com/appengine/docs/python/search/#Python_Searching_for_documents_by_their_contents
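If a single user-supplied term should match only the Title and Description fields (and never the Image field), one option is to build the field-restricted query string programmatically. This is a sketch under assumptions: the lowercase field names `title` and `description` are hypothetical (use whatever names you indexed with), and it assumes the query language's `OR` operator, parentheses for grouping, and double-quoted phrases with backslash escaping behave as described in the documentation linked above; check there before relying on the escaping.

```python
from typing import Sequence


def restricted_query(term: str, fields: Sequence[str]) -> str:
    """Build a query string that matches `term` only in the given fields.

    Each restriction uses the `field: value` form; restrictions are joined
    with OR so a match in any listed field counts.
    """
    # Assumed escaping rules: escape backslashes and double quotes, then quote
    # the term so multi-word input is treated as a single phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    clauses = [f"{field}: {quoted}" for field in fields]
    return "(" + " OR ".join(clauses) + ")"


# Hypothetical field names; the image field is simply left out of the list.
query_string = restricted_query("grand piano", ["title", "description"])
print(query_string)
```

Leaving a field out of the list is what keeps it out of the match, so the Image field can stay in the document purely for rendering.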

4) You can measure timings etc. using Appstats: https://developers.google.com/appengine/docs/python/tools/appstats

But it seems to me that if you are getting document IDs only and then fetching all those documents anyway, that would be slower than just getting the entire documents, because you are making more round trips to the database. If you don't fetch them all and just use the first one that matches, or if the documents are large, then getting the entire documents might be the slower option. Who knows! It depends on your use case.
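The round-trip argument can be made concrete with a toy latency model. The per-call latencies below are invented placeholders, not App Engine measurements; the model assumes an IDs-only search costs the same as a full search and ignores payload size, which is exactly the factor that could tip the balance the other way for large documents. Plug in numbers you measure yourself.

```python
def full_docs_ms(search_ms: float) -> float:
    """Latency of one search call that returns complete documents."""
    return search_ms


def ids_then_fetch_ms(search_ms: float, get_ms: float, n_docs: int, batched: bool) -> float:
    """Latency of an IDs-only search followed by Datastore fetches.

    Assumes the IDs-only search costs the same as a full search.
    batched=True: all IDs fetched in one call; batched=False: one call per ID.
    """
    if n_docs == 0:
        fetch_calls = 0
    elif batched:
        fetch_calls = 1
    else:
        fetch_calls = n_docs
    return search_ms + fetch_calls * get_ms


# Hypothetical latencies in milliseconds, for illustration only.
SEARCH_MS, GET_MS, N = 40.0, 10.0, 20
print("full documents:      ", full_docs_ms(SEARCH_MS))
print("ids + one batch get: ", ids_then_fetch_ms(SEARCH_MS, GET_MS, N, batched=True))
print("ids + one get per id:", ids_then_fetch_ms(SEARCH_MS, GET_MS, N, batched=False))
```

Even in this simplified model, the IDs-then-fetch approach always adds at least one round trip, and fetching per ID grows linearly with the number of results.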

Why not run some tests yourself by implementing various ways of doing it, then seeing which works best for your use case? Appstats will help with that.
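Appstats is the App Engine tool for this; the comparison itself can be as simple as timing each candidate implementation the same way. Below is a minimal sketch using the standard library's `timeit`. The functions `fetch_full` and `fetch_ids_then_get` are hypothetical stand-ins that only sleep; you would replace their bodies with your real search and Datastore calls.

```python
import time
import timeit
from typing import Callable, Dict


def compare(strategies: Dict[str, Callable[[], object]], number: int = 5, repeat: int = 3) -> Dict[str, float]:
    """Return the best average seconds per call for each strategy.

    Takes the minimum over `repeat` runs of `number` calls each, which is
    less sensitive to one-off noise than a single measurement.
    """
    results = {}
    for name, fn in strategies.items():
        runs = timeit.repeat(fn, number=number, repeat=repeat)
        results[name] = min(runs) / number
    return results


# Hypothetical stand-ins: replace the sleeps with real search/Datastore calls.
def fetch_full() -> None:
    time.sleep(0.002)


def fetch_ids_then_get() -> None:
    time.sleep(0.004)


for name, secs in compare({"full documents": fetch_full, "ids then get": fetch_ids_then_get}).items():
    print(f"{name}: {secs * 1000:.2f} ms per call")
```

Timing on a local machine will not reflect production latency, so treat this as a way to structure the comparison and use Appstats for numbers from the real environment.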

If you want to store other data in the Datastore that the searchable documents link to, and retrieve it after getting the initial document (e.g. you store Datastore keys in the document), then keep in mind that everything comes with a cost, in either speed or $$. Both are valid options, depending on (again!) your use case.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow