Question

I am using RethinkDB 1.10.1 with the official Python driver. I have a table of tagged things, each associated with one user:

{
    "id": "PK",
    "user_id": "USER_PK",
    "tags": ["list", "of", "strings"],
    // Other fields...
}

I want to query by user_id and tag (say, to find all the things by user "tawmas" with tag "tag"). Starting with RethinkDB 1.10, I can create a multi-index like this:

r.table('things').index_create('tags', multi=True).run(conn)

My query would then be:

res = (r.table('things')
       .get_all('TAG', index='tags')
       .filter(r.row['user_id'] == 'USER_PK').run(conn))

However, this query still needs to scan all the documents with the given tag, so I would like to create a compound index based on the user_id and tags fields. Such an index would allow me to query with:

res = r.table('things').get_all(['USER_PK', 'TAG'], index='user_tags').run(conn)

There is nothing in the documentation about compound multi-indexes. However, I tried to use a custom index function combining the requirements for compound indexes and multi-indexes by returning a list of ["USER_PK", "tag"] pairs.

My first attempt was in Python:

r.table('things').index_create(
    'user_tags',
    lambda each: [[each['user_id'], tag] for tag in each['tags']],
    multi=True).run(conn)

This makes the Python driver choke with a MemoryError while trying to parse the index function (I guess list comprehensions aren't really supported by the driver).

So I turned to my (admittedly rusty) JavaScript and came up with this:

r.table('things').index_create(
    'user_tags',
    r.js(
        """(function (each) {
            var result = [];
            var user_id = each["user_id"];
            var tags = each["tags"];
            for (var i = 0; i < tags.length; i++) {
                result.push([user_id, tags[i]]);
            }
            return result;
        })
        """),
    multi=True).run(conn)

This is rejected by the server with a curious exception: rethinkdb.errors.RqlRuntimeError: Could not prove function deterministic. Index functions must be deterministic.

So, what is the correct way to define a compound multi-index? Or is it something which is not supported at this time?

Solution

Short answer:

List comprehensions don't work in ReQL functions. You need to use map instead, like so:

r.table('things').index_create(
    'user_tags',
    lambda each: each["tags"].map(lambda tag: [each['user_id'], tag]),
    multi=True).run(conn)
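
To make it concrete which keys such an index function produces, here is a minimal pure-Python simulation. It does not use RethinkDB at all: the helper names (user_tag_keys, build_multi_index) and the sample documents are made up for illustration, and tuples stand in for the [user_id, tag] arrays only because Python lists cannot be dictionary keys.

```python
from collections import defaultdict


def user_tag_keys(doc):
    """Plain-Python analogue of the index function: one (user_id, tag) key per tag."""
    return [(doc["user_id"], tag) for tag in doc["tags"]]


def build_multi_index(docs):
    """Map every key to the ids of the documents that produced it."""
    index = defaultdict(list)
    for doc in docs:
        for key in user_tag_keys(doc):
            index[key].append(doc["id"])
    return index


# Hypothetical sample data shaped like the documents in the question.
things = [
    {"id": 1, "user_id": "tawmas", "tags": ["tag", "other"]},
    {"id": 2, "user_id": "tawmas", "tags": ["other"]},
    {"id": 3, "user_id": "someone", "tags": ["tag"]},
]

index = build_multi_index(things)

# Rough analogue of get_all(['tawmas', 'tag'], index='user_tags').
matches = index[("tawmas", "tag")]
print(matches)  # prints [1]
```

Each document contributes one key per tag, so a lookup on a (user, tag) pair goes straight to the matching documents without a separate filter step.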

Long answer:

This is a somewhat subtle aspect of how RethinkDB drivers work. The list comprehension fails because your Python code never sees real copies of the documents. In the expression:

lambda each: [[each['user_id'], tag] for tag in each['tags']]

each is never bound to an actual document from your database; it is bound to a special Python object that represents the document. You can run the following to see this for yourself:

q = r.table('things').index_create(
    'user_tags',
    lambda each: print(each))  # only works in Python 3

And it will print out something like:

<RqlQuery instance: var_1 >

The driver only knows that this is a variable from the function. In particular, it has no idea whether each["tags"] is an array or something else (it is actually just another, very similar abstract object), so Python doesn't know how to iterate over that field. Essentially the same problem exists in JavaScript.
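
The idea of a placeholder object can be sketched with a toy class. This is not the driver's actual implementation: Term, to_text and the var_2 name are hypothetical, and the link to the MemoryError is an assumption. Python falls back to calling __getitem__(0), __getitem__(1), ... when an object defines no __iter__, and a placeholder that never raises IndexError would make a list comprehension run without end.

```python
from itertools import islice


class Term:
    """Toy stand-in for a driver query object; it records operations instead of running them."""

    def __init__(self, description):
        self.description = description

    def __getitem__(self, key):
        # Indexing never touches real data and never raises IndexError;
        # it only builds a larger expression.
        return Term(f"{self.description}[{key!r}]")

    def map(self, func):
        # Call the Python function once with a placeholder to capture its body.
        body = to_text(func(Term("var_2")))
        return Term(f"{self.description}.map(var_2 => {body})")

    def __repr__(self):
        return f"<Term {self.description}>"


def to_text(value):
    """Render a Term, or a list of Terms, as text."""
    if isinstance(value, list):
        return "[" + ", ".join(to_text(v) for v in value) + "]"
    return value.description if isinstance(value, Term) else repr(value)


each = Term("var_1")

# Iterating a Term uses the __getitem__ fallback and would never stop on its own,
# so islice cuts it off after three items.
first_three = list(islice(each["tags"], 3))
print(first_three)

# map() sees the lambda exactly once and turns it into an expression.
mapped = each["tags"].map(lambda tag: [each["user_id"], tag])
print(mapped)
```

In this toy model the comprehension has nothing finite to loop over, while map only needs to call the lambda once to describe the work that the server would do for each element.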

Licensed under: CC-BY-SA with attribution