Question

Mnesia has four methods of reading from the database: read, match_object, select, and qlc, plus their dirty counterparts of course. Each of them is more expressive than the one before it.

  1. Which of them use indices?
  2. Given a query written with one of these methods, will the same query written with a more expressive method be less efficient in time or memory usage? By how much?

UPD. As I GIVE CRAP ANSWERS mentioned, read is just a key-value lookup, but after some exploration I also found the functions index_read and index_match_object, which work in the same manner as read and match_object but use an index instead of the primary key.


Solution

One at a time, though from memory:

  • read always does a key lookup on the keypos. It is basically the key-value lookup.
  • match_object and select optimize the query when the keypos key is bound in the pattern. With the key unbound, match_object can still use a secondary index: the Mnesia User's Guide states that it automatically uses indices if they exist, though with no heuristics for choosing the best one. To pick the index yourself, use index_read or index_match_object.
  • qlc has a query compiler and will attempt to use additional indexes where possible, but whether it does depends on the query planner. erl -man qlc has the details, and qlc:info/1 returns a description of the plan for a query handle.
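
To make the list above concrete, here is a minimal sketch that runs the same kind of lookup through each method. The person record, its fields, and the module name are hypothetical and exist only for this illustration; the sketch assumes a fresh node where Mnesia runs with its default RAM-only schema.

```erlang
-module(read_demo).
-export([run/0]).
-include_lib("stdlib/include/qlc.hrl").

%% Hypothetical record, used only for this illustration.
-record(person, {id, email, age}).

run() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(person,
        [{attributes, record_info(fields, person)},
         {index, [email]}]),                 % secondary index on email
    {atomic, ok} = mnesia:transaction(fun() ->
        mnesia:write(#person{id = 1, email = <<"a@example.com">>, age = 30}),
        mnesia:write(#person{id = 2, email = <<"b@example.com">>, age = 41})
    end),
    {atomic, Result} = mnesia:transaction(fun() ->
        %% Plain key lookup on the keypos.
        ByKey = mnesia:read(person, 1),
        %% Lookup through the secondary index on #person.email.
        ByIndex = mnesia:index_read(person, <<"b@example.com">>, #person.email),
        %% Pattern with the key bound.
        ByMatch = mnesia:match_object(#person{id = 1, _ = '_'}),
        %% Match specification: ids of everyone older than 35.
        BySelect = mnesia:select(person,
            [{#person{id = '$1', age = '$2', _ = '_'},
              [{'>', '$2', 35}],
              ['$1']}]),
        %% QLC comprehension; qlc:info/1 describes the plan it picked.
        Q = qlc:q([P#person.id || P <- mnesia:table(person),
                                  P#person.email =:= <<"a@example.com">>]),
        ByQlc = qlc:e(Q),
        Plan = qlc:info(Q),
        {ByKey, ByIndex, ByMatch, BySelect, ByQlc, Plan}
    end),
    Result.
```

Printing the last element of the returned tuple with io:format("~s~n", [Plan]) is one way to check whether the planner chose an index lookup or a table traversal for a given query.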

Mnesia tables are basically key-value maps from terms to terms. Usually, this means that if the key part is something the query can latch onto, then it is used. Otherwise, you will be looking at a full-table scan. This may be expensive, but do note that for ram_copies and disc_copies tables the scan is in-memory and thus usually fairly fast.

Also, take note of the table type: set is a hash-table and can't utilize a partial key match. ordered_set is a tree and can do a partial match:

Example: if we have a key {Id, Timestamp}, querying on {Id, '_'} as the key is reasonably fast on an ordered_set, because the lexicographic ordering lets us walk only the relevant part of the tree. This is the equivalent of specifying a composite INDEX/PRIMARY KEY in a traditional RDBMS.
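
A minimal sketch of that composite-key layout, assuming a hypothetical event record whose key is the tuple {Id, Timestamp} (the record, field, and module names are made up for illustration):

```erlang
-module(ordered_demo).
-export([run/0]).

%% Hypothetical record: the key is the composite {Id, Timestamp}.
-record(event, {key, payload}).

run() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(event,
        [{type, ordered_set},
         {attributes, record_info(fields, event)}]),
    {atomic, ok} = mnesia:transaction(fun() ->
        lists:foreach(fun mnesia:write/1,
            [#event{key = {1, 100}, payload = a},
             #event{key = {1, 200}, payload = b},
             #event{key = {2, 150}, payload = c}])
    end),
    %% Key pattern {1, '_'}: the Id is bound, the Timestamp is a wildcard.
    %% On an ordered_set the bound prefix narrows the part of the tree
    %% that has to be visited.
    {atomic, Events} = mnesia:transaction(fun() ->
        mnesia:match_object(#event{key = {1, '_'}, _ = '_'})
    end),
    Events.
```

With type set instead of ordered_set the same call returns the same records, but the partially bound key cannot narrow the search, so the whole table is scanned.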

If you can arrange the data such that simple queries work without additional indexes, then that representation is preferred. Also note that additional indexes are implemented as bags, so an index with many matches per value is very inefficient. In other words, you should probably not index a position in the tuples that holds only a few distinct values. It is better to index something with many (mostly) distinct values, such as the e-mail address column of a user table.
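
As a sketch of that advice, the following adds an index to an existing table on the high-cardinality column and leaves the low-cardinality one alone. The account record, its fields, and the module name are hypothetical.

```erlang
-module(index_demo).
-export([run/0]).

%% Hypothetical record: email is (mostly) unique, country has few
%% distinct values.
-record(account, {id, email, country}).

run() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(account,
        [{attributes, record_info(fields, account)}]),
    %% Index the column with many distinct values. Indexing country
    %% would put many records under each index entry.
    {atomic, ok} = mnesia:add_table_index(account, email),
    {atomic, ok} = mnesia:transaction(fun() ->
        mnesia:write(#account{id = 1, email = <<"x@example.com">>, country = dk}),
        mnesia:write(#account{id = 2, email = <<"y@example.com">>, country = dk})
    end),
    {atomic, Found} = mnesia:transaction(fun() ->
        mnesia:index_read(account, <<"x@example.com">>, #account.email)
    end),
    Found.
```

The same index can be declared up front with the {index, [email]} option of mnesia:create_table/2; add_table_index/2 is used here only to show that it can be added to a table that already exists.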

Licensed under: CC-BY-SA with attribution