Question

I have a table whose integer key (a timestamp) holds the time when a particular record should be removed from the database. There is also a cleanup query, which takes from this table the records whose expiration time is less than now and removes them.

The Erlang documentation says that there are four table types: set, ordered_set, bag, and duplicate_bag.

  • set is implemented using a hash table, so reads take O(1) time.
  • ordered_set is implemented using a tree, so reads take O(log n) time, but it works better with contiguous intervals.
  • I found no information about bag implementation.

ordered_set seems ideal, but I can't use it because two records can have the same timestamp. So the question is:

How is the bag table implemented, and is it good for querying contiguous intervals? If not, how can I get "ordered_bag" functionality?


Solution

Mnesia's bag is implemented using ETS and DETS, like the other table types [1]. Also, Mnesia does not support duplicate_bag tables, as you can see from the documentation [2]. Since an ETS/DETS bag is implemented as a hash table [3], we can conclude that a bag in Mnesia is also a hash table with constant lookup time. [4] also says that set and bag are implemented as hash tables in Mnesia.

  1. Learn You Some Erlang
  2. Erlang -- mnesia:create_table/2
  3. Erlang Programming by Francesco Cesarini and Simon Thompson, Ch. 10
  4. Erlang and OTP in Action by Martin Logan, Eric Merritt, and Richard Carlsson, Ch. 9

On the rest of the question:

No, bag is not good for querying contiguous intervals. To get an interval from a bag table you must traverse it fully. I see two possible solutions.

First, you can use an additional ordered_set table to keep the order, as @niahoo suggested. You will then be able to efficiently query all timestamps that fall within an interval, and delete the corresponding entries from your bag table, which will also be efficient, since you will know all the keys by that point.
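A minimal sketch of that two-table cleanup, assuming (hypothetically) a bag table named entries keyed by timestamp and an ordered_set table named expiry_index holding just the timestamps; the names and shapes are illustrative, not from the original:

```erlang
%% Sketch: walk the ordered_set index in key order and stop at the first
%% timestamp past Now; delete matching rows from both tables.
%% Table names 'entries' and 'expiry_index' are assumptions.
cleanup(Now) ->
    mnesia:transaction(fun() ->
        expire_loop(mnesia:first(expiry_index), Now)
    end).

expire_loop('$end_of_table', _Now) ->
    ok;
expire_loop(Ts, Now) when Ts > Now ->
    ok;  %% ordered_set traversal is in key order, so we can stop here
expire_loop(Ts, Now) ->
    Next = mnesia:next(expiry_index, Ts),
    mnesia:delete({entries, Ts}),       %% removes all bag entries under Ts
    mnesia:delete({expiry_index, Ts}),
    expire_loop(Next, Now).
```

Because the index is an ordered_set, the loop touches only the expired prefix of the keys instead of traversing the whole bag.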

Second, you can use an ordered_set of {Timestamp, [Values]}. This requires extra manual work when inserting and deleting single entries, but it saves you from creating an additional table if you only need to query entries grouped by timestamp.
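The manual work on single entries looks roughly like this; a sketch assuming a hypothetical ordered_set table by_time whose rows are {by_time, Timestamp, Values}:

```erlang
%% Sketch: emulate "ordered_bag" by storing a list of values per timestamp.
%% The table name 'by_time' and tuple shape are assumptions.
add_value(Ts, Value) ->
    mnesia:transaction(fun() ->
        Values = case mnesia:read(by_time, Ts) of
                     [{by_time, Ts, Vs}] -> Vs;
                     []                  -> []
                 end,
        mnesia:write({by_time, Ts, [Value | Values]})
    end).

remove_value(Ts, Value) ->
    mnesia:transaction(fun() ->
        case mnesia:read(by_time, Ts) of
            [{by_time, Ts, Vs}] ->
                case lists:delete(Value, Vs) of
                    []   -> mnesia:delete({by_time, Ts});  %% drop empty keys
                    Rest -> mnesia:write({by_time, Ts, Rest})
                end;
            [] ->
                ok
        end
    end).
```

Deleting the row once its list is empty keeps interval scans from visiting dead timestamps.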

OTHER TIPS

I think that you should first consider the most frequent and time-critical requests you have to perform against your database in order to choose the right organization and primary key. I assume (but may be wrong) that this is neither the timestamp nor the cleanup function.

If I am correct, you could simply traverse the table with the dirty_first() and then dirty_next() functions in order to keep perturbations as short as possible (I think dirty functions are OK here, since there is no risk that the timestamp is modified during the operation, and anyway, if you don't clean up an entry, you'll do it at the next iteration).
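Such a dirty traversal could be sketched as follows; the table name entries is an assumption, and note that the next key is fetched before deleting the current one:

```erlang
%% Sketch: full dirty scan of a table keyed by timestamp, deleting
%% expired entries. Table name 'entries' is an assumption.
cleanup_dirty(Now) ->
    walk(mnesia:dirty_first(entries), Now).

walk('$end_of_table', _Now) ->
    ok;
walk(Ts, Now) ->
    Next = mnesia:dirty_next(entries, Ts),  %% grab Next before deleting Ts
    case Ts =< Now of
        true  -> mnesia:dirty_delete({entries, Ts});
        false -> ok
    end,
    walk(Next, Now).
```

Unlike the ordered_set variants, this cannot stop early: for a hash-based set or bag the keys arrive in no particular order, so every key must be visited.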

Last, if cleanup time is really critical, but the timestamp is not the most important key for your application, you can store your data in a set keyed on the best key, and in a separate ordered_set table store the timestamps (as primary key) with a list of the associated keys.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow