Question

I know this question has been asked multiple times on Stack Overflow. I am posting it to find out what the best choice would be for my design. I have the following schema for my job details:

_unique_key        varchar(256) NULL
_job_handle        varchar(256) NULL
_data              varchar(1024) NULL
_user_id           int(11) NULL
_server_ip         varchar(39) NULL
_app_version       varchar(256) NULL
_state             int(11) NULL
_is_set_stopped    bool

The operations we are doing on this table:

  1. For each job there will be one update and 10 select queries on this table, so we need a high read and write rate.
  2. Many applications manipulate this table, filtering on:
    1. _unique_key
    2. _state
    3. _is_set_stopped
    4. _user_id
  3. The _data field size varies from 5 KB to 1 MB depending on the type of application and user.
  4. Applications can update individual attributes selectively.
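
For example (the table name job_details and the literal values are assumptions), a selective update and a typical filtered read look roughly like this:

UPDATE job_details SET _state = 2 WHERE _unique_key = 'job-42';

SELECT _unique_key, _state, _data
FROM job_details
WHERE _user_id = 1001 AND _state = 2 AND _is_set_stopped = FALSE;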

Solutions we considered:

MySQL InnoDB

I think MySQL will not scale well enough given the high read and write requirements.

MySQL In Memory Table

The problems with this solution are:

  1. It doesn't support dynamic field sizes. MEMORY tables use a fixed-length row-storage format; variable-length types such as VARCHAR are stored using a fixed length. Source: http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html
  2. SELECT ... FOR UPDATE will lock the entire table. I don't know whether this will be a problem.

Redis

Redis looks like a good choice, but I think my table is not a good fit for a key-value store:

  1. It supports only a very limited set of datatypes; I can store only strings in a list. I would need to store the fields as JSON or some other format.
  2. If a client wants to update a particular attribute, it needs to download the full value, parse the object, and push it back to the server. Maybe I am wrong; is there a way to avoid that?
  3. Filtering based on a value will not be possible. Maybe I am wrong; is there a way to do that?

MySQL InnoDB on TMPFS file system

This looks promising, but I don't know whether it will scale as well as Redis or a MySQL in-memory table.

Solution

In this question, you are confusing raw performance (i.e. efficiency) with scalability. They are different concepts.

Between the InnoDB and MEMORY engines, InnoDB is likely to be the more scalable. InnoDB supports multi-version concurrency control and has plenty of optimizations to deal with contention, so it will handle concurrent accesses much better than the MEMORY engine, even if it may be slower in some I/O-bound situations.

Redis is a single-threaded server. All the operations are serialized. It has zero scalability. That does not mean it is inefficient. On the contrary, it will likely support more connections than MySQL (due to its epoll-based event loop) and more traffic (due to its very efficient lock-free implementation and in-memory data structures).

To answer your question, I would give MySQL with InnoDB a try. If it is properly configured (no synchronous commit, a large enough buffer pool, etc.), it can sustain good throughput. And instead of running it on top of tmpfs, I would consider SSD hardware.
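
For instance, a minimal sketch of the relevant my.cnf settings (the values are assumptions and should be sized for your workload and hardware):

[mysqld]
innodb_buffer_pool_size        = 8G     # large enough to keep the working set in memory
innodb_flush_log_at_trx_commit = 2      # relax synchronous commit: flush the redo log roughly once per second
innodb_log_file_size           = 512M   # bigger redo logs absorb write bursts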

Now, if you prefer to use Redis (which is not a relational store btw), you can certainly do it. There is no need to systematically serialize/deserialize your data. And filtering is indeed possible, provided you can anticipate all access paths and find an adapted data structure.

For instance:

  • one hash object per job. The key is _unique_key. The fields of the hash should correspond to the columns of your relational table.
  • one set per state value
  • 2 sets for is_set_stopped
  • one set per userid value

For each job insertion, you need to pipeline the following commands:

HMSET job:AAA job_handle BBB data CCC user_id DDD server_ip EEE app_version FFF state GGG is_set_stopped HHH
SADD state:GGG AAA
SADD is_set_stopped:HHH AAA
SADD user_id:DDD AAA

You can easily update any field individually provided you maintain the corresponding sets.
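
For instance, moving job AAA from state GGG to a new state GGG2 is just a hash-field update plus set maintenance, which you can wrap in MULTI/EXEC (or a pipeline) so the commands are applied together:

MULTI
HSET job:AAA state GGG2
SREM state:GGG AAA
SADD state:GGG2 AAA
EXEC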

You can perform filtering queries by intersecting the sets. For instance:

SINTER is_set_stopped:HHH state:GGG
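
The result of the intersection is the set of matching job keys; you can then fetch each full record with HGETALL job:AAA, or only the fields you need with HMGET job:AAA state data.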

With Redis, the bottleneck will likely be the network, especially if the data field is large. I hope you will have more 5 KB jobs than 1 MB jobs. For instance, 1000 writes/s of 1 MB objects represents 8 Gbit/s, probably more than your network can sustain. This is true for both Redis and MySQL.

OTHER TIPS

I suggest PostgreSQL; it's more capable than MySQL (it has more features and better support for complex queries and datatypes) and has a lot of tuning options.

If you give PostgreSQL enough memory and tune the parameters right, it will cache everything in memory.
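
As a rough sketch, the relevant postgresql.conf parameters would be something like this (the values are assumptions and should be sized to your RAM):

shared_buffers       = 4GB     # PostgreSQL's own cache
effective_cache_size = 12GB    # hint about how much OS page cache is available to the database
work_mem             = 64MB    # per-sort/hash-join working memory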

Alternatively, you could run it on tmpfs if that's your preference, and use streaming replication to an on-disk database for a durable copy.

Streaming replication has three operating modes: asynchronous, on receive, and on fsync. If you use the first one, asynchronous, you don't have to wait for a sync to disk on the replication server, so updates will be very fast with tmpfs.
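
Those modes roughly correspond to the synchronous_commit setting on the primary; a sketch only (the standby name is an assumption, and the details depend on your PostgreSQL version):

synchronous_standby_names = 'disk_copy'   # the on-disk replica
synchronous_commit        = off           # asynchronous: commit without waiting for the standby
# synchronous_commit      = remote_write  # wait until the standby has received/written the WAL
# synchronous_commit      = on            # wait until the standby has flushed the WAL to disk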

Since you also seem to have a lot of text fields, another feature might help: PostgreSQL can store a text-search vector per row, and you can add an index on it and update it via a trigger with the concatenated content of all the columns you are searching on. That will give you an incredible boost in performance when doing text search across multiple columns, compared to any way you could write that in MySQL.
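
A minimal sketch of that setup (the table, column, and index names are assumptions; tsvector_update_trigger is the built-in trigger function):

ALTER TABLE job_details ADD COLUMN search_vector tsvector;

CREATE INDEX job_details_search_idx ON job_details USING gin(search_vector);

CREATE TRIGGER job_details_search_update
    BEFORE INSERT OR UPDATE ON job_details
    FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(search_vector, 'pg_catalog.english', _data, _app_version);

-- search across all the indexed columns at once
SELECT _unique_key FROM job_details
WHERE search_vector @@ to_tsquery('english', 'some & words');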

Regardless of the database you use:

You state that _data is varchar(1024), yet you say it contains 5 KB to 1 MB of data? Is this actually a blob? Even if the length was a mistake, MySQL doesn't support VARCHAR fields longer than 65535 bytes! Assuming it isn't updated as much as the other columns, it might be wise to split this into two tables, one with the static data and one with the dynamic data, to minimize disk access.
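
For example, a sketch of the split (the table names and the MEDIUMTEXT type are assumptions), keeping the large, rarely-changing payload away from the frequently-updated columns:

CREATE TABLE job_static (
    _unique_key  VARCHAR(256) PRIMARY KEY,
    _job_handle  VARCHAR(256),
    _user_id     INT,
    _server_ip   VARCHAR(39),
    _app_version VARCHAR(256),
    _data        MEDIUMTEXT            -- the 5 KB - 1 MB payload
);

CREATE TABLE job_dynamic (
    _unique_key     VARCHAR(256) PRIMARY KEY,
    _state          INT,
    _is_set_stopped BOOL
);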

Licensed under: CC-BY-SA with attribution