Question

For a web application I'm developing, I need to store a large number of records. Each record will consist of a primary key and a single (short-ish) string value. I expect to have about 100GB storage available and would like to be able to use it all.

The records will be inserted, deleted and read frequently and I must use a MySQL database. Data integrity is not crucial, but performance is. What issues and pitfalls am I likely to encounter and which storage engine would be best suited to the task?

Many thanks, J

Solution

Whatever solution you use, since you say your database will be write-heavy you need to make sure the whole table doesn't get locked on writes. That rules out MyISAM, which some have suggested: MyISAM locks the entire table on every UPDATE, DELETE or INSERT, so any client that wants to read from the table has to wait for the write to finish. INSERT LOW_PRIORITY doesn't avoid the lock, by the way; it just delays the insert until no clients are reading the table, which helps readers at the writer's expense.
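
For what it's worth, a minimal example of that modifier (the table and column names are hypothetical):

    -- LOW_PRIORITY only has an effect on table-locking engines such as
    -- MyISAM: the insert waits until no clients are reading the table.
    INSERT LOW_PRIORITY INTO kv_store (id, val)
    VALUES (42, 'some short string');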

If you simply must use MySQL, you'll want InnoDB, which locks individual rows rather than whole tables on writes. Note that InnoDB is MVCC-based like PostgreSQL, so old row versions have to be cleaned up; InnoDB does this with a background purge thread rather than an explicit VACUUM. You'll have to take that overhead into consideration if you are doing a lot of updates or deletes.
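
As a concrete starting point, a minimal schema sketch (kv_store and its columns are illustrative names, not from the question):

    -- BIGINT UNSIGNED leaves headroom for billions of inserts even with
    -- heavy churn; VARCHAR suits a short, variable-length string value.
    CREATE TABLE kv_store (
        id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        val VARCHAR(255)    NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;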

OTHER TIPS

It all depends on the read/write pattern your application generates and on how fresh your reads need to be. For example, if you don't need the most recently inserted rows to be immediately visible, INSERT LOW_PRIORITY can help your SELECTs. If the string is relatively small, a fixed-width CHAR column helps indexing a lot and reduces SELECT times (see the fixed-length sketch below). If your application generates a lot of updates, you'll prefer the InnoDB storage engine, which locks only the affected row (versus the whole table on MyISAM). On the other hand, InnoDB is more CPU-intensive, so if you don't use transactions and your update volume is relatively small, consider MyISAM.
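
Here is what that fixed-length variant might look like, assuming the values never exceed some known width (64 characters here is an arbitrary assumption):

    -- CHAR pads every value to a fixed width, so all rows are the same
    -- size; with ROW_FORMAT=FIXED, MyISAM can locate a row by computing
    -- its file offset directly instead of following a pointer.
    CREATE TABLE kv_store_fixed (
        id  BIGINT UNSIGNED NOT NULL,
        val CHAR(64)        NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=MyISAM ROW_FORMAT=FIXED;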

Whether or not you use secondary indexes, a table this size may run into scaling issues. Partitioning can help mitigate them, for example:
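
A hash-style partitioning sketch against the illustrative kv_store table from above (the partition count is a guess to benchmark, not a recommendation):

    -- KEY partitioning hashes the primary key to spread rows (and their
    -- index blocks) across 64 smaller partitions.
    ALTER TABLE kv_store
        PARTITION BY KEY (id)
        PARTITIONS 64;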

In my own project, integrity wasn't crucial either, but performance was. We relaxed all the transactional requirements, relaxed the disk-synchronization requirements, and committed inserts in batches, which improved our write speeds considerably.
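
In MySQL terms, that roughly translates to knobs like these (values are examples to test against your own workload, not recommendations):

    -- Flush the InnoDB log about once per second instead of at every
    -- commit; an OS crash can lose up to ~1 second of writes.
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;

    -- Amortize commit overhead by batching many rows per transaction.
    START TRANSACTION;
    INSERT INTO kv_store (val) VALUES ('a'), ('b'), ('c');  -- ...more rows
    COMMIT;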

Also, make sure you do your own testing to tune your memory settings. MySQL has several caches whose sizes you can configure.
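
The two big ones for this workload would be something like this in my.cnf (sizes are placeholders; you'd size one or the other depending on your engine choice):

    # my.cnf sketch -- illustrative sizes for an 8 GB machine
    [mysqld]
    innodb_buffer_pool_size = 6G   # caches InnoDB data pages and indexes
    key_buffer_size         = 2G   # caches MyISAM index blocks only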

You definitely want to use MyISAM for the storage engine. But you say you expect 100 GB of short-ish string values, which puts you on the order of a billion rows, and with frequent inserts and deletes an auto-increment counter only ever grows. You definitely want a 64-bit integer (BIGINT) for your identity/primary key rather than a 32-bit one.

But my real question is: are you using this to store session information from the web site? If so, you may want to use memcached instead of MySQL.

Large MySQL queries make my quad-core/8 GB RAM DB server crash... my solution was to switch to PostgreSQL (or SQL Server if you can afford it).

You are much better off if the "short-ish string" is in a fixed-length column so that the table has fixed-length rows. MySQL with MyISAM will operate quite efficiently for you then. Allocate as much memory as you can to the key buffer so that as much of the index as possible stays in memory. Your goal should be a single random disk access to retrieve one row; you can't do better than that given 100 GB of data and 8 GB of memory. You should not expect more than a few hundred such queries per second, because that's all the random accesses a disk can do: at roughly 8-10 ms per seek, a single spinning disk manages on the order of 100-125 random reads per second.

You might be interested in my MySQL custom storage engine (described here). It manages memory differently from MyISAM, although the profile of your application isn't exactly what my engine was optimized for.

Licensed under: CC-BY-SA with attribution