Document-oriented DBMS as primary DB and an RDBMS as secondary DB?

https://stackoverflow.com/questions/7745677

09-02-2021

Question

I'm having some performance issues with a MySQL database due to its normalization.

Most of my applications that use a database need to run heavy nested queries, which in my case take a lot of time. Queries can take up to 2 seconds to run with indexes, and about 45 seconds without them.

A solution I came across a few months back was to use a faster, more linear document-based database, in my case Solr, as the primary database. As soon as something changed in the MySQL database, Solr was notified.

This worked really well. Queries against Solr took only about 3 ms.

The numbers look good, but I'm having some problems.

Huge database

The MySQL database is about 200 MB, while the Solr database contains about 1.4 GB of data. Each time I need to change a table or column, the database needs to be reindexed, which in this example took over 12 hours.

It's difficult to render both a Solr object and an Active Record (MySQL) object without getting wet.

The view relies on a certain object. It doesn't care whether the object itself is an Active Record object or a Solr object, as long as it can call a set of attributes on it.

Like this:

# Controller
@song = Song.first

# View
@song.artist.urls.first.service.name

The problem in my case is that the data being returned from Solr is flat like this.

{
  id: 123,
  song: "Waterloo",
  artist: "ABBA",
  service_name: "Groveshark",
  urls: ["url1", "url2", "url3"]
}

This forces me to build an active record object that can be passed to the view.
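One way to keep the view unchanged is a thin adapter that rebuilds the nesting from the flat Solr fields. The following is only a sketch: the `Service`, `Url`, `Artist` and `SolrSong` structs and the `song_from_solr` helper are hypothetical names, and it assumes every URL belongs to the single `service_name` in the document, since the flat data carries only one.

```ruby
# Plain structs that mimic the attribute chain the view calls.
# All of these names are hypothetical; they are not part of Solr or Rails.
Service  = Struct.new(:name, keyword_init: true)
Url      = Struct.new(:url, :service, keyword_init: true)
Artist   = Struct.new(:name, :urls, keyword_init: true)
SolrSong = Struct.new(:id, :title, :artist, keyword_init: true)

# Rebuild song -> artist -> urls -> service from a flat Solr document.
# Assumption: all URLs share the document's single service_name.
def song_from_solr(doc)
  service = Service.new(name: doc[:service_name])
  urls    = doc[:urls].map { |u| Url.new(url: u, service: service) }
  artist  = Artist.new(name: doc[:artist], urls: urls)
  SolrSong.new(id: doc[:id], title: doc[:song], artist: artist)
end

# Flat document with the fields shown above.
doc = {
  id: 123,
  song: "Waterloo",
  artist: "ABBA",
  service_name: "Groveshark",
  urls: ["url1", "url2", "url3"]
}

song = song_from_solr(doc)
puts song.artist.urls.first.service.name
```

With an object like this assigned to `@song`, the view can keep calling `@song.artist.urls.first.service.name` regardless of whether the data came from MySQL or Solr.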

My question

Is there a better way to solve the problem? Some kind of super-duper-fast, read-only primary database that can handle complex queries quickly would be nice.

Solution

Solr individual fields update

About reindexing everything on a schema change: at the time of writing, Solr does not yet support updating individual fields, and the JIRA issue about this is still unresolved. That said, how often do you change the schema?

MongoDB

If you can live without an RDBMS (without joins, schema, transactions, foreign key constraints), a document-based DB like MongoDB or CouchDB would be a perfect fit. (Here is a good comparison between them.)

Why use MongoDB:

data is in native format (you can use an ORM mapper like Mongoid directly in the views, so you don't need to adapt your records as you do with Solr)
dynamic queries
very good performance on non-full text search queries
schema-less (no need for migrations)
built-in, easy-to-set-up replication

Why use Solr:

advanced, very performant full-text search

Why use MySQL:

joins, constraints, transactions

Solutions

So, the solutions (combinations) would be:

Use MongoDB + Solr
- but you would still need to reindex all on schema change
Use only MongoDB
- but drop support for advanced full-text search
Use MySQL in a master-slave configuration and balance reads across the slave(s) (using a plugin like Octopus) + Solr
- setup complexity
Keep current setup, denormalize data in MySQL
- messy

Solr reindexing slowness

The MySQL database is about 200 MB, while the Solr database contains about 1.4 GB of data. Each time I need to change a table or column, the database needs to be reindexed, which in this example took over 12 hours.

Reindexing a 200 MB database in Solr SHOULD NOT take 12 hours! Most probably you also have other issues, such as:

MySQL:

n+1 issue
indexes
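To illustrate the n+1 issue, here is a small self-contained sketch. `FakeDb` is a hypothetical in-memory stand-in that only counts queries; it is not a real MySQL or Active Record API. In a Rails app, the usual fix is eager loading with `includes`, for example `Song.includes(artist: { urls: :service })`, assuming the associations are named as in the view from the question.

```ruby
# Hypothetical in-memory "database" that counts how many queries it serves.
class FakeDb
  attr_reader :query_count

  def initialize(artists_by_song_id)
    @artists = artists_by_song_id
    @query_count = 0
  end

  # One query per call: produces the n+1 pattern when used inside a loop.
  def artist_for(song_id)
    @query_count += 1
    @artists[song_id]
  end

  # One query for the whole batch: the shape that eager loading produces.
  def artists_for(song_ids)
    @query_count += 1
    @artists.slice(*song_ids)
  end
end

song_ids = (1..100).to_a
data = song_ids.to_h { |id| [id, "artist-#{id}"] }

# n+1 style: one lookup per song while iterating.
per_row = FakeDb.new(data)
song_ids.each { |id| per_row.artist_for(id) }

# Batched style: a single lookup for all songs.
batched = FakeDb.new(data)
artists = batched.artists_for(song_ids)

puts "per-row lookups: #{per_row.query_count} queries"
puts "batched lookup:  #{batched.query_count} query"
```

During a full reindex every record is visited, so a per-record lookup pattern multiplies the number of round trips to MySQL by the number of records.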

Solr:

commit after each request - this is the default setup if you use a plugin like Sunspot, but it's a performance killer in production

From http://outoftime.github.com/pivotal-sunspot-presentation.html:

By default, Sunspot::Rails commits at the end of every request that updates the Solr index. Turn that off.

Use Solr's autoCommit functionality. That's configured in solr/conf/solrconfig.xml

Be glad for assumed inconsistency. Don't use search where results need to be up-to-the-second.

other setup issues (http://wiki.apache.org/solr/SolrPerformanceFactors#Indexing_Performance)

Look at the logs for more details
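The effect of committing after every request, compared with committing once per bulk run, can be sketched as follows. `FakeIndex` is a hypothetical stand-in that only counts commits; it does not talk to Solr. With Sunspot, the corresponding real calls would be `Sunspot.index(batch)` for each batch and a single `Sunspot.commit` at the end, or no explicit commit at all if Solr's autoCommit is configured.

```ruby
# Hypothetical stand-in for a Solr client; it only tracks documents and commits.
class FakeIndex
  attr_reader :docs, :commit_count

  def initialize
    @docs = []
    @pending = []
    @commit_count = 0
  end

  # Queue a document; it is not searchable until the next commit.
  def add(doc)
    @pending << doc
  end

  # Make all pending documents visible. In Solr this is the expensive step.
  def commit
    @docs.concat(@pending)
    @pending.clear
    @commit_count += 1
  end
end

records = (1..1_000).map { |i| { id: i } }

# Commit after every document, which is what a per-request commit amounts to.
eager = FakeIndex.new
records.each do |record|
  eager.add(record)
  eager.commit
end

# Add in batches and commit once at the end of the run.
batched = FakeIndex.new
records.each_slice(250) do |batch|
  batch.each { |record| batched.add(record) }
end
batched.commit

puts "commit per document: #{eager.commit_count} commits"
puts "single final commit: #{batched.commit_count} commit"
```

Both runs end with the same documents in the index; only the number of commits differs, and that is where the time goes.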

OTHER TIPS

Instead of pushing your data into Solr to flatten the records, why don't you just create a separate table in your MySQL database that is optimized for read-only access?

Also, you seem to contradict yourself:

The view relies on a certain object. It doesn't care whether the object itself is an Active Record object or a Solr object, as long as it can call a set of attributes on it.

The problem in my case is that the data being returned from Solr is flat... This forces me to build a fake active record object that can be rendered by the view.

Licensed under: CC-BY-SA with attribution