Question

Good day. I'm writing my magister's thesis on "Implementing distributed NoSQL database". Having studied material comparing the strengths and weaknesses of NoSQL databases and RDBMSs, I now have to choose a suitable problem to solve. My task is to show the typical development of the same application backed by Oracle and by MongoDB, and to show that as the app evolves, Mongo begins to outperform Oracle. I'm focusing on write-heavy workloads and horizontal scaling. As the task I've chosen a typical Twitter-like app with a complex, evolving domain, using Java and Spring Data as my tools.


I'd appreciate constructive criticism from experienced people, as well as suggestions for alternative tasks that would show Mongo's advantages. I understand that it depends heavily on the schema, indexes, etc., but I'd still like to know whether Mongo can beat Oracle in my scenario with:

  1. Many writes
  2. Horizontal scaling
  3. Read operations
  4. Schema evolution
  5. Sharding/replication

Solution

My task is to show the typical development of the same application backed by Oracle and MongoDB and to show that during the evolution of the app Mongo begins to outperform Oracle.

Sorry for being very frank, but what kind of academic work starts with the final answer and then reverse-engineers the problem to fit it?! That is worse than worthless, because it is intentionally misleading.

Leaving that aside, here are some tips:

  • Use something that requires JOINs in the relational database but can be modelled as a single document. Blog posts come to mind. A common trick is to copy the author's name into each document: no JOIN is required for reading, and if the author changes their name (which happens very rarely in most systems), you only need a unique attribute such as their email address to update the name everywhere:

    {
      title: "...",
      content: "...",
      date: "...",
      author: { name: "...", email: "..." },
      comments: [
        { name: "...", email: "...", text: "...", date: "..." },
        ...
      ]
    }
    
  • Keep your data small enough to fit into RAM. MongoDB makes good use of that and only flushes data to disk periodically (depending on your configuration), whereas an RDBMS will always go to disk for durability reasons (ACID compliance).

  • Use an "insecure" connection setting: do not wait for the database to actually process the request, but return immediately (fire-and-forget, like UDP). This isn't possible in a transactional system. You can amplify the effect if you test in the cloud, for example on EBS-backed EC2 instances, which have very high disk latency.
  • Use a pretty heavy ORM like Hibernate on the Oracle side. On the MongoDB side, probably avoid an ODM (object document mapper) like Morphia (if you're doing it in Java) and use the plain Java driver; I'm not sure how big the performance gain is, but I'm sure there is some if it's done properly.
  • Use replication in MongoDB and allow reads from the secondaries (thus sacrificing consistency but gaining performance).
  • Use sharding.
  • Besides the system's performance, you might want to look at developer productivity. MongoDB is great for getting started, and I have the feeling it is much quicker to get going with. I'm not sure whether this reverses in the long run; strict schemas do have their place over time, IMHO.
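
The first tip (copying the author into each blog post instead of JOINing) can be sketched in plain Java without any database. This is a minimal sketch under assumptions: `EmbeddedPosts`, `BlogPost` and `renameEverywhere` are hypothetical names, and the in-memory list only stands in for a MongoDB collection. It shows that a read needs no JOIN, while a rename has to find and rewrite every embedded copy via the unique email.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical in-memory model of the blog-post document above. Each post
// carries its own copy of the author's name and email instead of a foreign key.
public class EmbeddedPosts {

    record Author(String name, String email) {}

    record Comment(String name, String email, String text, String date) {}

    static final class BlogPost {
        final String title;
        final String content;
        final String date;
        Author author;                                    // embedded copy
        final List<Comment> comments = new ArrayList<>(); // embedded array

        BlogPost(String title, String content, String date, Author author) {
            this.title = title;
            this.content = content;
            this.date = date;
            this.author = author;
        }
    }

    // Rewrites every embedded copy of a person's name, matching on the unique
    // email. Returns how many embedded copies (authors or comments) changed.
    static int renameEverywhere(List<BlogPost> posts, String email, String newName) {
        int changed = 0;
        for (BlogPost post : posts) {
            if (post.author.email().equals(email) && !post.author.name().equals(newName)) {
                post.author = new Author(newName, email);
                changed++;
            }
            for (ListIterator<Comment> it = post.comments.listIterator(); it.hasNext(); ) {
                Comment c = it.next();
                if (c.email().equals(email) && !c.name().equals(newName)) {
                    it.set(new Comment(newName, email, c.text(), c.date()));
                    changed++;
                }
            }
        }
        return changed;
    }
}
```

Against a real deployment, the author part would roughly correspond to one `updateMany` filtered on `author.email`, and the comments would need an update of matching array elements as well. In a normalized schema the same rename is a single-row UPDATE of the authors table, which is exactly the trade-off being made.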
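
To make the "insecure" (fire-and-forget) setting from the list above concrete, here is a small simulation using only `java.util.concurrent`. `WriteAckDemo` is a hypothetical class, and a single-threaded executor plays the database server, which is an assumption for illustration only. It shows what you give up: an acknowledged write reports a duplicate-key failure to the caller, while an unacknowledged one loses it silently. In the real MongoDB Java driver the corresponding setting is `WriteConcern.UNACKNOWLEDGED` (w: 0).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation: a single-threaded "server" applies writes to a store
// that enforces a unique key (think of a unique index).
public class WriteAckDemo {
    private final Set<String> store = ConcurrentHashMap.newKeySet();
    private final ExecutorService server = Executors.newSingleThreadExecutor();

    // Runs on the server thread; fails on a duplicate key.
    private void apply(String key) {
        if (!store.add(key)) {
            throw new IllegalStateException("duplicate key: " + key);
        }
    }

    // Acknowledged write: block until the server has applied it and report errors.
    public boolean insertAcknowledged(String key) {
        Future<?> result = server.submit(() -> apply(key));
        try {
            result.get();
            return true;
        } catch (ExecutionException e) {
            return false;                        // the client learns the write failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Fire-and-forget write: queue it and return immediately. The Future is
    // dropped, so a failure on the server thread is never seen by the client.
    public void insertUnacknowledged(String key) {
        server.submit(() -> apply(key));
    }

    public int size() {
        return store.size();
    }

    // Lets queued writes finish before the store is inspected.
    public void shutdown() {
        server.shutdown();
        try {
            server.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The client-side latency win comes precisely from not calling `get()`; the same decision is what hides the error.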
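
Reading from secondaries, as suggested in the list above, trades consistency for read throughput because replication is asynchronous. A minimal sketch, assuming a hypothetical `ReplicaReadDemo` class in which replication only happens when `replicate()` is called explicitly (a real replica set replicates continuously, with variable lag). In the MongoDB Java driver, reads are sent to secondaries with a read preference such as `ReadPreference.secondaryPreferred()`.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical primary/secondary pair with asynchronous replication: writes go
// to the primary and reach the secondary only when replicate() runs.
public class ReplicaReadDemo {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> secondary = new HashMap<>();
    private final Queue<String[]> pending = new ArrayDeque<>(); // queued (key, value) pairs

    public void write(String key, String value) {
        primary.put(key, value);
        pending.add(new String[] {key, value});
    }

    // Applies all queued writes to the secondary, in order.
    public void replicate() {
        String[] op;
        while ((op = pending.poll()) != null) {
            secondary.put(op[0], op[1]);
        }
    }

    public String readFromPrimary(String key) {
        return primary.get(key);
    }

    // Takes load off the primary, but may return stale data.
    public String readFromSecondary(String key) {
        return secondary.get(key);
    }
}
```

For a Twitter-like timeline a slightly stale read is usually acceptable; for something like a user changing their password it usually is not, so the read preference is a per-query decision.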
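
For the "use sharding" tip, the key idea is routing each document to a shard by its shard key. The sketch below is a deliberate simplification: `ShardRouter` is a hypothetical class, the shard key is assumed to be the tweet author's user id, and shards are picked by plain hash modulo. MongoDB's hashed sharding instead splits the hashed key space into chunks that are distributed across shards, so treat this only as an illustration of targeted versus broadcast queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified hash-based sharding: each tweet is stored on the shard chosen by
// hashing its shard key (the author's user id). Illustration only.
public class ShardRouter {
    private final List<List<String>> shards = new ArrayList<>();

    public ShardRouter(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
    }

    public int shardFor(String userId) {
        return Math.floorMod(userId.hashCode(), shards.size());
    }

    public void insertTweet(String userId, String text) {
        shards.get(shardFor(userId)).add(userId + ": " + text);
    }

    // A query that includes the shard key is routed to a single shard;
    // one that does not (userId == null) must be broadcast to every shard.
    public Set<Integer> shardsToQuery(String userId) {
        Set<Integer> targets = new TreeSet<>();
        if (userId != null) {
            targets.add(shardFor(userId));
        } else {
            for (int i = 0; i < shards.size(); i++) {
                targets.add(i);
            }
        }
        return targets;
    }

    public int size(int shard) {
        return shards.get(shard).size();
    }
}
```

Plain modulo has a known drawback: changing the number of shards remaps almost every key, which is one reason real systems move chunks of the key space instead. The choice of shard key also matters for the Twitter case: sharding by author makes "tweets of user X" a single-shard query, but a follower's timeline still fans out across shards.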

I'd rather compare MySQL and MongoDB. Both are open-source software and pretty similar; for example, indexing works the same way in both: only B-trees (if you stick to the standard, on-disk storage engines).

Final note: I hope you agree that it's pretty easy to win such an unbalanced comparison, which makes it rather pointless...

Licensed under: CC-BY-SA with attribution