Question

Search engines (or similar web services) use flat files and NoSQL databases. The structure of an inverted index is simpler than a many-to-many relationship, but it should be more efficient to handle it with the latter. There should be two tables for a few billion web pages and millions of keywords. I have tested a table of 50 million rows; the speed of MySQL is comparable to that of BerkeleyDB.
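To make the relational layout concrete, here is a minimal sketch of an inverted index stored as a many-to-many relationship. It makes several assumptions that are not part of the question: SQLite (via Python's built-in sqlite3 module) stands in for MySQL so the example is self-contained, the table and column names are hypothetical, and a third junction table (postings) links the pages and keywords tables, as a many-to-many relationship normally requires.

```python
import sqlite3

# SQLite stands in for MySQL here purely so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages    (page_id    INTEGER PRIMARY KEY, url  TEXT UNIQUE);
    CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    -- Junction table: one row per (keyword, page) pair, i.e. one posting.
    CREATE TABLE postings (
        keyword_id INTEGER NOT NULL REFERENCES keywords(keyword_id),
        page_id    INTEGER NOT NULL REFERENCES pages(page_id),
        PRIMARY KEY (keyword_id, page_id)
    );
""")


def index_page(url, words):
    """Register a page and link it to each distinct keyword it contains."""
    conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    page_id = conn.execute(
        "SELECT page_id FROM pages WHERE url = ?", (url,)
    ).fetchone()[0]
    for word in set(words):
        conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        keyword_id = conn.execute(
            "SELECT keyword_id FROM keywords WHERE word = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO postings VALUES (?, ?)", (keyword_id, page_id)
        )


def lookup(word):
    """Return the URLs of all pages that contain the keyword, sorted by URL."""
    rows = conn.execute(
        """
        SELECT p.url
        FROM keywords k
        JOIN postings po ON po.keyword_id = k.keyword_id
        JOIN pages p     ON p.page_id = po.page_id
        WHERE k.word = ?
        ORDER BY p.url
        """,
        (word,),
    ).fetchall()
    return [row[0] for row in rows]


index_page("http://example.com/a", ["mysql", "index"])
index_page("http://example.com/b", ["index", "nosql"])
print(lookup("index"))
```

The composite primary key on postings starts with keyword_id, so the lookup by keyword can be served from that index rather than from a full table scan.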

I think the problems of working with a large MySQL database appear with operations like ALTER TABLE (which is not the case here). This workload is read-intensive, which MySQL handles quite well. When reading a row with SELECT, I did not find a significant difference between a table with a few rows and one with a few million rows; is it different with billions of rows?

NOTE: I do not mean Google or Bing (or advanced features like full-text search); I am asking about the concept.


Solution

AFAIK, NoSQL provides flexibility that regular relational database engines do not offer. I don't know which search engines use which database engine, but I can think of several benefits of using NoSQL (not flat files; I have no idea why one would use them for complex applications).

Now, if you're just matching criteria and returning results in no particular order, you're fine with any relational database. But once you want to provide the most relevant results, there are tons of criteria to take into account. You could:

  • Give priority to results whose content is similar to results the user has previously chosen.
  • List first the results that are more relevant to the person based on location, language, and other known facts.
  • List more popular results first (again, most popular within a particular region, age group, occupation group, or other group based on known facts about the user).
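As a rough illustration of how criteria like these might be combined, here is a minimal sketch that turns them into one weighted score. Everything in it is an assumption made for the example: the Result and User fields, the weights, and the use of topic overlap as a stand-in for "similar content". It is not how any particular search engine ranks results.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    language: str
    region: str
    topics: set = field(default_factory=set)
    popularity: float = 0.0  # assumed to be normalised to the range 0..1


@dataclass
class User:
    language: str
    region: str
    past_topics: set = field(default_factory=set)  # topics of previously chosen results


# Hypothetical weights; a real system would have to tune these.
WEIGHTS = {"similarity": 0.5, "locale": 0.3, "popularity": 0.2}


def score(result, user):
    """Combine the three criteria from the list into a single number."""
    # Similarity: Jaccard overlap between result topics and past choices.
    union = result.topics | user.past_topics
    similarity = len(result.topics & user.past_topics) / len(union) if union else 0.0
    # Locale: half a point each for matching language and matching region.
    locale = 0.5 * (result.language == user.language) + 0.5 * (result.region == user.region)
    return (
        WEIGHTS["similarity"] * similarity
        + WEIGHTS["locale"] * locale
        + WEIGHTS["popularity"] * result.popularity
    )


def rank(results, user):
    """Return the results ordered from most to least relevant for this user."""
    return sorted(results, key=lambda r: score(r, user), reverse=True)


user = User(language="en", region="US", past_topics={"databases"})
results = [
    Result("b", "fr", "FR", {"cooking"}, popularity=0.9),
    Result("a", "en", "US", {"databases"}, popularity=0.1),
]
print([r.url for r in rank(results, user)])
```

Each new criterion adds another field to Result or User and another term to score, which is the schema growth the answer is concerned with.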

These are only the basic sorting criteria, the ones that came to mind first. Once you start developing and maintaining such a system, hundreds of other criteria will come up and could be implemented. Now think about how each one would be implemented. There could be thousands of fields characterizing each resource, and each new feature will need additional data.

You could do that with the EAV (entity-attribute-value) pattern in a relational database, which will give you some flexibility, or you could use NoSQL, which is built for exactly such purposes.
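For readers unfamiliar with EAV, here is a minimal sketch of the pattern. The assumptions are mine, not part of the answer: SQLite (Python's built-in sqlite3 module) is used as the relational engine, the table and function names are hypothetical, and all values are stored as text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (resource, attribute) pair instead of one column per attribute,
# so a new attribute needs no ALTER TABLE.
conn.execute("""
    CREATE TABLE resource_attributes (
        resource_id INTEGER NOT NULL,
        attribute   TEXT    NOT NULL,
        value       TEXT,
        PRIMARY KEY (resource_id, attribute)
    )
""")


def set_attribute(resource_id, attribute, value):
    """Insert or overwrite one attribute of a resource (value stored as text)."""
    conn.execute(
        "INSERT OR REPLACE INTO resource_attributes VALUES (?, ?, ?)",
        (resource_id, attribute, str(value)),
    )


def load_resource(resource_id):
    """Reassemble all attributes of a resource into a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM resource_attributes WHERE resource_id = ?",
        (resource_id,),
    )
    return dict(rows)


set_attribute(1, "language", "en")
set_attribute(1, "popularity", 0.8)
set_attribute(1, "language", "fr")  # overwrites the earlier value
print(load_resource(1))
```

The dict returned by load_resource is roughly the shape a document-oriented store would keep as a single record; with EAV it has to be reassembled from several rows, and value types are lost unless they are tracked separately.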

Again, this is just one reason to use NoSQL. I know many more reasons to use an RDBMS.

Licensed under: CC-BY-SA with attribution