Question

I am working on the architecture design of an application using PHP Yii that will hold a large number of records (around a million in the future). The DB structure is shown below:

[Image: database structure diagram]

Requirement:

  1. Fast keyword search for Profiles, Articles, and Forums. A keyword can be a combination of columns, e.g. BizName+City, City+Speciality, ServiceName+City, Article Title, etc.
  2. Keyword suggestions for the user.
  3. Show search results in tabs, e.g. Profiles, Articles, Forums, etc.

Approach 1:

  1. Have a relational DB. Write SQL queries on multiple columns using OR and pattern matching.

Cons:

Poor performance
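
For illustration, the kind of query Approach 1 implies might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the `profile` table, its columns, and the sample rows are hypothetical, since the actual schema is not shown.

```python
import sqlite3

# In-memory database with a hypothetical profile table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile ("
    "id INTEGER PRIMARY KEY, biz_name TEXT, city TEXT, speciality TEXT)"
)
conn.executemany(
    "INSERT INTO profile (biz_name, city, speciality) VALUES (?, ?, ?)",
    [
        ("Smile Dental", "Pune", "Dentist"),
        ("City Heart Clinic", "Mumbai", "Cardiologist"),
        ("Pune Eye Care", "Pune", "Ophthalmologist"),
    ],
)


def search_profiles(conn, term):
    """Match the term against several columns with OR + LIKE."""
    pattern = f"%{term}%"  # leading wildcard: matches anywhere in the column
    rows = conn.execute(
        "SELECT id, biz_name FROM profile "
        "WHERE biz_name LIKE ? OR city LIKE ? OR speciality LIKE ? "
        "ORDER BY id",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()


results = search_profiles(conn, "pune")
print(results)
```

The leading `%` is the likely source of the poor performance: an ordinary B-tree index cannot be used for a pattern that starts with a wildcard, so each of the OR-ed conditions tends to force a scan of the table.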

Approach 2:

  1. Create a Keyword table. Build the combinations of searchable columns and save them in the Keyword table.
  2. Create mapping tables: Keyword-Profile, Keyword-Article, Keyword-Forum, etc.
  3. Query the Keyword table for autosuggestions. Once the user hits the search button, query the mapping tables and extract the ArticleId, ProfileId, ForumId, etc.

Cons:

Creating/updating keywords and mappings on every update.
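
A minimal sketch of how Approach 2 could fit together, again using sqlite3 as a stand-in. The table names (`keyword`, `keyword_profile`, `keyword_article`), the `entity_id` column, and the sample phrases are all assumptions made for the example, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, phrase TEXT UNIQUE);
    CREATE TABLE keyword_profile (keyword_id INTEGER, entity_id INTEGER);
    CREATE TABLE keyword_article (keyword_id INTEGER, entity_id INTEGER);
    """
)

# Whitelist of mapping tables, one per result tab (hypothetical names).
MAPPING_TABLES = {"profile": "keyword_profile", "article": "keyword_article"}


def add_keyword(conn, phrase, kind, entity_id):
    """Store a searchable phrase and link it to one entity."""
    phrase = phrase.lower()
    conn.execute("INSERT OR IGNORE INTO keyword (phrase) VALUES (?)", (phrase,))
    (keyword_id,) = conn.execute(
        "SELECT id FROM keyword WHERE phrase = ?", (phrase,)
    ).fetchone()
    conn.execute(
        f"INSERT INTO {MAPPING_TABLES[kind]} VALUES (?, ?)",
        (keyword_id, entity_id),
    )


def suggest(conn, prefix, limit=5):
    """Autosuggest: phrases that start with what the user has typed."""
    rows = conn.execute(
        "SELECT phrase FROM keyword WHERE phrase LIKE ? ORDER BY phrase LIMIT ?",
        (prefix.lower() + "%", limit),
    )
    return [row[0] for row in rows]


def search(conn, phrase):
    """Return matching entity IDs grouped by type, one group per tab."""
    results = {}
    for kind, table in MAPPING_TABLES.items():
        rows = conn.execute(
            f"SELECT m.entity_id FROM {table} AS m "
            "JOIN keyword AS k ON k.id = m.keyword_id "
            "WHERE k.phrase = ? ORDER BY m.entity_id",
            (phrase.lower(),),
        )
        results[kind] = [row[0] for row in rows]
    return results


add_keyword(conn, "Smile Dental Pune", "profile", 1)
add_keyword(conn, "Dentist Pune", "profile", 1)
add_keyword(conn, "Dentist Pune", "profile", 3)
add_keyword(conn, "Dental care tips", "article", 10)

print(suggest(conn, "den"))
print(search(conn, "Dentist Pune"))
```

Every `add_keyword` call here corresponds to the maintenance cost listed above: whenever a profile or article changes, its phrases and mapping rows have to be regenerated.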

Approach 3:

  1. Have a relational DB with FULLTEXT indices on searchable columns.

Questions:

  1. Not sure whether autosuggest for the search box will work with this approach.
  2. How will the performance compare with the other approaches?

Approach 4:

Use a NoSQL DB like MongoDB/Solr/Lucene in combination with the relational DB. Use the NoSQL store to find the ArticleId, ProfileId, ForumId, etc., and the relational DB to display the results.

Cons:

  1. Creating/updating the NoSQL store on every update.
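
The division of labour in Approach 4 can be sketched as follows. The `TinyIndex` class is a hypothetical in-memory inverted index that only stands in for a real search engine such as Solr; it is not how Solr or MongoDB work internally, and the `profile` table and sample data are assumptions as well.

```python
import sqlite3
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: token -> set of (doc_type, doc_id)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_type, doc_id, text):
        for token in text.lower().split():
            self._postings[token].add((doc_type, doc_id))

    def search(self, query):
        """Return documents that contain every token of the query."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        hits = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            hits &= self._postings.get(token, set())
        return hits


def fetch_profiles(conn, ids):
    """Load display data for the matched IDs from the relational DB."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, biz_name, city FROM profile "
        f"WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, biz_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO profile VALUES (?, ?, ?)",
    [(1, "Smile Dental", "Pune"), (2, "Heart Clinic", "Mumbai")],
)

# Indexing step: this is the part that must be repeated on every update.
index = TinyIndex()
for pid, name, city in conn.execute("SELECT id, biz_name, city FROM profile").fetchall():
    index.add("profile", pid, f"{name} {city}")
index.add("article", 10, "Dental care tips")

# Search step: the index returns IDs only, the relational DB supplies the rows.
hits = index.search("dental pune")
profile_ids = sorted(doc_id for doc_type, doc_id in hits if doc_type == "profile")
rows = fetch_profiles(conn, profile_ids)
print(rows)
```

Because each hit carries its document type, the same result set can be split into the Profiles, Articles, and Forums tabs without extra queries against the index.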

Are there any other approaches? Which approach is scalable and will give good performance?


Solution

When you want to search quickly by multiple columns in multiple tables in an SQL database, you need to place indexes on almost everything. That's a good way to push the write performance of your relational database to record lows.

For that reason I would recommend using an independent system for searching. Of the technologies you mentioned, I would recommend the dedicated search server Apache Solr (which is built on the Lucene library, not an unrelated technology) over MongoDB. MongoDB is an interesting database technology with lots of great features, but its text search is not a core feature and feels rather tacked on.

But technology choices are always subjective, so evaluate all the options, see how they line up with your specific requirements, and make your own decision.

OTHER TIPS

If you put it like that, approach 4 is the most scalable and has the best performance hands down.

However, it's not clear what the content will actually be or how large the dataset will be. 'Around a million rows' is hardly an indication, as it doesn't say what the rows contain or whether those rows are in a single table, so it's not really possible to give accurate advice. Approach 4 may be the fastest anyway, but is it the most efficient? A million rows in a single table with about 4 columns, each containing about 250 bytes of data (just a guess here, your mileage may vary), is actually not all that much. Choose the indexes well and optimize the queries, and an RDBMS may be all you need.

My suggestion is: build up a dataset to test with and try the various approaches.
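
A rough sketch of what such a test could look like, assuming a synthetic `profile` table in an in-memory SQLite database. The schema, the generated values, the row count, and the two candidate queries are all placeholders; a real comparison should use the production database engine, realistic data, and the actual queries from each approach.

```python
import random
import sqlite3
import time


def build_dataset(n_rows, seed=0):
    """Create an in-memory table filled with synthetic rows (assumed schema)."""
    rng = random.Random(seed)
    cities = ["Pune", "Mumbai", "Delhi", "Chennai"]
    specialities = ["Dentist", "Cardiologist", "Plumber", "Lawyer"]
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profile ("
        "id INTEGER PRIMARY KEY, biz_name TEXT, city TEXT, speciality TEXT)"
    )
    conn.executemany(
        "INSERT INTO profile (biz_name, city, speciality) VALUES (?, ?, ?)",
        (
            (f"Business {i}", rng.choice(cities), rng.choice(specialities))
            for i in range(n_rows)
        ),
    )
    conn.execute("CREATE INDEX idx_profile_city ON profile (city)")
    return conn


def time_query(conn, sql, params=(), repeats=5):
    """Run a query several times; return (row count, best wall-clock seconds)."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        count = len(conn.execute(sql, params).fetchall())
        best = min(best, time.perf_counter() - start)
    return count, best


conn = build_dataset(20_000)

# Two candidate queries that should return the same rows.
candidates = {
    "LIKE with leading wildcard": ("SELECT id FROM profile WHERE city LIKE ?", ("%une%",)),
    "indexed equality": ("SELECT id FROM profile WHERE city = ?", ("Pune",)),
}

timings = {}
for label, (sql, params) in candidates.items():
    timings[label] = time_query(conn, sql, params)
    count, seconds = timings[label]
    print(f"{label}: {count} rows, best of 5 = {seconds * 1000:.2f} ms")
```

The absolute numbers from a toy run like this mean little; the point is to have a repeatable harness in which each approach can be swapped in and measured against the same data.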

Licensed under: CC-BY-SA with attribution