Question

The feature works as follows:

  • The website has users, and each user can save any number of searches (e.g. Jobs in NY, PHP jobs, etc.). The searches involve so many parameters that they are virtually impossible to index (I am using MySQL).
  • Every day a number of new jobs are posted to the website.
  • Every 24 hours we take the jobs posted within the last 24 hours, match them against the existing saved searches, and email users about the matching jobs.

The problem is that this is a high-traffic website, and even in an optimistic case (few new jobs posted) the search query takes 10 minutes to run. Are there any classical solutions to this problem? We have been using Sphinx in search-intensive places, but I can't apply it here because Sphinx does not return all results; it cuts the result set off at some point. So far the best thing I could come up with is to add a search.matched_job_ids column: whenever a job is posted, match it against all existing searches and record the job id in the matched_job_ids column of every matching search. At the end of the day we email the users and truncate the column. Technically this offers no performance improvement, but it spreads the load over time by executing many smaller search queries instead of one big query. Are there any better approaches?

Solution

Each job can be described by a number of parameters: job sphere, job name, salary and so on. Each parameter has a set of predefined values:

  1. Job sphere - IT, medicine, industry...
  2. Job name - programmer, tester, driver...
  3. Salary - 10-50 thousand per month, 50-100...
  4. Schedule - flexible hours, full time, freelance...

Let's encode a saved search as a number. The largest number of values among all parameters (I believe that is job name) is the base of the numeral system, and the number of parameters is the number of digits.

An unsigned BIGINT holds values up to 2^64-1 = 18 446 744 073 709 551 615, which is 20 decimal digits. In the ordinary base-10 system that leaves 20-1 = 19 freely usable digits (the leading digit cannot take every value), i.e. 19 parameters with 10 values each. Since 10 values are not enough for a parameter such as job name, you should use a base somewhere between 30 and 60. This reduces the number of parameters that fit (13 digits in base 30, 10 digits in base 60), but I think a job can be described with about 12-15 parameters, so the base should be kept as small as the data allows.
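The digit arithmetic above can be checked with a short sketch. This is only an illustration, not part of the original answer: the names max_digits and encode are hypothetical, and each parameter value is assumed to be stored as a zero-based index into its list of predefined values.

```python
UNSIGNED_BIGINT_MAX = 2**64 - 1  # largest value of a MySQL BIGINT UNSIGNED


def max_digits(base: int) -> int:
    """Return how many base-`base` digits always fit in an unsigned BIGINT."""
    digits = 0
    # A code with n digits can be as large as base**n - 1.
    while base ** (digits + 1) - 1 <= UNSIGNED_BIGINT_MAX:
        digits += 1
    return digits


def encode(values: list[int], base: int) -> int:
    """Pack one value index per parameter into a single integer.

    values[0] becomes the most significant digit.
    """
    if len(values) > max_digits(base):
        raise ValueError("too many parameters for this base")
    code = 0
    for value in values:
        if not 0 <= value < base:
            raise ValueError(f"value {value} does not fit in base {base}")
        code = code * base + value
    return code


capacity = {base: max_digits(base) for base in (10, 30, 60)}
example_code = encode([2, 0, 1, 3], base=30)  # four parameters, base 30
```

Here capacity shows how quickly the number of available digits shrinks as the base grows, which is the trade-off the paragraph above describes.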

Create a table savedSearches(code, mail) with an index on (code, mail), declared as the primary key.

When a new job is posted:

1) Encode it programmatically.
2) Run select mail from savedSearches where code = calculatedCode. Because mail is part of the index, the index covers the query and the select should be fast enough.
3) Send the new job to the selected addresses.
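The three steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for MySQL so that the example is self-contained, the base of 30, the four-parameter codes and the example addresses are made up, and encode and mails_for_job are hypothetical helper names.

```python
import sqlite3

BASE = 30  # assumption: the largest number of values any parameter can take


def encode(values, base=BASE):
    """Pack one value index per parameter into a single integer code."""
    code = 0
    for value in values:
        code = code * base + value
    return code


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute(
    "CREATE TABLE savedSearches ("
    " code INTEGER NOT NULL,"
    " mail TEXT NOT NULL,"
    " PRIMARY KEY (code, mail))"
)
conn.executemany(
    "INSERT INTO savedSearches (code, mail) VALUES (?, ?)",
    [
        (encode([0, 1, 2, 0]), "alice@example.com"),
        (encode([0, 1, 2, 0]), "bob@example.com"),
        (encode([1, 4, 0, 2]), "carol@example.com"),
    ],
)


def mails_for_job(job_values):
    """Return the addresses whose saved search equals the new job's code."""
    rows = conn.execute(
        "SELECT mail FROM savedSearches WHERE code = ? ORDER BY mail",
        (encode(job_values),),
    )
    return [mail for (mail,) in rows]


# Step 1: encode the job, step 2: look up the code, step 3: mail the result.
recipients = mails_for_job([0, 1, 2, 0])
```

Note that SQLite integers are signed 64-bit, so this stand-in only suits small demonstration codes; the full unsigned range discussed in the answer would need MySQL's BIGINT UNSIGNED.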

Important note: one parameter, the company that posted the job, can have too many values. I think you should store it separately, not in the savedSearches table, because users usually don't care about the company; they care about salary, skills, etc.

If a user wants to search on a parameter that is not fixed, for instance not just a programmer position but also tester or team leader, you have to search for an interval of encoded numbers rather than a single one.
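One way to picture the interval is the sketch below. It rests on an assumption the answer does not state: the non-fixed parameters occupy the least significant digits of the code, so that all matching codes are contiguous. A free parameter in the middle of the code would split the matches into several intervals. The function name code_range is hypothetical, and how the interval is stored or queried is left open.

```python
def code_range(fixed_prefix, total_digits, base):
    """Return (low, high) covering every code that starts with fixed_prefix.

    The remaining trailing digits are treated as "any value".
    """
    free_digits = total_digits - len(fixed_prefix)
    if free_digits < 0:
        raise ValueError("prefix is longer than the code")
    prefix_code = 0
    for value in fixed_prefix:
        prefix_code = prefix_code * base + value
    low = prefix_code * base**free_digits  # free digits all zero
    high = low + base**free_digits - 1     # free digits all base - 1
    return low, high


# Three fixed parameters followed by one free parameter, in base 30.
low, high = code_range([2, 0, 1], total_digits=4, base=30)
```

With every parameter fixed, the interval collapses to a single code, which is the exact-match case from the steps above.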

My idea is just an assumption, a starting point for further investigation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow