SQL and fuzzy comparison

Question 1

I don't think there is a definitive answer, because it depends on information not given in the question. Anyway, this is too long for a comment.

DBMSes are good at retrieving information through indexes. It makes little sense to have a database server spend its time on heavy computations unless it is dedicated to that specific purpose (as @Adrian answered).

Therefore, your client application should let the DBMS handle the retrieval of the information the rules require.

If the computations are light, everything can be done on the server. Otherwise, move them to the client system.

The disadvantage of the second approach is the amount of data traveling from the server to the client and the number of connections to establish. So it is typically a trade-off between computation on the server and data transfer from it, a balance to be struck according to the specifics of the fuzzy rules.
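A minimal sketch of that split, assuming Python with the standard-library sqlite3 and difflib modules as stand-ins for the real DBMS and fuzzy rules. The `people` table, its index, and the first-letter candidate filter are hypothetical, and `SequenceMatcher.ratio()` stands in for whatever similarity measure your rules actually use.

```python
import sqlite3
from difflib import SequenceMatcher


def build_demo_db():
    """In-memory stand-in for the real database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_people_name ON people (name)")
    conn.executemany(
        "INSERT INTO people (name) VALUES (?)",
        [("Smith",), ("Smyth",), ("Schmidt",), ("Jones",), ("Smithers",)],
    )
    return conn


def fuzzy_lookup(conn, query, threshold=0.75):
    """Let the DBMS do an index-friendly retrieval, then score in the client."""
    first = query[0]
    # Server side: a cheap range predicate that the index on name can serve.
    # Hypothetical blocking rule: candidates share the query's first character.
    rows = conn.execute(
        "SELECT id, name FROM people WHERE name >= ? AND name < ?",
        (first, chr(ord(first) + 1)),
    ).fetchall()
    # Client side: the expensive fuzzy comparison, run only on the candidates.
    scored = [
        (name, SequenceMatcher(None, query.lower(), name.lower()).ratio())
        for _id, name in rows
    ]
    matches = [(name, score) for name, score in scored if score >= threshold]
    # Best matches first.
    return sorted(matches, key=lambda m: m[1], reverse=True)
```

For example, `fuzzy_lookup(build_demo_db(), "Smith")` keeps Smith, Smyth and Smithers and drops Schmidt (below the threshold) and Jones (filtered out by the server). In a client/server setup, only the rows that pass the server-side predicate would be transferred, which is the data-transfer side of the trade-off.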

Edit: I saw in a comment that you are almost certain you will have to implement the code in the client. In that case, consider an additional criterion for maintainability: code locality, i.e., keep related code together rather than spreading it across systems (and languages).

Question 2

I would say you're best off using simple selects to get the closest matches you can without hammering the database, and then doing the heavy lifting in your application layer. The reason I suggest this is scalability: if the heavy lifting happens in the application layer, your problem is a perfect use case for a map-reduce-style solution, in which you distribute the similarity computations across nodes and get your results back much faster than if you ran everything through the database. Plus, this way you're not locking up your database and slowing down any other operations that may be running at the same time.
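The map-reduce shape of that idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a distributed system: `difflib.SequenceMatcher.ratio()` is a stand-in similarity measure, the function names are hypothetical, and `mapper` defaults to the built-in `map`; passing something like `concurrent.futures.ProcessPoolExecutor().map` (or swapping in a real map-reduce framework) is what would actually spread the work across cores or nodes.

```python
import heapq
from difflib import SequenceMatcher
from functools import reduce


def score_chunk(task):
    """Map step: score one chunk of candidates and keep only its local top k."""
    query, chunk, k = task
    scored = [(SequenceMatcher(None, query, c).ratio(), c) for c in chunk]
    return heapq.nlargest(k, scored)


def top_k_matches(query, candidates, k=3, chunk_size=1000, mapper=map):
    """Split candidates into chunks, score them independently, merge the partial results.

    `mapper` is any map-like callable taking (function, iterable).
    """
    tasks = [
        (query, candidates[i:i + chunk_size], k)
        for i in range(0, len(candidates), chunk_size)
    ]
    partials = mapper(score_chunk, tasks)
    # Reduce step: the top k of two partial top-k lists is the top k of their
    # union, so partial results can be merged in any order.
    return reduce(lambda acc, part: heapq.nlargest(k, acc + part), partials, [])
```

Each chunk only needs the query and its own slice of candidates, so the map step shares no state and the database is queried once up front rather than once per comparison.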

Question 3

Since you're still deciding which DB to use: PostgreSQL has the fuzzystrmatch module, which provides Levenshtein and Soundex functions. You might also want to look at the pg_trgm module as described here. You could also put an index on soundex() of the column (an expression index) so you don't have to calculate it every time. But you seem to be optimizing prematurely, so my advice is to test with PostgreSQL first and then decide whether you need to optimize. The numbers you provided really don't seem like a lot, considering you have almost two minutes to run one query.
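To get a feel for what those functions compute, here is a rough Python sketch: a textbook Levenshtein distance (the edit-distance metric that fuzzystrmatch's `levenshtein()` computes) and a simplified trigram similarity in the spirit of pg_trgm (lowercase, split on non-alphanumeric characters, pad each word with two leading spaces and one trailing space, then divide shared trigrams by all distinct trigrams). The trigram part is an approximation for illustration, not a reimplementation of pg_trgm, and the function names are hypothetical.

```python
import re


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def trigrams(text):
    """Set of 3-character grams per word, with pg_trgm-style padding."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a, b):
    """Shared trigrams divided by distinct trigrams overall (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

In PostgreSQL itself these are server-side calls (`levenshtein(a, b)` and `soundex(a)` from fuzzystrmatch, `similarity(a, b)` from pg_trgm once the extensions are created), so the sketch is only a way to reason about the numbers before testing on real data.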

Question 4

An option I'd consider is to add a column to the People table that stores the Soundex value of the person's name.

I've done joins using:

SELECT [Column]   -- placeholder: list the columns you need
FROM People P
    INNER JOIN TableA A ON SOUNDEX(A.ComparisonColumn) = P.SoundexColumn

That returns every row in TableA whose Soundex value matches the value in the People table's SoundexColumn.

I haven't used that kind of query on tables of that size, but I see no issue with trying it. You can also index SoundexColumn to help performance.
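A minimal, self-contained sketch of the same idea, assuming Python with the standard-library sqlite3 module (SQLite only has a built-in soundex() when compiled with a particular option, so one is supplied here). It contains a hand-written American Soundex, whose results may differ from a given database's built-in SOUNDEX() in edge cases, plus a precomputed, indexed code column on the people side and the join from above. Table and column names are hypothetical snake_case versions of those in the query.

```python
import sqlite3

# American Soundex digit groups; vowels, y, h and w get no digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name):
    """First letter plus three digits, e.g. Robert -> R163."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h/w do not separate letters that share a code
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return result.ljust(4, "0")


def demo_join():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: people stores a precomputed, indexed Soundex code.
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, soundex_code TEXT)")
    conn.execute("CREATE INDEX idx_people_soundex ON people (soundex_code)")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, comparison_column TEXT)")
    conn.executemany("INSERT INTO people (name, soundex_code) VALUES (?, ?)",
                     [(n, soundex(n)) for n in ("Smith", "Jones")])
    conn.executemany("INSERT INTO table_a (comparison_column) VALUES (?)",
                     [(n,) for n in ("Smyth", "Schmidt", "Johns", "Brown")])
    # Registered under a distinct name so it cannot clash with a built-in soundex().
    conn.create_function("py_soundex", 1, soundex)
    return conn.execute(
        "SELECT p.name, a.comparison_column FROM people AS p "
        "INNER JOIN table_a AS a ON py_soundex(a.comparison_column) = p.soundex_code"
    ).fetchall()
```

Here `demo_join()` pairs Smith with Smyth and Schmidt (all S530) and Jones with Johns (J520), while Brown (B650) has no match. Storing the code on the People side means it is computed once at write time, and the index on that column is what the lookup can use.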