Question

I have a simple application that allows users to enter a string to search for a name in a database. The server side is ColdFusion 7. The issue I'm having is that a query such as "obrien" does not return entries with the name "o'brien."

I think what I want is fuzzy matching capability. After doing some research I've also come across full-text search, which might be what I'm looking for; however, I'm not sure about the difference between the two. ColdFusion has a service called Verity, but it seems I have to first query the whole database and then index it, which sounds very costly.

Is there a built-in way to do fuzzy matching or full-text search in ColdFusion without first querying the entire database? If not, when doing full-text search, do I have to specify the indexes myself? For example, should "obrien" index to "obrien, o'brien, o'brein"?

Solution

What about trying the SOUNDEX function? I don't think there is any easy answer here.
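
To illustrate why SOUNDEX helps here, below is a minimal Python sketch of the classic American Soundex rules. The function name soundex is only illustrative, and this is not the code any particular database runs; a database's built-in SOUNDEX may treat punctuation differently, so it is worth checking how yours handles the apostrophe.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    # Keep ASCII letters only, so the apostrophe in "O'Brien" is ignored.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        # Skip vowels and skip a code that repeats the previous one.
        if digit and digit != prev:
            result += digit
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = digit
    # Pad with zeros or truncate to exactly four characters.
    return (result + "000")[:4]


print(soundex("obrien"), soundex("o'brien"), soundex("o'brein"))
```

All three spellings from the question produce the same code (O165), so comparing codes instead of raw strings would treat them as matches. The trade-off is that Soundex is coarse and will also match names that merely sound similar.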

OTHER TIPS

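Try the queries below. The first matches the whole name once apostrophes are stripped; the second matches it anywhere in the column.

select *
from tablename
where Replace(ColumnName, '''', '') like 'obrien'

OR

select *
from tablename
where Replace(ColumnName, '''', '') like '%obrien%'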
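
As a rough way to try this idea out, here is a small Python sketch that runs the same kind of query against an in-memory SQLite database. The table name people, the column name, and the helper search_ignoring_apostrophes are all made up for the example, and SQLite stands in for whatever database the application really uses.

```python
import sqlite3

# Hypothetical sample data in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("O'Brien",), ("Obrien",), ("Smith",)])


def search_ignoring_apostrophes(conn, term):
    """Substring search that ignores apostrophes on both sides."""
    sql = ("SELECT name FROM people "
           "WHERE REPLACE(name, '''', '') LIKE '%' || ? || '%' "
           "ORDER BY name")
    # Strip apostrophes from the user's input as well.
    needle = term.replace("'", "")
    return [row[0] for row in conn.execute(sql, (needle,))]


matches = search_ignoring_apostrophes(conn, "obrien")
print(matches)
```

SQLite's LIKE is case-insensitive for ASCII letters by default, which is why "obrien" finds "O'Brien" here; other databases may need an explicit lower() on both sides. Wrapping the column in a function usually prevents the database from using a plain index on it, so this may be slow on large tables.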

You are looking for full-text search with fuzzy matching capabilities; more importantly, you are looking for the built-in tokenizers that these systems offer. However, you can also use a SQL function that does Levenshtein edit-distance matching.

Verity, or any other information retrieval system such as Lucene, would be a solution, but you would have to constantly reindex from the raw data if you need real-time search. If you don't need to repopulate Verity very often, it could be a good choice (bless your heart for dealing with it, though). Tokenizers handle formatting the data from the database and from the user so that you end up with identical strings. Also, the Verity that ships with ColdFusion 7 is old, so I imagine its fuzzy matching options are pretty limited compared to what you would find in Lucene 4.
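
The point about tokenizers producing identical strings can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not how Verity or Lucene actually tokenize; the function name normalize_name is hypothetical, and real tokenizers also split text into multiple terms.

```python
import re
import unicodedata


def normalize_name(text: str) -> str:
    """Reduce a name to a comparison key: lowercase letters and digits only."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    # Lowercase, then remove punctuation, spaces, and anything else.
    return re.sub(r"[^a-z0-9]", "", no_marks.lower())


# The stored value and the user's input end up as the same key.
print(normalize_name("O'Brien"), normalize_name("obrien"))
```

One way to use such a key would be to store it in an extra indexed column when a row is written and compare it against the normalized search term, which avoids reindexing everything at query time.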

The other option is to use a SQL function that implements the Levenshtein edit distance algorithm. It tells you how many single-character edits it takes to get from one string to the other, which is "fuzzy" matching. For instance, turning lcase("O'brien") into lcase("obrien") takes one edit: removing the apostrophe. One edit to reach a six-character string is a good match. Turning "hugh" into "jones" takes five edits (changing every letter and adding one). Five edits on a four-character string is not a very good match. See "Levenshtein distance in T-SQL".
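
The arithmetic above can be checked with a short Python sketch of the standard dynamic-programming Levenshtein algorithm. It is an illustration in Python rather than the T-SQL function mentioned, and the function name levenshtein is just a label chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


print(levenshtein("o'brien", "obrien"))  # 1
print(levenshtein("hugh", "jones"))      # 5
```

In practice the raw distance is usually compared against the string length, for example by accepting a match only when the distance is a small fraction of the longer string, so that one edit in "obrien" counts as close while five edits in "hugh" does not.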

Finally, any database with built-in full-text search will give you live data with some tokenizing, like Verity. It's nice because it's so simple to set up. I'm not sure whether DB2 supports fuzzy matching; SQL Server does not.

You are going down the wrong track here: it's not a fuzzy match you need, it's a parameterised query.

You can set the value of the parameter to "O'Brien", and the embedded apostrophe will be treated as part of the value rather than as a string delimiter.
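
A small Python sketch shows what a parameterised query does with the apostrophe. It uses SQLite placeholders as a stand-in; in ColdFusion the corresponding mechanism is the cfqueryparam tag. The table people and the helper find_by_name are hypothetical names for the example.

```python
import sqlite3

# Hypothetical sample data in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.execute("INSERT INTO people (name) VALUES (?)", ("O'Brien",))


def find_by_name(conn, name):
    """Exact-match lookup with the search term bound as a parameter."""
    # The ? placeholder passes the value separately from the SQL text,
    # so the apostrophe never needs escaping.
    rows = conn.execute("SELECT name FROM people WHERE name = ?", (name,))
    return [row[0] for row in rows]


with_apostrophe = find_by_name(conn, "O'Brien")
without_apostrophe = find_by_name(conn, "obrien")
print(with_apostrophe, without_apostrophe)
```

Binding the value makes a search for "O'Brien" safe and correct, but in this sketch a search for "obrien" still returns nothing, so binding parameters is best combined with one of the normalization or fuzzy matching approaches if both spellings should match.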

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow