Question

I need to test whether a string (a filename with its complete path) is contained in another string in MSSQL.

My script needs to check if the file we are about to commit is present in the database under a specific column (pre-hook script).

I cannot really change the data definition of the column; it currently uses the TEXT type, and the files are separated by a newline character. I tried the T-SQL CONTAINS function, but the overall performance is not very good.

Would it be a better idea to load all the data into a PHP array and do the comparison locally?

I'm not quite sure what the best way to do this is.

Update: There are about 194,530 rows in the database.


Solution

The main thing to keep in mind when doing a search through a string is that you want to limit the length of the string you are searching through. Right now, you have multiple path+filename values tucked into a single row-column pair - as I've mentioned above, this is poorly normalized (and is part of the reason you're having trouble doing lookups).

Given that you can't really change the schema of the table you're having trouble with, a better alternative might be creating a structure to work with the metadata that describes the files stored within a certain row.

For example, one option might be to create a table that contains filename-rowID pairs, where each row of the original table is linked to the filenames parsed out of that row's TEXT column. That lets you limit your search by first doing a lookup on a shorter string (the filename), and then using that constraint to search a smaller number of rows for the full path+filename combination and arrive at a unique result.
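A minimal sketch of that two-step lookup, written in Python rather than T-SQL so it can be run on its own. It assumes each TEXT value holds one forward-slash path per line; `build_filename_index` and `find_rows` are hypothetical names, and the dictionary stands in for the filename-rowID table.

```python
import posixpath
from collections import defaultdict

def build_filename_index(rows):
    """Map each bare filename to the set of row IDs whose text lists it.

    `rows` is an iterable of (row_id, text) pairs, where text holds one
    full path per line (the assumed layout of the TEXT column).
    """
    index = defaultdict(set)
    for row_id, text in rows:
        for line in text.splitlines():
            path = line.strip()
            if path:
                index[posixpath.basename(path)].add(row_id)
    return index

def find_rows(rows_by_id, index, full_path):
    """Return the sorted IDs of rows that list full_path exactly."""
    # Step 1: narrow the candidates using the short filename.
    candidates = index.get(posixpath.basename(full_path), set())
    # Step 2: confirm the full path only within those candidate rows.
    return sorted(
        row_id for row_id in candidates
        if full_path in (l.strip() for l in rows_by_id[row_id].splitlines())
    )

# Made-up sample data for illustration only.
rows = [
    (1, "/src/app/main.php\n/src/app/util.php"),
    (2, "/src/lib/util.php\n/docs/readme.txt"),
]
rows_by_id = dict(rows)
index = build_filename_index(rows)
print(find_rows(rows_by_id, index, "/src/lib/util.php"))
```

Both rows list a `util.php`, so the filename lookup returns two candidates and the full-path check then picks the one that actually matches.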

If you have a large number of files with identical names, another option might be to implement a hash index, using rowIDs from your original table and a hash of each path+filename from that row using CHECKSUM() or whatever hashing function you have available.
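The hash approach could look roughly like the following Python sketch. Here `zlib.crc32` is only a stand-in for CHECKSUM() or a similar function, and the names `path_hash`, `build_hash_index`, and `rows_containing` are made up for illustration. Because any checksum can collide, the lookup still compares the stored path string before accepting a match.

```python
import zlib
from collections import defaultdict

def path_hash(path):
    """32-bit checksum of a path; zlib.crc32 stands in for CHECKSUM()."""
    return zlib.crc32(path.encode("utf-8"))

def build_hash_index(rows):
    """Map hash -> list of (row_id, path) for every path in every row."""
    index = defaultdict(list)
    for row_id, text in rows:
        for line in text.splitlines():
            path = line.strip()
            if path:
                index[path_hash(path)].append((row_id, path))
    return index

def rows_containing(index, full_path):
    """Look up by hash, then compare strings to rule out collisions."""
    bucket = index.get(path_hash(full_path), [])
    return sorted({row_id for row_id, path in bucket if path == full_path})

# Made-up sample data: many files share the name index.php.
rows = [
    (1, "/src/app/index.php\n/src/admin/index.php"),
    (2, "/src/api/index.php"),
]
hash_index = build_hash_index(rows)
print(rows_containing(hash_index, "/src/admin/index.php"))
```

Unlike the filename lookup, the hash covers the whole path, so identical filenames in different directories land in different buckets.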

Using an 'indexing' table like this one does add overhead: you have to maintain the metadata as the original table gets updated, but it also means you're doing your heavy lifting ahead of time and making future queries of the data much faster.
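To give a feel for that maintenance cost, here is a small Python sketch of refreshing the index when one row's text changes. The function name `reindex_row` and the path-to-row-IDs dictionary are assumptions for illustration; in the database the equivalent work might live in a trigger or in whatever code writes the column.

```python
def reindex_row(index, row_id, old_text, new_text):
    """Update a path -> set-of-row-IDs index after one row's text changes."""
    old_paths = {p.strip() for p in old_text.splitlines() if p.strip()}
    new_paths = {p.strip() for p in new_text.splitlines() if p.strip()}
    # Drop entries for paths that were removed from this row.
    for path in old_paths - new_paths:
        ids = index.get(path)
        if ids is not None:
            ids.discard(row_id)
            if not ids:
                del index[path]
    # Add entries for paths that are new to this row.
    for path in new_paths - old_paths:
        index.setdefault(path, set()).add(row_id)

# Made-up sample data for illustration only.
index = {"/a/one.txt": {1}, "/a/two.txt": {1, 2}}
reindex_row(index, 1, "/a/one.txt\n/a/two.txt", "/a/two.txt\n/a/three.txt")
print(index)
```

Only the paths that differ between the old and new text are touched, which keeps the per-update cost proportional to the size of the change.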

OTHER TIPS

How about using the LIKE operator? You could do something like this

SELECT * FROM TABLE WHERE COLUMN LIKE '%' + @FilePath +'%'

If this does not fit your needs, then I would agree that doing this programmatically might be better. The problem is that SQL uses set-based logic, so when you begin doing something more procedural (functions), it breaks down. Run tests to confirm, but you should be able to do this more quickly in application code. You could use regular expressions, a substring search, or whatever works best within PHP.
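If you do compare locally, note that a plain substring test (which is roughly what LIKE '%...%' does) can report a hit for a longer path that merely contains the one you want. LIKE also treats %, _ and [ in the pattern as wildcards. The sketch below uses Python instead of PHP purely so it is easy to run; the function names and sample paths are made up.

```python
def substring_match(text, file_path):
    """Rough equivalent of LIKE '%path%': true for any substring hit."""
    return file_path in text

def exact_line_match(text, file_path):
    """True only if file_path is one of the newline-separated entries."""
    return any(line.strip() == file_path for line in text.splitlines())

# Made-up column value: note the .bak file that shares a prefix.
column_value = "/src/app/main.php.bak\n/src/app/util.php"

print(substring_match(column_value, "/src/app/main.php"))   # True (false positive)
print(exact_line_match(column_value, "/src/app/main.php"))  # False
print(exact_line_match(column_value, "/src/app/util.php"))  # True
```

Splitting on the newline separator and comparing whole entries avoids the false positive, at the cost of scanning every entry in the value.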

I think this would be faster:

SELECT TOP 1 columnname FROM tablename WHERE columnname LIKE '%' + @FilePath + '%'
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow