Question

I'm dealing with an interesting issue.

I have a biometric system that uses John Daugman's algorithm to transform human irises into binary codes (for some research at our university).

The iris code is "flat" (it's not stored as a circle, but transformed into a rectangle):

column 1 | column 2 | column 3 | ...

10011001 ...
10110111
01100010
...

where each column represents 30 bits. The problem is that each iris scan has its own noise mask (eyelids, reflections, ...), so matches aren't 100% but at best around 96-98%.

So far we are using an algorithm like this (Hamming distance matching):

mask = mask1 & mask2;              // bits valid in both noise masks
result = (code1 ^ code2) & mask;   // disagreeing bits among the valid ones

// ratio of 1 bits allowed by the mask (fractional Hamming distance)
double difference = (double)one_bits(result) / one_bits(mask);
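
For illustration, a minimal self-contained version of that matcher in C++ could look like the sketch below, assuming the 80*30 bits are stored in a std::bitset (the IrisCode and hamming_distance names are just placeholders, not our actual code):

#include <bitset>
#include <cstddef>

// 80 columns x 30 bits = 2400 bits per iris code.
constexpr std::size_t kBits = 80 * 30;
using IrisCode = std::bitset<kBits>;

// Fractional Hamming distance between two codes, counting only the
// bits that are valid in both noise masks.
double hamming_distance(const IrisCode& code1, const IrisCode& mask1,
                        const IrisCode& code2, const IrisCode& mask2)
{
    IrisCode mask   = mask1 & mask2;          // bits valid in both scans
    IrisCode result = (code1 ^ code2) & mask; // disagreeing valid bits
    std::size_t valid = mask.count();
    return valid ? static_cast<double>(result.count()) / valid : 1.0;
}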

The problem is that we are now building a real database of irises (around 1200-1300 subjects, 3-5 iris samples each, and you have to account for rotation, so you need around 10 tests per sample). And we need to compare the current sample against the whole database (about 65,000 comparisons on 80*30 bits), which turns out to be slow.

Question: is there any hash function that reflects the data structure (and changes only a little when a few bits change), i.e. is "error tolerant"? We need to build a fast search algorithm over the whole database (so we are looking for possible ways to index this).

UPDATE: I guess it should be implemented with some sort of "nearest neighbour" lookup, or some sort of clustering (where similar irises would be grouped and in a first round only some representatives would be checked), as in the sketch below.
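
For example, a rough sketch of that two-stage idea, reusing the IrisCode type and hamming_distance from the sketch above (the Cluster layout and the k-medoids choice of representatives are only assumptions):

#include <algorithm>
#include <utility>
#include <vector>

// Each cluster stores one representative code/mask (e.g. chosen by
// k-medoids under Hamming distance) plus the ids of its members.
struct Cluster {
    IrisCode representative;
    IrisCode representative_mask;
    std::vector<std::size_t> member_ids;  // indices into the full database
};

// First round: rank clusters by the probe's distance to their
// representative and return the members of the k closest clusters.
// Only these candidates are compared exhaustively afterwards.
std::vector<std::size_t> candidate_ids(const IrisCode& probe, const IrisCode& probe_mask,
                                       const std::vector<Cluster>& clusters, std::size_t k)
{
    std::vector<std::pair<double, std::size_t>> ranked;
    for (std::size_t i = 0; i < clusters.size(); ++i)
        ranked.emplace_back(hamming_distance(probe, probe_mask,
                                             clusters[i].representative,
                                             clusters[i].representative_mask), i);
    k = std::min(k, ranked.size());
    std::partial_sort(ranked.begin(), ranked.begin() + k, ranked.end());

    std::vector<std::size_t> out;
    for (std::size_t j = 0; j < k; ++j)
        for (std::size_t id : clusters[ranked[j].second].member_ids)
            out.push_back(id);
    return out;
}

The recall of such a scheme depends on how well the clusters separate; a probe near a cluster boundary may need k > 1 to be safe.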

Solution

Check Locality Sensitive Hashing (LSH); ready-made implementations exist.

"A nilsimsa code is something like a hash, but unlike hashes, a small change in the message results in a small change in the nilsimsa code. Such a function is called a locality-sensitive hash."

See also: How to understand Locality Sensitive Hashing?
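
To make the idea concrete, here is a small sketch of one classic LSH family for binary codes under Hamming distance, bit sampling: build several hash tables, each keyed on a random subset of bit positions, so two codes that differ in only a few bits will very likely collide in at least one table. It reuses IrisCode and kBits from the question's sketch; the table and key sizes are arbitrary, and masked-out bits would still need special handling:

#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

struct LshTable {
    std::vector<std::size_t> sampled_bits;  // bit positions that form the key
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> buckets;  // key -> code ids
};

// Concatenate the sampled bits of a code into a small integer key.
std::uint64_t key_for(const IrisCode& code, const std::vector<std::size_t>& bits)
{
    std::uint64_t key = 0;
    for (std::size_t b : bits)
        key = (key << 1) | static_cast<std::uint64_t>(code.test(b));
    return key;
}

// Build num_tables tables, each keyed on bits_per_key (<= 64) random positions.
std::vector<LshTable> build_index(const std::vector<IrisCode>& db,
                                  std::size_t num_tables, std::size_t bits_per_key)
{
    std::mt19937 rng(42);
    std::uniform_int_distribution<std::size_t> pick(0, kBits - 1);
    std::vector<LshTable> index(num_tables);
    for (auto& table : index) {
        for (std::size_t i = 0; i < bits_per_key; ++i)
            table.sampled_bits.push_back(pick(rng));
        for (std::size_t id = 0; id < db.size(); ++id)
            table.buckets[key_for(db[id], table.sampled_bits)].push_back(id);
    }
    return index;
}

// Query: collect every id that collides with the probe in any table,
// then verify the survivors with the exact masked Hamming distance.
std::vector<std::size_t> query(const std::vector<LshTable>& index, const IrisCode& probe)
{
    std::vector<std::size_t> candidates;
    for (const auto& table : index) {
        auto it = table.buckets.find(key_for(probe, table.sampled_bits));
        if (it != table.buckets.end())
            candidates.insert(candidates.end(), it->second.begin(), it->second.end());
    }
    return candidates;
}

Roughly, with a few tens of tables keyed on ~16-24 sampled bits each, a genuine match at a 2-4% bit error rate is very likely to collide in at least one table, while most of the ~65,000 stored codes are never compared bit by bit.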
