Efficiently searching a 3-level hierarchy

https://stackoverflow.com/questions/728378

05-09-2019
|

Question

Lets say I have a FirstName > MiddleName > LastName hierarchy (~10k rows, for sake of the question). This means you could have "John > Mary-Anne > Eddy" or "Eddy > John > Jacob" row. The point being that the hierarchy makes little sense and is very foreign to the user (unlike, say, a Country > State > City structure).

Because its so unstructured and confused, I want to provide the user with an auto complete input box. As they type, it should search for possible substring matches, and when they "root" their search string at a level, it will then restrict the results to below that level.

Now, because there are lots of people named "John", it makes little sense that if they type "John" they only get back results like

John > Allen > Alexander
John > Allen > Burschawitz
John > Allen ... repeat 100 times ...

Because they'll never see the unique row "Jason > John > Smith".

Instead, they should get back something like ("*" is just an arbitrary indicator to the user of "hey, lots more rows below this exist"):

John > Allen > *
Jason > John > Smith
Mike > John > *
Mary > Elena > Johnason

If they type "John > Al" then the results would be limited to anything under "John >", but should be grouped similarly to above.

I hope the explanation is clear. The requirements are a bit loose. Just reasonable ones so that a person can search through the tree and find what they are after.

Right now, I have some interesting SQL that looks for the search term in the row, figures out its position, does some substring'ing, group bys, and order by's to get the above results, but its not performing well enough.

I'm trying to solve this problem on a typical LAMP stack (except with Oracle). Its not shared hosting, so I do have full control over the server. The data changes small amounts every few weeks, and the search results can stay stale for a reasonable amount of time (e.g, a cron that updates the search index isn't out of the question).

Solution

Argh. Sorry I wasn't able to describe my problem. Anyways, here's the solution I came up with.

Basically, create a second table from the 3-column table that contains all the distinct values for each successive level of the hierarchy, as well as a column to indicate the depth of that row in the hierarchy.

E.g. From mytable(A, B, C), create search_t(A, B, C, level)

So, with "One > Two > Three", you create 3 rows (A, B, C, level):

"One", null, null, 1
"One", "Two", null, 2
"One", "Two", "Three", 3

When searching, you can constrain the level by picking a value for level and providing values for the upper-level columns:

WHERE A='One' and level > 1 and (B like '%t%' or C like '%t')

It can be a bit simplified and generic if you create a search_str column and perform the LIKE matching against that instead.

WHERE A='One' and level > 1 and search_str like '%t%'

In retrospect, this would probably have been more apparent if the data were already in an adjacency-list model.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow