Question

We have an auto-complete list that's populated whenever you send an email to someone. That's all well and good until the list gets really big: then you need to type more and more of an address to get to the one you want, which defeats the purpose of auto-complete.

I was thinking that some logic should be added so that the auto-complete results are sorted by some function of most recently contacted or most often contacted, rather than just in alphabetical order.

What I want to know is whether there are any known good algorithms for this kind of search, or if anyone has any suggestions.

I was thinking just a point system thing, with something like same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points and last 6 months is 1 point. Then for most often, 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, 2+ is 1. No real logic other than those numbers "feel" about right.
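To make the point system above concrete, here is a minimal Python sketch. The tier values are the ones from the question; reading "last month" as 30 days and "last 6 months" as 180 days, and simply adding the two point values together, are my assumptions. The function names are made up for illustration.

```python
from datetime import date, timedelta

# (maximum age in days, points). 30 and 180 days are assumed readings
# of "last month" and "last 6 months".
RECENCY_TIERS = [(0, 5), (3, 4), (7, 3), (30, 2), (180, 1)]
# (minimum number of emails, points), highest tier first.
FREQUENCY_TIERS = [(25, 5), (15, 4), (10, 3), (5, 2), (2, 1)]


def recency_points(last_contact: date, today: date) -> int:
    """Points for how recently the address was last used."""
    age_days = (today - last_contact).days
    for max_age, points in RECENCY_TIERS:
        if age_days <= max_age:
            return points
    return 0  # older than the last tier


def frequency_points(email_count: int) -> int:
    """Points for how often the address has been used."""
    for min_count, points in FREQUENCY_TIERS:
        if email_count >= min_count:
            return points
    return 0  # used fewer than 2 times


def score(last_contact: date, email_count: int, today: date) -> int:
    """Combined score; summing the two parts is an assumption."""
    return recency_points(last_contact, today) + frequency_points(email_count)
```

Sorting the matching addresses by `score` in descending order, with alphabetical order as a tie-breaker, would give the ranking described.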

Other than arbitrarily picked numbers, does anyone have any input? Other numbers are also welcome if you can give a reason why you think they're better than mine.

Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between say someone you talked to 80 times vs say 30 times.


Solution

This seems similar to what Firefox does when it suggests the site you are typing.

Unfortunately I don't know exactly how Firefox does it. A point system seems good as well, though you may need to balance your points :)

I'd go for something similar to:

NoM = number of mails

(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30

Contacts you have not written to during the last month (this window could be changed) will have 0 points. You could sort those by total NoM sent (they are on the contact list, after all :). They will be shown after the contacts with points > 0.

It's just an idea; the point is to give different weight to the contacts you mail most and the ones you mailed most recently.
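As a rough Python sketch of this formula: it assumes you have the send dates per contact available, that "last week" and "last month" mean the last 7 and 30 days, and that the windows overlap (a mail sent today also counts towards the week and the month). The function names are hypothetical.

```python
from datetime import date, timedelta


def recency_weighted_score(sent_dates, today):
    """Score one contact from the dates on which mail was sent to them."""
    ages = [(today - d).days for d in sent_dates]
    sent_today = sum(1 for a in ages if a == 0)
    sent_week = sum(1 for a in ages if 0 <= a < 7)    # assumed 7-day window
    sent_month = sum(1 for a in ages if 0 <= a < 30)  # assumed 30-day window
    return sent_today + 0.5 * sent_week / 7 + sent_month / 3 / 30


def rank(history, today):
    """history maps address -> list of send dates. Best match first."""
    def key(address):
        dates = history[address]
        # Total mails sent breaks ties, which also orders the
        # contacts that scored 0 because they are outside the window.
        return (-recency_weighted_score(dates, today), -len(dates), address)
    return sorted(history, key=key)
```

Because the tie-breaker is the total count, contacts with 0 points end up after everyone with points > 0, ordered by how much mail they got in total.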

OTHER TIPS

Take a look at Self organizing lists.

A quick and dirty look:

Move to Front Heuristic: A linked list such that whenever a node is selected, it is moved to the front of the list.

Frequency Heuristic: A linked list, such that whenever a node is selected, its frequency count is incremented, and then the node is bubbled towards the front of the list, so that the most frequently accessed is at the head of the list.

It looks like the move to front implementation would best suit your needs.

EDIT: When an address is selected, add one to its frequency and move it to the front of the group of nodes with the same weight (or (weight div x) for coarser groupings). I see aging as a real problem with your proposed implementation, in that it requires calculating a weight on each and every item. A self organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.

Further Edit: Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. That in turn means you have to have the entire email history available when you construct your list.

The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed. This is what gives us statistically good performance.
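Here is a small Python sketch of the tweaked frequency heuristic from the EDIT above. It uses a plain Python list rather than a linked list to keep it short, and the class and method names are made up. Only the address that was just used gets touched; nothing else is re-scored.

```python
class FrequencyList:
    """Self-organizing list ordered by descending use count.

    Within a group of equal counts, the most recently used
    address sits at the front of the group.
    """

    def __init__(self, addresses):
        self.items = list(addresses)              # current display order
        self.counts = {a: 0 for a in self.items}  # use count per address

    def use(self, address):
        """Record one use and bubble the address towards the front."""
        self.counts[address] += 1
        weight = self.counts[address]
        i = self.items.index(address)
        self.items.pop(i)
        # Move past every neighbour whose count is <= the new weight,
        # so the address lands at the front of its equal-weight group.
        while i > 0 and self.counts[self.items[i - 1]] <= weight:
            i -= 1
        self.items.insert(i, address)

    def complete(self, prefix):
        """Addresses starting with prefix, in list order."""
        return [a for a in self.items if a.startswith(prefix)]
```

For coarser groupings you could compare `count // x` instead of the raw count in the `while` condition.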

If you want to get crazy, mark the most 'active' emails in one of several ways:

  • Last access
  • Frequency of use
  • Contacts with pending sales
  • Direct bosses
  • Etc

Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.

It's a lot of work but kind of fun...

Maybe count the number of emails sent to each address. Then:

ORDER BY EmailCount DESC, LastName, FirstName

That way, your most-often-used addresses come first, even if they haven't been used in a few days.
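If the list lives in memory rather than in a database, the same ordering can be sketched in Python. The dictionary keys below are hypothetical names mirroring the columns in the query.

```python
def by_usage(contacts):
    """Sort by email count (highest first), then last name, then first name."""
    return sorted(
        contacts,
        key=lambda c: (-c["email_count"], c["last_name"], c["first_name"]),
    )
```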

I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).

I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" works very well. The "most recent" can be a real pain if you accidentally mistype something once. Alternatively, "most used" doesn't evolve much over time, for example if you had a lot of contact with somebody last year but your job has since changed.

Once you have the set of measurements you want to use, you could create an interactive application to test out different weights, and see which ones give you the best results for some sample data.

This paper describes a single-parameter family of cache eviction policies that includes least recently used and least frequently used policies as special cases.

The parameter, lambda, ranges from 0 to 1. When lambda is 0 it performs exactly like an LFU cache; when lambda is 1 it performs exactly like an LRU cache. In between 0 and 1 it combines both recency and frequency information in a natural way.
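The link to the paper isn't preserved here, so the following Python sketch is only my assumption of what such a policy looks like: each use contributes a weight of (1/2)^(lambda * age), and the score is the sum of those weights, kept up to date incrementally. The class name, method names and the use of abstract integer "ticks" for time are all made up for illustration.

```python
class DecayedScore:
    """Recency/frequency blend controlled by one parameter, lam in [0, 1]."""

    def __init__(self, lam):
        self.lam = lam
        self.value = {}  # score as of the last use, per key
        self.last = {}   # tick of the last use, per key

    def _weight(self, elapsed):
        # A use that is `elapsed` ticks old counts this much.
        return 0.5 ** (self.lam * elapsed)

    def touch(self, key, now):
        """Record one use of key at tick now."""
        old = self.value.get(key, 0.0)
        elapsed = now - self.last.get(key, now)
        # Age the old score, then add 1 for the new use.
        self.value[key] = 1.0 + self._weight(elapsed) * old
        self.last[key] = now

    def score(self, key, now):
        """Score of key as seen at tick now."""
        return self._weight(now - self.last[key]) * self.value[key]

    def ranked(self, now):
        """Keys ordered best first."""
        return sorted(self.value, key=lambda k: -self.score(k, now))
```

With `lam = 0` every use counts as 1 forever, so the score is just the use count (frequency only). With `lam = 1` old uses fade quickly and the most recently used key tends to win.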

In spite of an answer having been chosen, I want to submit my approach for consideration, and feedback.

I would account for frequency by incrementing a counter on each use, but by some larger-than-one value, like 10 (to add precision to the second point).

I would account for recency by multiplying all counters at regular intervals (say, 24 hours) by some diminisher (say, 0.9).

Each use:

UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = 'foo@bar.com'

Each interval:

UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)

In this way I collapse both frequency and recency to one field, avoid the need for keeping a detailed history to derive {last day, last week, last month} and keep the math (mostly) integer.

The increment and diminisher would have to be adjusted to preference, of course.
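For anyone who wants to play with the numbers, here is a minimal sketch of the same idea using Python's built-in sqlite3 module with an in-memory table. The table and column names follow the queries above; the helper function names are made up. One deliberate difference: instead of `FLOOR(favor * 0.9)` it multiplies by 90 and integer-divides by 100, which gives the same result for non-negative values while staying entirely in integer math.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresslist ("
    "address TEXT PRIMARY KEY, favor INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO addresslist (address) VALUES (?)",
    [("foo@bar.com",), ("old@bar.com",)],
)


def record_use(address, increment=10):
    """Each use: bump the favor counter."""
    conn.execute(
        "UPDATE addresslist SET favor = favor + ? WHERE address = ?",
        (increment, address),
    )


def decay(percent_kept=90):
    """Each interval: shrink every counter (integer division truncates)."""
    conn.execute(
        "UPDATE addresslist SET favor = favor * ? / 100", (percent_kept,)
    )


def favor(address):
    row = conn.execute(
        "SELECT favor FROM addresslist WHERE address = ?", (address,)
    ).fetchone()
    return row[0]


def suggestions(prefix):
    """Addresses starting with prefix, highest favor first."""
    rows = conn.execute(
        "SELECT address FROM addresslist "
        "WHERE substr(address, 1, ?) = ? "
        "ORDER BY favor DESC, address",
        (len(prefix), prefix),
    )
    return [r[0] for r in rows]
```

With these defaults, an address used three times (favor 30) drops to 16 after five intervals, so two fresh uses of another address (favor 20) are enough to overtake it.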

Licensed under: CC-BY-SA with attribution