Unsupervised Clustering with Haskell
Question
I'm trying to develop an algorithm that can report how often and how closely together similar patterns appear across data sets.
Simple example:
set1 = [0, 1, 0, 0, 2, 0, 0, 3, 0]
set2 = [1, 2, 3, 0, 0, 0, 0, 0, 0]
set3 = [0, 0, 0, 0, 0, 1, 2, 0, 3]
Each of these sets has a 1, a 2, and a 3, but these numbers are in closer proximity in set2 and set3.
I suspect I could accomplish this task with list comprehensions. I could draw the data into variables x and y, and catalog each match in a list of lists, where the first element of each inner list is a string of the match found and the second and third elements are its positions. I could then run this list through another function that calculates how often and how close together those matches occur, and reports back a percentage.
Or perhaps there's a more elegant way to do this?
I'm still a bit of a Haskell novice. Any advice would be appreciated.
Solution
OK, if each set contains 1, 2, 3 in that order, then you have a formula for proximity: prox = indexOf 3 - indexOf 1 - 2. Here prox is the total number of zeros between 1 and 2 and between 2 and 3. For set1, for example, 1 is at index 1 and 3 is at index 7, so prox = 7 - 1 - 2 = 4. You can write this in Haskell:
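import Data.List (findIndex)

-- Counts the elements between 1 and 3, other than the 2.
-- Partial: fails at runtime if 1 or 3 is missing from the list.
prox :: [Integer] -> Int
prox s = i3 - i1 - 2
  where
    Just i3 = findIndex (== 3) s
    Just i1 = findIndex (== 1) s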
You can generalize this to the case where 1 does not necessarily come first and 3 does not necessarily come last.
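One possible way to do that generalization is sketched below. It is only an illustration, not the definitive approach: the name proxGeneral is hypothetical, and it assumes each pattern value occurs exactly once in the set (elemIndex returns the first occurrence only) and that the pattern values are distinct. Instead of relying on 1 being first and 3 last, it takes the smallest and largest index of any pattern value and subtracts the pattern elements that sit between them. It returns Nothing rather than crashing when a value is missing.

```haskell
import Data.List (elemIndex)

-- Example sets from the question.
set1, set2, set3 :: [Integer]
set1 = [0, 1, 0, 0, 2, 0, 0, 3, 0]
set2 = [1, 2, 3, 0, 0, 0, 0, 0, 0]
set3 = [0, 0, 0, 0, 0, 1, 2, 0, 3]

-- | Number of non-pattern elements lying between the outermost pattern
-- elements, regardless of the order in which the pattern values appear.
-- Returns Nothing if the pattern is empty or any pattern value is absent.
-- Assumption: pattern values are distinct and each occurs once in the list.
proxGeneral :: Eq a => [a] -> [a] -> Maybe Int
proxGeneral pat s = do
  idxs <- mapM (`elemIndex` s) pat  -- Nothing if any value is missing
  if null idxs
    then Nothing
    else Just (maximum idxs - minimum idxs - (length idxs - 1))
```

With the sets from the question, proxGeneral [1, 2, 3] set1 gives Just 4, set2 gives Just 0, and set3 gives Just 1, matching the original prox. A reordered list such as [3, 0, 1, 2] gives Just 1, which the original formula would not handle.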