Unsupervised Clustering with Haskell

https://stackoverflow.com/questions/10459384

05-06-2021
Question

I'm trying to develop an algorithm that can report how frequently, and how close together, similar patterns appear across data sets.

Simple example:

set1 = [0, 1, 0, 0, 2, 0, 0, 3, 0]
set2 = [1, 2, 3, 0, 0, 0, 0, 0, 0]
set3 = [0, 0, 0, 0, 0, 1, 2, 0, 3]

Each of these sets has a 1, 2, and 3, but these numbers are closer together in set2 and set3.

I suspect I could accomplish this task with list comprehensions. I could draw the data into variables x and y, and catalog each match into a list of lists where the 1st element in one of the embedded lists is a string of the match found, and the 2nd and 3rd elements are their positions. And I could run this list through another function that calculates how often and how close those matches occur, and reports back a percentage.

Or perhaps there's a more elegant way to do this?

I'm still a bit of a Haskell novice. Any advice would be appreciated.

Solution

OK, if 1, 2, and 3 appear in each set in that order, then you have a formula to compute proximity: prox = indexOf 3 - indexOf 1 - 2. So prox is the total number of zeroes between 1 and 2 and between 2 and 3. You can write this in Haskell:

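import Data.List (findIndex)

-- Number of elements lying between 1 and 3, not counting the 2 itself.
-- Assumes 1, 2, 3 all occur in that order; fails at runtime if 1 or 3 is missing.
prox :: [Integer] -> Int
prox s = i3 - i1 - 2
  where
    Just i3 = findIndex (==3) s
    Just i1 = findIndex (==1) s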

You can generalize this to drop the assumption that 1 comes first and 3 comes last.
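
One possible way to do that generalization, as a rough sketch rather than a definitive answer: take the first position of each target value, and subtract from the span between the smallest and largest position the slots occupied by the other targets. The name proxOf and its Maybe result are assumptions made for illustration, not part of the original answer; the sketch also assumes the target values are distinct.

```haskell
import Data.List (elemIndex)

-- Hypothetical generalization of prox: counts the non-target elements lying
-- between the first and last of the targets, whatever order they occur in.
-- Returns Nothing if the target list is empty or any target is absent.
-- Assumes the targets are distinct.
proxOf :: Eq a => [a] -> [a] -> Maybe Int
proxOf [] _ = Nothing
proxOf targets s = do
  idxs <- mapM (`elemIndex` s) targets   -- first position of each target
  pure (maximum idxs - minimum idxs - (length targets - 1))

-- The three sets from the question.
set1, set2, set3 :: [Int]
set1 = [0, 1, 0, 0, 2, 0, 0, 3, 0]
set2 = [1, 2, 3, 0, 0, 0, 0, 0, 0]
set3 = [0, 0, 0, 0, 0, 1, 2, 0, 3]
```

With the sets from the question, proxOf [1, 2, 3] set1 gives Just 4, set2 gives Just 0, and set3 gives Just 1, matching the original formula. A set in which the values appear in another order, such as [3, 0, 2, 1], gives Just 1, and a set missing one of the values gives Nothing instead of crashing.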

Licensed under: CC-BY-SA with attribution
