Parallel Merging of two sorted lists

https://stackoverflow.com/questions/9055766

04-12-2019
|

Question

I am trying to merge two list in parallel. I have two sorted lists [(i, j, val)]. Lists are sorted on j and for same j, sorted on i. If the two lists contain the same (i, j) then their values are added and combined into one, e.g. if the first list contains (i, j, val_1) and the second list contains (i, j, val_2) then combining two will result (i, j, val_1 + val_2).

Merging is highly sequential and after searching, I found this paper. The idea from this paper is to use binary search to get the rank of the elements in the final list. Let's say we are at ith position in the first list so we have (i - 1) elements smaller than the current element in first list and perform binary search for this element's position in the second list (say this position is j). So the position of our current element in final list will be i + j - 1 (i - 1 + j - 1 + 1). I wrote a Haskell code using dph-par for this but I am kind of stuck with update. I have two list

l_1 = [ (1, 1, 1), (2, 1, 1), (4, 1, 1), (1, 4, 1), (2, 4, 1), (4, 4, 1) ]
l_2 = [ (1, 1, 1), (3, 1, 1), (4, 1, 1), (1, 4, 1), (3, 4, 1), (4, 4, 1) ]

and after updating these two lists , we should have

l_3 = [ (1, 1, 2), (2, 1, 1), (3, 1, 1), (4, 1, 2), (1, 4, 2), (2, 4, 2), (3, 4, 1), (4, 4, 2) ]

Bsearch.hs

{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module Bsearch ( interfaceSparse ) where
import qualified Data.Array.Parallel as P
import Data.Array.Parallel.PArray
import qualified Data.Array.Parallel.Prelude as Pre
import qualified Data.Array.Parallel.Prelude.Int as I
import qualified Data.Array.Parallel.Prelude.Double as D

bSearch :: ( I.Int , I.Int , D.Double ) -> [: ( I.Int , I.Int ,D.Double ) :] -> I.Int
bSearch elem@( i , j , val ) xs = ret where
  ret = helpBsearch 0 len where
        len = P.lengthP xs
        helpBsearch :: I.Int -> I.Int -> I.Int
        helpBsearch lo hi
           | lo I.>= hi = lo
           | cond  = helpBsearch ( mid I.+ 1 ) hi
           | otherwise = helpBsearch lo mid
           where mid = I.div ( lo I.+ hi ) 2
             ( i' , j' , val' ) = xs P.!: mid
             cond = case () of
                     _| j' I.< j Pre.|| ( j I.== j' Pre.&& i' I.<i )  -> True
                      | otherwise ->  False

bSearchFun :: [: ( I.Int , I.Int , D.Double ) :] -> [: ( I.Int ,I.Int , D.Double ) :] -> [:I.Int :]
bSearchFun xs ys = P.mapP ( \( x , y ) -> x I.+ y ) ( P.indexedP ( P.mapP  ( \x ->  bSearch x ys ) xs ) )

bSearchMain :: [: ( I.Int , I.Int , D.Double ) :] -> [: ( I.Int , I.Int , D.Double ) :] -> [: ( I.Int  , ( I.Int , I.Int , D.Double ) ) :]
bSearchMain xs ys = l_1 where --here change l_2 for second list
    lst = [: bSearchFun xs ys  , bSearchFun ys xs  :]
    first = lst P.!: 0
    second = lst P.!: 1
    l_1 = P.zipP first xs
    l_2 = P.zipP second ys

interfaceSparse :: PArray ( Int , Int , Double )  ->  PArray ( Int ,Int , Double )  -> PArray   ( Int , ( Int , Int , Double ) )
{-# NOINLINE interfaceSparse #-}
interfaceSparse  xs ys = P.toPArrayP ( bSearchMain ( P.fromPArrayPxs ) ( P.fromPArrayP ys ) )

Main.hs

module Main where
import Bsearch
import qualified Data.Array.Parallel.PArray as P
import Data.List

main = do
 let
   l_1 = P.fromList $ ( [ ( 1 , 1 , 1 ) , ( 2 , 1 , 1)  , ( 4 , 1 , 1 ) , ( 1 , 4 , 1 ) ,( 2 , 4 , 1 ) , ( 4 ,4 , 1 ) ] :: [ ( Int ,Int , Double ) ] )
   l_2 = P.fromList $ ( [ ( 1 , 1 , 1 ) , ( 3 , 1 , 1 ) , ( 4 , 1 , 1) , ( 1 , 4 , 1 ) , ( 3 , 4 , 1 ) , ( 4 , 4 , 1 ) ] :: [ ( Int , Int , Double )] )
   e = interfaceSparse l_1 l_2
 print e

[ntro@localhost parBsearch]$ ghc -c -Odph -fdph-par -fforce-recomp Bsearch.hs
[ntro@localhost parBsearch]$ ghc -c -Odph -fdph-par -fforce-recomp Main.hs
[ntro@localhost parBsearch]$ ghc -o Bsearch -threaded -rtsopts -fdph-par Main.o Bsearch.o

[ntro@localhost parBsearch]$ ./Bsearch --first list
fromList<PArray> [(0,(1,1,1.0)),(2,(2,1,1.0)),(4,(4,1,1.0)),(6,(1,4,1.0)),(8,(2,4,1.0)),(10 (4,4,1.0))]
[ntro@localhost parBsearch]$ ./Bsearch  -- second list
fromList<PArray> [(0,(1,1,1.0)),(3,(3,1,1.0)),(4,(4,1,1.0)),(6,(1,4,1.0)),(9,(3,4,1.0)),(10,(4,4,1.0))]

Could some one please help me with update. I am not sure but this algorithm involves lot of data movement so kindly suggest me something better for this purpose.

Solution

I am unfamiliar with the haskell language, but when I merge sorted lists, I use bitonic sort. This is a great algorithm, and highly parallel in design. The only restriction is that you merge lists of size 2^n. I get around this restriction by padding the short lists with values higher than the known values in the lists, so they accumulate together and can be ignored. I have such huge lists to sort that the power of 2 limitation is easy to accommodate.

http://en.wikipedia.org/wiki/Bitonic_sorter http://en.wikipedia.org/wiki/Odd-even_mergesort

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow