I tried to run your code on my host and I am not able to reproduce your profile:
runhaskell test8.hs +RTS -sstderr
done
120,112 bytes allocated in the heap
3,520 bytes copied during GC
68,968 bytes maximum residency (1 sample(s))
12,952 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.00s 0.09s 0.0909s 0.0909s
INIT time 0.00s ( 0.01s elapsed)
MUT time 0.00s ( 29.21s elapsed)
GC time 0.00s ( 0.09s elapsed)
EXIT time 0.00s ( 0.09s elapsed)
Total time 0.01s ( 29.40s elapsed)
%GC time 5.7% (0.3% elapsed)
Alloc rate 381,307,936 bytes per MUT second
Productivity 91.1% of total user, 0.0% of total elapsed
Can you pls outline some more detail about how you are testing the code? If you are using ghci then
$ ghci -fobject-code
we probably need to use -fobject-code to eliminate any space leaks from the ghci. If you have already tried the ghci option, assuming that you are using ghci, I will edit my answer. At this point, I would like to reproduce the issue you are seeing.
Update:
@ duplode : Thank you for the pointers. I am going to delete the previous output no one objects to it as it is misleading.
I have been able to reduce the gc overhead by a bit using one of the following options. I am getting some benefits but the overhead is still in the 49 - 50 % range:
ts3 :: V.Vector (Int, Int) -> M.HashMap Int [Int]
ts3 vec =
V.foldl (\ht (!k, !v) ->
let
element = M.lookup k ht in
case element of
Nothing -> M.insert k [v] ht
Just aList -> M.insert k (v:aList) ht) M.empty vec
ts4 :: V.Vector (Int,Int) -> M.HashMap Int [Int]
ts4 vec =
let initMap = V.foldl (\ht (!k,_) -> M.insert k [] ht) M.empty vec
in
V.foldl (\ht (!k, !v) -> M.adjust(\x -> v:x) k ht) initMap vec
The adjust seemed a bit better, but they results seem similar to a straight lookup. With ts4 using adjust:
calling ts4 done.
3,838,059,320 bytes allocated in the heap
2,041,603,344 bytes copied during GC
377,412,728 bytes maximum residency (6 sample(s))
7,725,944 bytes maximum slop
737 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 7260 colls, 0 par 1.32s 1.45s 0.0002s 0.0013s
Gen 1 6 colls, 0 par 0.88s 1.40s 0.2328s 0.9236s
INIT time 0.00s ( 0.00s elapsed)
MUT time 2.18s ( 2.21s elapsed)
GC time 2.19s ( 2.85s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.01s ( 0.07s elapsed)
Total time 4.38s ( 5.13s elapsed)
%GC time 50.0% (55.5% elapsed)
Alloc rate 1,757,267,879 bytes per MUT second
Productivity 50.0% of total user, 42.7% of total elapsed
Using the simple lookup/update (imperative style of updating a map)
calling ts3 done.
3,677,137,816 bytes allocated in the heap
2,040,053,712 bytes copied during GC
395,867,512 bytes maximum residency (6 sample(s))
7,326,104 bytes maximum slop
769 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 6999 colls, 0 par 1.35s 1.51s 0.0002s 0.0037s
Gen 1 6 colls, 0 par 1.06s 2.16s 0.3601s 1.3175s
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.89s ( 2.07s elapsed)
GC time 2.41s ( 3.67s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.01s ( 0.08s elapsed)
Total time 4.31s ( 5.82s elapsed)
%GC time 55.9% (63.0% elapsed)
Alloc rate 1,942,816,558 bytes per MUT second
Productivity 44.1% of total user, 32.6% of total elapsed
I am interested in finding out as to how to reduce the time for lookup as show in the profile output below:
COST CENTRE MODULE %time %alloc
ts3.\ Main 54.1 91.4
ts3.\.element Main 19.0 2.9
ts3 Main 11.0 2.9
kv.k Main 6.5 1.4
kv.v Main 5.2 1.4
kv.k.\ Main 4.0 0.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 72 0 0.0 0.0 100.0 100.0
main Main 158 0 0.0 0.0 0.0 0.0
CAF:main Main 143 0 0.0 0.0 84.2 97.1
main Main 144 1 0.0 0.0 84.2 97.1
ts3 Main 145 1 11.0 2.9 84.2 97.1
ts3.\ Main 156 5000000 54.1 91.4 73.2 94.3
ts3.\.element Main 157 5000000 19.0 2.9 19.0 2.9
CAF:kv Main 142 0 0.0 0.0 0.0 0.0
Code
-- ghc -O2 --make test8.hs -prof -auto-all -caf-all -fforce-recomp +RTS
-- ./test8 +RTS -p
{-# LANGUAGE BangPatterns #-}
import qualified Data.HashMap.Strict as M
import qualified Data.Vector.Unboxed as V
n :: Int
n = 5000000
kv :: V.Vector (Int,Int)
kv = V.zip (k) (v)
where
k = V.generate n (\i -> i `mod` 1000)
v = V.generate n (\i -> i)
ts :: V.Vector (Int,Int) -> M.HashMap Int Int
ts vec =
V.foldl' (\ht (k, v) -> M.insertWith (+) k v ht) M.empty vec
ts2 :: V.Vector (Int,Int) -> M.HashMap Int [Int]
ts2 vec =
V.foldl (\ht (!k, !v) -> M.insertWith (++) k [v] ht) M.empty vec
ts3 :: V.Vector (Int, Int) -> M.HashMap Int [Int]
ts3 vec =
V.foldl (\ht (!k, !v) ->
let
element = M.lookup k ht in
case element of
Nothing -> M.insert k [v] ht
Just aList -> M.insert k (v:aList) ht) M.empty vec
ts4 :: V.Vector (Int,Int) -> M.HashMap Int [Int]
ts4 vec =
let initMap = V.foldl (\ht (!k,_) -> M.insert k [] ht) M.empty vec
in
V.foldl (\ht (!k, !v) -> M.adjust(\x -> v:x) k ht) initMap vec
main :: IO ()
main = ts3 kv `seq` putStrLn "calling ts3 done."
main1 = do
if x == y then
putStrLn "Algos Match"
else
putStrLn "Error"
where
x = ts2 kv
y = ts4 kv