何度も呼び出される単純なパーサーの最適化

https://stackoverflow.com//questions/24063219

02-01-2020
|

質問

を使用してカスタムファイルのパーサーを作成しました attoparsec. 。プロファイリングレポートは、メモリ割り当ての約 67% がという名前の関数で行われていることを示しました。 tab, 、これも最も時間がかかります。の tab 関数は非常に単純です:

tab :: Parser Char
tab = char '\t'

プロファイリングレポート全体は次のとおりです。

       ASnapshotParser +RTS -p -h -RTS

    total time  =       37.88 secs   (37882 ticks @ 1000 us, 1 processor)
    total alloc = 54,255,105,384 bytes  (excludes profiling overheads)

COST CENTRE    MODULE                %time %alloc

tab            Main                   83.1   67.7
main           Main                    6.4    4.2
readTextDevice Data.Text.IO.Internal   5.5   24.0
snapshotParser Main                    4.7    4.0


                                                             individual     inherited
COST CENTRE        MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN               MAIN                     75           0    0.0    0.0   100.0  100.0
 CAF               Main                    149           0    0.0    0.0   100.0  100.0
  tab              Main                    156           1    0.0    0.0     0.0    0.0
  snapshotParser   Main                    153           1    0.0    0.0     0.0    0.0
  main             Main                    150           1    6.4    4.2   100.0  100.0
   doStuff         Main                    152     1000398    0.3    0.0    88.1   71.8
    snapshotParser Main                    154           0    4.7    4.0    87.7   71.7
     tab           Main                    157           0   83.1   67.7    83.1   67.7
   readTextDevice  Data.Text.IO.Internal   151       40145    5.5   24.0     5.5   24.0
 CAF               Data.Text.Array         142           0    0.0    0.0     0.0    0.0
 CAF               Data.Text.Internal      140           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Handle.FD        122           0    0.0    0.0     0.0    0.0
 CAF               GHC.Conc.Signal         103           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Encoding         101           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.FD               100           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Encoding.Iconv    89           0    0.0    0.0     0.0    0.0
  main             Main                    155           0    0.0    0.0     0.0    0.0

これを最適化するにはどうすればよいですか?

コード全体パーサーがここにあるからです。私が解析しているファイルは約 77MB です。

解決

tab スケープゴートです。定義すると boo :: Parser (); boo = return () そして、 boo のすべてのバインドの前に、 snapshotParser 定義すると、コストの配分は次のようになります。

 main             Main                    255           0   11.8   13.8   100.0  100.0
  doStuff         Main                    258     2097153    1.1    0.5    86.2   86.2
   snapshotParser Main                    260           0    0.4    0.1    85.1   85.7
    boo           Main                    262           0   71.0   73.2    84.8   85.5
     tab          Main                    265           0   13.8   12.3    13.8   12.3

したがって、おそらくプロファイラーが解析結果の割り当ての責任を転嫁しているようです。これは、おそらく大規模なインライン化が原因です。 attoparsec John L がコメントで示唆したように、コード。

パフォーマンスの問題に関して言えば、重要な点は、77MB のテキストファイルを解析して 100 万個の要素を含むリストを構築するとき、ファイル処理を厳格ではなく遅延させる必要があるということです。それが解決されたら、I/O を分離して解析します。 doStuff アキュムレータを使用せずにスナップショットのリストを作成することも役立ちます。これを考慮してプログラムを修正したバージョンを次に示します。

{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Maybe
import Data.Attoparsec.Text.Lazy
import Control.Applicative
import qualified Data.Text.Lazy.IO as TL
import Data.Text (Text)
import qualified Data.Text.Lazy as TL

buildStuff :: TL.Text -> [Snapshot]
buildStuff text = case maybeResult (parse endOfInput text) of
  Just _ -> []
  Nothing -> case parse snapshotParser text of
      Done !i !r -> r : buildStuff i
      Fail _ _ _ -> []

main :: IO ()
main = do
  text <- TL.readFile "./snap.dat"
  let ss = buildStuff text
  print $ listToMaybe ss
    >> Just (fromIntegral (length $ show ss) / fromIntegral (length ss))

newtype VehicleId = VehicleId Int deriving Show
newtype Time = Time Int deriving Show
newtype LinkID = LinkID Int deriving Show
newtype NodeID = NodeID Int deriving Show
newtype LaneID = LaneID Int deriving Show

tab :: Parser Char
tab = char '\t'

-- UNPACK pragmas. GHC 7.8 unboxes small strict fields automatically;
-- however, it seems we still need the pragmas while profiling. 
data Snapshot = Snapshot {
  vehicle :: {-# UNPACK #-} !VehicleId,
  time :: {-# UNPACK #-} !Time,
  link :: {-# UNPACK #-} !LinkID,
  node :: {-# UNPACK #-} !NodeID,
  lane :: {-# UNPACK #-} !LaneID,
  distance :: {-# UNPACK #-} !Double,
  velocity :: {-# UNPACK #-} !Double,
  vehtype :: {-# UNPACK #-} !Int,
  acceler :: {-# UNPACK #-} !Double,
  driver :: {-# UNPACK #-} !Int,
  passengers :: {-# UNPACK #-} !Int,
  easting :: {-# UNPACK #-} !Double,
  northing :: {-# UNPACK #-} !Double,
  elevation :: {-# UNPACK #-} !Double,
  azimuth :: {-# UNPACK #-} !Double,
  user :: {-# UNPACK #-} !Int
  } deriving (Show)

-- No need for bang patterns here.
snapshotParser :: Parser Snapshot
snapshotParser = do
  sveh <- decimal
  tab
  stime <- decimal
  tab
  slink <- decimal
  tab
  snode <- decimal
  tab
  slane <- decimal
  tab
  sdistance <- double
  tab
  svelocity <- double
  tab
  svehtype <- decimal
  tab
  sacceler <- double
  tab
  sdriver <- decimal
  tab
  spassengers <- decimal
  tab
  seasting <- double
  tab
  snorthing <- double
  tab
  selevation <- double
  tab
  sazimuth <- double
  tab
  suser <- decimal
  endOfLine <|> endOfInput
  return $ Snapshot
    (VehicleId sveh) (Time stime) (LinkID slink) (NodeID snode)
    (LaneID slane) sdistance svelocity svehtype sacceler sdriver
    spassengers seasting snorthing selevation sazimuth suser

このバージョンでは、私が行ったように、スナップショットのリスト全体をメモリに強制的に格納した場合でも、許容可能なパフォーマンスが得られるはずです。 main ここ。何が「許容できる」かを判断するには、各フィールドに 16 個の (小さく、ボックス化されていない) フィールドがあることを念頭に置いてください。 Snapshot プラスオーバーヘッドの Snapshot リストコンストラクターとリストコンストラクターの場合、リストセルあたり 152 バイトについて話します。これは、テストデータの場合、約 152 MB になります。いずれにせよ、このバージョンは可能な限り遅延しています。 main, 、または次のように置き換えます。 last ss.

注意：私のテストは attoparsec-0.12 で行われました。

他のヒント

attoparsec を最新バージョンに更新した後 (0.12.0.0)、実行にかかる時間は 38 秒から 16 秒に短縮されます。これは 50% 以上の高速化です。また、メモリ消費量も大幅に削減されました。@JohnL が指摘したように、プロファイリングを有効にすると、結果は大きく異なります。最新バージョンの attoparsec ライブラリを使用してプロファイリングを試みたところ、プログラム全体の実行に約 64 秒かかりました。

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow