여러 번 호출되는 간단한 파서 최적화

https://stackoverflow.com//questions/24063219

02-01-2020
|

문제

나는 다음을 사용하여 사용자 정의 파일에 대한 파서를 작성했습니다. attoparsec.프로파일링 보고서에 따르면 메모리 할당의 약 67%가 다음과 같은 함수에서 수행됩니다. tab, 또한 가장 많은 시간을 소모합니다.그만큼 tab 함수는 매우 간단합니다.

tab :: Parser Char
tab = char '\t'

전체 프로파일링 보고서는 다음과 같습니다.

       ASnapshotParser +RTS -p -h -RTS

    total time  =       37.88 secs   (37882 ticks @ 1000 us, 1 processor)
    total alloc = 54,255,105,384 bytes  (excludes profiling overheads)

COST CENTRE    MODULE                %time %alloc

tab            Main                   83.1   67.7
main           Main                    6.4    4.2
readTextDevice Data.Text.IO.Internal   5.5   24.0
snapshotParser Main                    4.7    4.0


                                                             individual     inherited
COST CENTRE        MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN               MAIN                     75           0    0.0    0.0   100.0  100.0
 CAF               Main                    149           0    0.0    0.0   100.0  100.0
  tab              Main                    156           1    0.0    0.0     0.0    0.0
  snapshotParser   Main                    153           1    0.0    0.0     0.0    0.0
  main             Main                    150           1    6.4    4.2   100.0  100.0
   doStuff         Main                    152     1000398    0.3    0.0    88.1   71.8
    snapshotParser Main                    154           0    4.7    4.0    87.7   71.7
     tab           Main                    157           0   83.1   67.7    83.1   67.7
   readTextDevice  Data.Text.IO.Internal   151       40145    5.5   24.0     5.5   24.0
 CAF               Data.Text.Array         142           0    0.0    0.0     0.0    0.0
 CAF               Data.Text.Internal      140           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Handle.FD        122           0    0.0    0.0     0.0    0.0
 CAF               GHC.Conc.Signal         103           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Encoding         101           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.FD               100           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Encoding.Iconv    89           0    0.0    0.0     0.0    0.0
  main             Main                    155           0    0.0    0.0     0.0    0.0

이것을 어떻게 최적화합니까?

전체 코드 파서는 여기에 있습니다. 제가 분석 중인 파일은 약 77MB입니다.

해결책

tab 희생양이다.정의한다면 boo :: Parser (); boo = return () 그리고 boo 모든 바인드 전에 snapshotParser 정의에 따르면 비용 할당은 다음과 같습니다.

 main             Main                    255           0   11.8   13.8   100.0  100.0
  doStuff         Main                    258     2097153    1.1    0.5    86.2   86.2
   snapshotParser Main                    260           0    0.4    0.1    85.1   85.7
    boo           Main                    262           0   71.0   73.2    84.8   85.5
     tab          Main                    265           0   13.8   12.3    13.8   12.3

따라서 프로파일러는 구문 분석 결과 할당에 대한 책임을 전가하는 것으로 보입니다. 이는 아마도 광범위한 인라인 처리로 인해 발생할 수 있습니다. attoparsec John L이 의견에서 제안한 것처럼 코드.

성능 문제에 있어서 핵심은 77MB 텍스트 파일을 구문 분석하여 백만 개의 요소가 포함된 목록을 작성할 때 파일 처리가 엄격하지 않고 게으르기를 원한다는 것입니다.그것이 정리되면 I/O를 분리하고 구문 분석합니다. doStuff 누산기 없이 스냅샷 목록을 작성하는 것도 도움이 됩니다.다음은 이를 고려하여 수정된 프로그램 버전입니다.

{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Maybe
import Data.Attoparsec.Text.Lazy
import Control.Applicative
import qualified Data.Text.Lazy.IO as TL
import Data.Text (Text)
import qualified Data.Text.Lazy as TL

buildStuff :: TL.Text -> [Snapshot]
buildStuff text = case maybeResult (parse endOfInput text) of
  Just _ -> []
  Nothing -> case parse snapshotParser text of
      Done !i !r -> r : buildStuff i
      Fail _ _ _ -> []

main :: IO ()
main = do
  text <- TL.readFile "./snap.dat"
  let ss = buildStuff text
  print $ listToMaybe ss
    >> Just (fromIntegral (length $ show ss) / fromIntegral (length ss))

newtype VehicleId = VehicleId Int deriving Show
newtype Time = Time Int deriving Show
newtype LinkID = LinkID Int deriving Show
newtype NodeID = NodeID Int deriving Show
newtype LaneID = LaneID Int deriving Show

tab :: Parser Char
tab = char '\t'

-- UNPACK pragmas. GHC 7.8 unboxes small strict fields automatically;
-- however, it seems we still need the pragmas while profiling. 
data Snapshot = Snapshot {
  vehicle :: {-# UNPACK #-} !VehicleId,
  time :: {-# UNPACK #-} !Time,
  link :: {-# UNPACK #-} !LinkID,
  node :: {-# UNPACK #-} !NodeID,
  lane :: {-# UNPACK #-} !LaneID,
  distance :: {-# UNPACK #-} !Double,
  velocity :: {-# UNPACK #-} !Double,
  vehtype :: {-# UNPACK #-} !Int,
  acceler :: {-# UNPACK #-} !Double,
  driver :: {-# UNPACK #-} !Int,
  passengers :: {-# UNPACK #-} !Int,
  easting :: {-# UNPACK #-} !Double,
  northing :: {-# UNPACK #-} !Double,
  elevation :: {-# UNPACK #-} !Double,
  azimuth :: {-# UNPACK #-} !Double,
  user :: {-# UNPACK #-} !Int
  } deriving (Show)

-- No need for bang patterns here.
snapshotParser :: Parser Snapshot
snapshotParser = do
  sveh <- decimal
  tab
  stime <- decimal
  tab
  slink <- decimal
  tab
  snode <- decimal
  tab
  slane <- decimal
  tab
  sdistance <- double
  tab
  svelocity <- double
  tab
  svehtype <- decimal
  tab
  sacceler <- double
  tab
  sdriver <- decimal
  tab
  spassengers <- decimal
  tab
  seasting <- double
  tab
  snorthing <- double
  tab
  selevation <- double
  tab
  sazimuth <- double
  tab
  suser <- decimal
  endOfLine <|> endOfInput
  return $ Snapshot
    (VehicleId sveh) (Time stime) (LinkID slink) (NodeID snode)
    (LaneID slane) sdistance svelocity svehtype sacceler sdriver
    spassengers seasting snorthing selevation sazimuth suser

이 버전은 내가 했던 것처럼 전체 스냅샷 목록을 메모리에 강제로 저장하더라도 허용 가능한 성능을 가져야 합니다. main 여기."허용 가능한" 항목을 측정하려면 각 필드에 16개의(작고 상자 표시되지 않은) 필드가 있다는 점을 명심하세요. Snapshot 게다가 간접비 ~의 Snapshot 및 목록 생성자의 경우 목록 셀당 약 152바이트에 대해 이야기하고 있으며 이는 테스트 데이터의 경우 ~152MB로 요약됩니다.어쨌든 이 버전은 가능한 한 게으르다. main, 또는 다음으로 대체 last ss.

참고:내 테스트는 attoparsec-0.12로 수행되었습니다.

다른 팁

attoparsec을 최신 버전으로 업데이트한 후(0.12.0.0), 실행에 걸리는 시간이 38초에서 16초로 단축되었습니다.50% 이상의 속도 향상이 가능합니다.또한 소비되는 메모리도 크게 줄었습니다.@JohnL이 언급했듯이 프로파일링이 활성화되면 결과가 크게 달라집니다.최신 버전의 attoparsec 라이브러리로 프로파일링을 시도했을 때 전체 프로그램을 실행하는 데 약 64초가 걸렸습니다.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow