Parsing JPEG markers with attoparsec
26-09-2019
Question
As a project to further my knowledge of and comfort with Haskell, I am working towards implementing a JPEG decoder, which will come in handy for future computer vision work.
The first step I have chosen is to parse all "Markers" within the image. These are indicated by the byte 0xFF followed by a non-0 byte. A 0xFF byte followed by a 0x00 byte shall be treated as normal data.
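As a reference point for the rule above, here is a minimal sketch of it as a plain list scan, without any parser library. The name findMarkers is made up for illustration; it is only meant to show which byte pairs should count as markers.

```haskell
import Data.Word (Word8)

-- Hypothetical helper: collect every (0xFF, b) pair where b is non-zero.
-- A 0xFF 0x00 pair is treated as ordinary data and skipped.
findMarkers :: [Word8] -> [(Word8, Word8)]
findMarkers (0xFF : b : rest)
  | b /= 0x00 = (0xFF, b) : findMarkers rest  -- a real marker
  | otherwise = findMarkers rest              -- stuffed 0xFF 0x00: plain data
findMarkers (_ : rest) = findMarkers rest     -- any other byte: keep scanning
findMarkers [] = []
```

For the bytes [0x01, 0xFF, 0xD8, 0xFF, 0x00, 0xFF, 0xD9] this yields [(255,216),(255,217)], which is the kind of result the parser should produce.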
The problem I am running into is that, upon encountering a 0xFF 0x00 combination, parsing seems to finish completely and no more valid markers are found. If you run it on a standard JPEG, you will see the start-of-image marker parsed but not the end-of-image marker, because 0xFF 0x00 often occurs within the image data itself.
import System.Environment
import System.IO
import Control.Applicative hiding (many)
import Data.Attoparsec as A
import qualified Data.ByteString as BS

parseMarker = do
  part1 <- word8 0xFF
  part2 <- notWord8 0x0
  return (part1, part2)

parseSection = do
  A.skipWhile (\x -> x /= 0xFF) *> parseMarker

parseBody = do
  many parseSection

parseJPEG jpeg = do
  handleParseResult $ feed (parse parseBody jpeg) BS.empty

handleParseResult result = do
  case result of
    Fail _ _ msg -> msg
    Done _ r -> show r
    _ -> ""

main = do
  (filename : _ ) <- getArgs
  handle <- openFile filename ReadMode
  contents <- BS.hGetContents handle
  putStrLn $ parseJPEG contents
  hClose handle
Solution
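Since parseMarker consumes input but may fail partway through, you have to be able to "rewind" and retry a different parse when you encounter a 0xFF 0x00 pair.
I don't have Attoparsec installed, but I assume it is similar to Parsec, which doesn't backtrack by default, hence the try. Note that attoparsec parsers always backtrack on failure and its try is provided for Parsec compatibility, so the try is harmless here but not strictly required. The part that matters is skipping the 0xFF 0x00 pair explicitly before looking for the next marker.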
parseSection =
  skipMany (notWord8 0xFF <|> try (word8 0xFF >> word8 0x0)) >> parseMarker
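For anyone trying this against a current attoparsec, here is a minimal self-contained sketch of the same idea. It assumes the Data.Attoparsec.ByteString module and uses parseOnly in place of the question's parse/feed plumbing; the markers helper is a name made up for illustration, and try is left out on the assumption that attoparsec backtracks on failure anyway.

```haskell
import Control.Applicative (many, (<|>))
import Data.Attoparsec.ByteString (Parser, notWord8, parseOnly, skipMany, word8)
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- A marker is 0xFF followed by any non-zero byte.
parseMarker :: Parser (Word8, Word8)
parseMarker = do
  part1 <- word8 0xFF
  part2 <- notWord8 0x00
  return (part1, part2)

-- Skip ordinary bytes and stuffed 0xFF 0x00 pairs, then read one marker.
parseSection :: Parser (Word8, Word8)
parseSection =
  skipMany (notWord8 0xFF <|> (word8 0xFF *> word8 0x00)) *> parseMarker

-- Hypothetical helper: collect every marker in the input.
-- Trailing bytes that contain no marker are left unconsumed.
markers :: BS.ByteString -> Either String [(Word8, Word8)]
markers = parseOnly (many parseSection)
```

Applied to BS.pack [1,2,3,4,0xFF,1,0xFF,0x00,0xFF,2,31,13,0xFF,0x00], markers returns Right [(255,1),(255,2)]: both stuffed 0xFF 0x00 pairs are skipped and parsing continues past them.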
OTHER TIPS
The problem is that you don't tell it how to parse a 0xFF, 0x00 sequence. It's late, so hopefully the next answerer will put words to it (perhaps this helps you enough), but here is an alternate parseMarker and accompanying handleParseResult:
parseMarker = do -- or call it "parsePotentialMarker"
  part1 <- word8 0xFF
  part2 <- anyWord8
  return $
    if (part2 /= 0)
      then [(part1, part2)]
      else []

-- ... skipping other functions...

handleParseResult result = do
  case result of
    Fail _ _ msg -> msg
    Done _ r -> show (concat r)
    -- ...
On a side note, in questions it helps slightly to have a fully functional test in the code, something like:
main =
  let contents = BS.pack [1,2,3,4,0xFF,1 {- marker 1 -},0xFF,0x00,0xFF,2 {- marker 2 -},31,13,0xFF,0x00]
  in putStrLn $ parseJPEG contents
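Putting the alternate parseMarker and that test input together, a self-contained sketch might look like the following. It assumes a current attoparsec (the Data.Attoparsec.ByteString module) and swaps the parse/feed handling for parseOnly; parsePotentialMarker and markersViaFilter are names chosen for illustration only.

```haskell
import Control.Applicative (many)
import Data.Attoparsec.ByteString (Parser, anyWord8, parseOnly, skipWhile, word8)
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Accept 0xFF followed by any byte; return an empty list for 0xFF 0x00
-- so the pair is consumed as data instead of failing the parse.
parsePotentialMarker :: Parser [(Word8, Word8)]
parsePotentialMarker = do
  part1 <- word8 0xFF
  part2 <- anyWord8
  return [(part1, part2) | part2 /= 0]

-- Skip ahead to the next 0xFF, then classify the pair found there.
parseSection :: Parser [(Word8, Word8)]
parseSection = skipWhile (/= 0xFF) *> parsePotentialMarker

-- Hypothetical helper: run the parser and flatten the per-section lists.
markersViaFilter :: BS.ByteString -> Either String [(Word8, Word8)]
markersViaFilter = fmap concat . parseOnly (many parseSection)
```

With the test bytes above, markersViaFilter returns Right [(255,1),(255,2)]. The design difference from skipping 0xFF 0x00 up front is that here every 0xFF pair is parsed successfully and the non-markers are filtered out afterwards.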