Question

As a project to further my knowledge and comfort with Haskell I am working towards implementing a JPEG decoder which will come in handy for future computer vision work.

The first step I have chosen is to parse all "Markers" within the image. A marker is the byte 0xFF followed by a non-zero byte; a 0xFF byte followed by a 0x00 byte is to be treated as normal data.

The problem I am running into is that, upon encountering a 0xFF 0x00 combination, parsing stops completely and no further markers are found. If you run this on a standard JPEG, you will see the start-of-image marker parsed but not the end-of-image marker, because 0xFF 0x00 often occurs within the image data itself.

import System.Environment
import System.IO

import Control.Applicative hiding (many)
import Data.Attoparsec as A
import qualified Data.ByteString as BS

parseMarker = do
  part1 <- word8 0xFF
  part2 <- notWord8 0x0
  return (part1, part2)

parseSection = do
  A.skipWhile (\x -> x /= 0xFF) *> parseMarker

parseBody = do
  many parseSection

parseJPEG jpeg = do
  handleParseResult $ feed (parse parseBody jpeg) BS.empty

handleParseResult result = do
  case result of
    Fail _ _ msg -> msg
    Done _ r -> show r
    _ -> ""

main = do
  (filename : _ ) <- getArgs
  handle <- openFile filename ReadMode
  contents <- BS.hGetContents handle
  putStrLn $ parseJPEG contents
  hClose handle

https://gist.github.com/767488

Solution

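parseSection skips ahead to the next 0xFF and then runs parseMarker, which fails when the byte after it is 0x00. That failure ends many parseSection, so nothing after the first 0xFF 0x00 is ever examined. The skipping step therefore has to treat 0xFF 0x00 as data: it tries to consume the pair, and if the byte after 0xFF turns out to be non-zero it has to "rewind" so that parseMarker can read the 0xFF itself.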

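I don't have Attoparsec installed, but I assumed it is similar to Parsec, which doesn't backtrack by default, hence the try below. (The Attoparsec documentation states that its parsers always backtrack on failure and that try is provided for Parsec compatibility, so the try is harmless but not strictly needed.)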

parseSection =
    skipMany (notWord8 0xFF <|> try (word8 0xFF >> word8 0x0)) >> parseMarker

OTHER TIPS

The problem is that you never tell it to parse the sequence 0xFF, 0x00. It's late, so hopefully the next answerer will explain this properly (perhaps this helps you enough already), but here is an alternative parseMarker and the accompanying handleParseResult:

parseMarker = do   -- or call it "parsePotentialMarker"
  part1 <- word8 0xFF
  part2 <- anyWord8
  return $
    if (part2 /= 0)
        then [(part1, part2)]
        else []

-- ... skipping other functions...
handleParseResult result = do
  case result of
    Fail _ _ msg -> msg
    Done _ r -> show (concat r)
-- ...

On a side note, in questions it helps slightly to have a fully functional test in the code, something like:

main =
  let contents = BS.pack [1,2,3,4,0xFF,1 {- marker 1 -},0xFF,0x00,0xFF,2 {- marker 2 -},31,13,0xFF,0x00]
  in putStrLn $ parseJPEG contents
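
To know what output to expect from a test like this, it can help to have a tiny reference scanner that does not depend on Attoparsec at all. The following is only a sketch using plain lists and base; markerPairs is a hypothetical helper name, not something from the question or from any library. It applies the rule stated in the question (0xFF followed by a non-zero byte is a marker, 0xFF 0x00 is data) and, like the parsers above, it does not give 0xFF 0xFF any special treatment.

```haskell
import Data.Word (Word8)

-- Hypothetical reference helper (not part of Attoparsec or the question's code):
-- walk the raw bytes and collect every (0xFF, b) pair where b is non-zero.
markerPairs :: [Word8] -> [(Word8, Word8)]
markerPairs (0xFF : 0x00 : rest) = markerPairs rest              -- escaped 0xFF: ordinary data
markerPairs (0xFF : b : rest)    = (0xFF, b) : markerPairs rest  -- 0xFF + non-zero byte: a marker
markerPairs (_ : rest)           = markerPairs rest              -- any other byte (or a lone trailing 0xFF)
markerPairs []                   = []

main :: IO ()
main =
  -- Same bytes as the test input above.
  print (markerPairs [1,2,3,4,0xFF,1,0xFF,0x00,0xFF,2,31,13,0xFF,0x00])
  -- prints [(255,1),(255,2)]
```

Since the parsers in this thread also produce a list of (Word8, Word8) pairs and print it with show, the two outputs can be compared directly: a parser that stops at the first 0xFF 0x00 would report only the first pair for this input.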
Licensed under: CC-BY-SA with attribution