Question

I'm trying to parse some bibliographic data; more specifically, I want to pull out the 'subject' field for each item. The data is JSON and looks something like this:

{"rows": [
    {"doc": {"sourceResource": {"subject": ["fiction", "horror"]}}},
    {"doc": {"sourceResource": {"subject": "fantasy"}}}
]}

I can pull out 'subject' when every entry is Text, or when every entry is [Text], but I'm stumped as to how to accommodate a mix of both. Here is my program in its current state:

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.Text (Text)
import Control.Applicative
import Control.Monad
import qualified Data.ByteString.Lazy as B

jsonFile :: FilePath
jsonFile = "bib.json"

getJSON :: IO B.ByteString
getJSON = B.readFile jsonFile


data Document = Document { rows :: [Row]}
              deriving (Eq, Show)


data Row = SubjectList [Text]
         | SubjectText Text
         deriving (Eq, Show)


instance FromJSON Document where
  parseJSON (Object o) = do
    rows <- parseJSON =<< (o .: "rows")
    return $ Document rows
  parseJSON _ = mzero


instance FromJSON Row where
  parseJSON (Object o) = do
    item <- parseJSON =<< ((o .: "doc") >>=
                           (.: "sourceResource") >>=
                           (.: "subject"))
    -- return $ SubjectText item
    return $ SubjectList item
  parseJSON _ = mzero

main :: IO ()
main = do
   d <- (decode <$> getJSON) :: IO (Maybe Document)
   print d

Any help would be appreciated.

Edit:

Here is the working FromJSON Row instance:

instance FromJSON Row where
  parseJSON (Object o) =
    (SubjectList <$> (parseJSON =<< val)) <|>
    (SubjectText <$> (parseJSON =<< val))
    where
      val = ((o .: "doc") >>=
             (.: "sourceResource") >>=
             (.: "subject"))
  parseJSON _ = mzero
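For reference, here is a self-contained sketch of the question's program with the working instance swapped in. To make it runnable without bib.json, it decodes the two sample rows from an inline string (`sample` is an assumption, not part of the original program), and it trims the imports to the ones actually used.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: the question's types plus the working Row instance from the edit.
-- The inline `sample` string stands in for bib.json (an assumption made so
-- the program runs without the file).
import Control.Applicative ((<|>))
import Control.Monad (mzero)
import Data.Aeson
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)

data Document = Document { rows :: [Row] }
  deriving (Eq, Show)

data Row = SubjectList [Text]
         | SubjectText Text
  deriving (Eq, Show)

instance FromJSON Document where
  parseJSON (Object o) = Document <$> (o .: "rows")
  parseJSON _ = mzero

instance FromJSON Row where
  parseJSON (Object o) =
    -- Try the list shape first; if that parser fails, fall back to a string.
    (SubjectList <$> (parseJSON =<< val)) <|>
    (SubjectText <$> (parseJSON =<< val))
    where
      -- doc -> sourceResource -> subject, stopping at the raw Value.
      val = o .: "doc" >>= (.: "sourceResource") >>= (.: "subject")
  parseJSON _ = mzero

-- The two sample rows from the question: one list-valued, one string-valued.
sample :: BL.ByteString
sample = "{\"rows\": [\
  \{\"doc\": {\"sourceResource\": {\"subject\": [\"fiction\", \"horror\"]}}},\
  \{\"doc\": {\"sourceResource\": {\"subject\": \"fantasy\"}}}\
  \]}"

main :: IO ()
main = print (decode sample :: Maybe Document)
```

Running it should print Just (Document {rows = [SubjectList ["fiction","horror"],SubjectText "fantasy"]}).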

Solution

First, look at the type of this expression:

((o .: "doc") >>=
 (.: "sourceResource") >>=
 (.: "subject")) :: FromJSON b => Parser b

We can get anything that is an instance of FromJSON out of it. Clearly this works for Text or for [Text] individually, but your problem is that you want either Text or [Text]. Fortunately, that is fairly easy to handle: rather than letting the chain decode the field any further, just get a Value (aeson's representation of an arbitrary JSON value) out of it. Once you have a Value, you can decode it as a Text and put it in a SubjectText:

SubjectText <$> parseJSON val :: Parser Row

Or as a [Text] and put it in a SubjectList:

SubjectList <$> parseJSON val :: Parser Row

But wait: either one of these will do, and they have the same output type. Notice that Parser is an instance of Alternative, which lets us say exactly that (“either one will do”): p <|> q runs p and, if p fails, runs q instead. Thus,

(SubjectList <$> parseJSON val) <|> (SubjectText <$> parseJSON val) :: Parser Row
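Assembled into a complete instance, this approach might look like the sketch below. It uses withObject instead of the question's `parseJSON (Object o)` / `mzero` pattern (a stylistic swap, not something the answer prescribes), binds the subject as a Value in a do block, and adds a hypothetical `mkRow` helper plus three made-up inputs to exercise each branch, including a number that neither branch accepts.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch of the answer's approach: stop the lookup chain at a raw Value,
-- then offer both interpretations with (<|>). `mkRow` is a hypothetical
-- helper used only to build test input.
import Control.Applicative ((<|>))
import Data.Aeson
import Data.Aeson.Types (Parser, parseMaybe)
import Data.Text (Text)

data Row = SubjectList [Text]
         | SubjectText Text
  deriving (Eq, Show)

instance FromJSON Row where
  parseJSON = withObject "Row" $ \o -> do
    -- Pin the result to Value instead of decoding it any further.
    val <- o .: "doc" >>= (.: "sourceResource") >>= (.: "subject") :: Parser Value
    -- Either interpretation will do; if the first fails, the second is tried.
    (SubjectList <$> parseJSON val) <|> (SubjectText <$> parseJSON val)

-- Wrap a subject value in the doc/sourceResource nesting from the question.
mkRow :: Value -> Value
mkRow s = object ["doc" .= object ["sourceResource" .= object ["subject" .= s]]]

main :: IO ()
main = do
  print (parseMaybe parseJSON (mkRow (toJSON ["fiction", "horror" :: Text])) :: Maybe Row)
  print (parseMaybe parseJSON (mkRow (String "fantasy")) :: Maybe Row)
  -- A number matches neither branch, so the whole Row parse fails.
  print (parseMaybe parseJSON (mkRow (Number 42)) :: Maybe Row)
```

This should print Just (SubjectList ["fiction","horror"]), then Just (SubjectText "fantasy"), then Nothing. Because the two shapes are disjoint (a JSON array never decodes as Text, and a JSON string never decodes as [Text]), the order of the alternatives does not change the result here; it would matter only if both parsers could succeed on the same input.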

Ta-da! (Actually, it wasn't necessary to pull it out as a Value; we could have instead embedded that long ((o .: "doc") >>= (.: "sourceResource") >>= (.: "subject")) chain into each subexpression. But that's ugly.)

Licensed under: CC-BY-SA with attribution