Question

I have a large JSON file (about 90MB) that contains a homogeneous array of objects. I am trying to write a Haskell program that reduces the values in the array. This seems like a good candidate for lazy evaluation: the program shouldn't have to read an object from the file until the previous one has been processed.

I've profiled the program's memory usage with both the Data.Aeson and Text.JSON packages, and it seems that the entire file is parsed and a full abstract syntax tree is constructed in one pass before the rest of the program can process the array. This is probably because the parse takes place in the Maybe (or Either, or Result) monad, and it isn't known whether the parse will return Just or Nothing (or their equivalents) until the complete AST has been built. This gives worryingly high memory usage, and causes space overflows in most cases.
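For reference, the approach that exhibits this behaviour looks roughly like the following sketch (the file name and element type are invented for illustration):

```haskell
import           Data.Aeson           (decode)
import qualified Data.ByteString.Lazy as BL

-- decode only commits to Just or Nothing after the whole document has
-- been parsed, so the entire AST is resident before the fold can start.
main :: IO ()
main = do
    contents <- BL.readFile "data.json"   -- hypothetical input file
    case decode contents :: Maybe [Double] of
        Nothing -> putStrLn "parse failed"
        Just xs -> print (sum xs)
```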

Do any libraries support two-pass parsing? A first parse which determines whether the file CAN be parsed into the expected types, and then a second lazy parse that reads more of the file as it is needed?

Or is there a better way of solving this problem?


Solution

To my knowledge, the only streaming JSON parser on Hackage right now is yajl-enumerator. I've discussed creating a streaming JSON parsing/rendering library in the past, but I've yet to have a strong enough need (or enough demand) to do so. I would definitely be in favor of the existence of such a library, and would be happy to assist in getting it written.
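In the meantime, one workaround is to hand-roll a streaming fold using attoparsec's lazy-ByteString interface together with aeson's Value parser: parse the opening bracket, then one element at a time, handing each element to a strict accumulator so the already-consumed prefix of the file can be garbage-collected. A rough sketch, assuming a non-empty top-level array (foldArray and the counting fold are names made up here):

```haskell
{-# LANGUAGE BangPatterns #-}

import           Control.Applicative              ((<|>))
import           Data.Aeson                       (Value)
import           Data.Aeson.Parser                (json)
import           Data.Attoparsec.ByteString.Char8 (char, skipSpace)
import qualified Data.Attoparsec.ByteString.Lazy  as AL
import qualified Data.ByteString.Lazy             as BL

-- Fold strictly over the elements of a top-level JSON array without
-- building an AST for the whole document: each element is parsed,
-- handed to f, and becomes garbage before the next one is read.
foldArray :: (acc -> Value -> acc) -> acc -> BL.ByteString -> acc
foldArray f z input =
    case AL.parse (skipSpace *> char '[') input of
        AL.Fail _ _ err -> error ("expected an array: " ++ err)
        AL.Done rest _  -> go z rest
  where
    go !acc bs =
        case AL.parse (skipSpace *> json) bs of
            AL.Fail _ _ err -> error ("bad element: " ++ err)
            AL.Done bs' v   ->
                case AL.parse (skipSpace *> (char ',' <|> char ']')) bs' of
                    AL.Fail _ _ err  -> error ("bad separator: " ++ err)
                    AL.Done bs'' ',' -> go (f acc v) bs''
                    AL.Done _ _      -> f acc v   -- hit ']', array done

main :: IO ()
main = do
    contents <- BL.readFile "big.json"   -- lazy IO: chunks read on demand
    print (foldArray (\n _ -> n + 1) (0 :: Int) contents)
```

This keeps memory proportional to the largest single element rather than the whole file, at the cost of only discovering a malformed document at the point of failure; there is no up-front guarantee that the rest of the file parses, which is exactly the trade-off the two-pass question is asking about.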
