Question

I have a CSV file with stock trading history; it is 70 megabytes. I want to run my program on it, but I don't want to wait 30 seconds on every start. Two options come to mind:

1. Just translate the CSV file into a Haskell source file, like this:

From                       | To
---------------------------+---------------------------------
1380567537,122.166,2.30243 | history = [
...                        |       (1380567537,122.166,2.30243)
...                        |     , ...
...                        |     ]

2. Use Template Haskell to parse the file at compile time (a rough sketch of this follows below).
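For what it's worth, here is a minimal sketch of what option 2 might look like: a splice that reads and parses the CSV while GHC is compiling and lifts the resulting list into the module. The module name, the embedHistory splice, the file path and the naive comma splitter are placeholder assumptions, not anything from the original post, and a real implementation would use a proper CSV parser. Note that the splice still hands the compiler one huge literal, so it may run into the same blow-up described next.

    {-# LANGUAGE TemplateHaskell #-}
    module HistoryTH (embedHistory) where

    import Language.Haskell.TH        (Exp, Q)
    import Language.Haskell.TH.Syntax (addDependentFile, lift, runIO)

    -- Read and parse the CSV at compile time, then lift the resulting
    -- list into the program as a literal expression.
    embedHistory :: FilePath -> Q Exp
    embedHistory path = do
      addDependentFile path                 -- recompile when the CSV changes
      contents <- runIO (readFile path)
      lift (map parseLine (lines contents) :: [(Int, Double, Double)])
      where
        parseLine l = case splitOn ',' l of
          [t, p, v] -> (read t, read p, read v)
          _         -> error ("bad CSV line: " ++ l)
        splitOn c s = case break (== c) s of
          (w, "")       -> [w]
          (w, _ : rest) -> w : splitOn c rest

Because of Template Haskell's stage restriction, the splice has to be used from a different module, for example:

    history = $(embedHistory "history.csv")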

Trying the first approach, I found GHC eating up 12 GB of memory after 3 hours of trying to compile that one list (70 MB of source code).

So is TH the only available approach? Or can I just use a hard-coded large data structure in a source file? And why can't GHC compile the file? Does it run into a combinatorial explosion because of complex optimizations, or something like that?

Solution

Hard-coding so much data is not a common use case, so it isn't surprising that the compiler doesn't handle it well.

A better solution would be to put the data into a format that is easier to read than CSV. For example, consider writing a program that parses your CSV file and serializes the resulting structure with a package like cereal. Your main program can then read that binary file, which should be much faster than parsing the CSV.
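A rough sketch of that idea is below, assuming the three-column layout from the question. The file names, the Row alias and the hand-rolled line splitter are placeholder assumptions; a real program would use a proper CSV parsing library.

    {-# LANGUAGE ScopedTypeVariables #-}
    module Main (main, convert) where

    import qualified Data.ByteString as BS
    import Data.Serialize (decode, encode)   -- from the cereal package

    type Row = (Int, Double, Double)

    -- One-off preprocessing step: parse the CSV once and write a binary snapshot.
    convert :: IO ()
    convert = do
      csv <- readFile "history.csv"
      let rows = map parseLine (lines csv) :: [Row]
      BS.writeFile "history.bin" (encode rows)
      where
        parseLine l = case splitOn ',' l of
          [t, p, v] -> (read t, read p, read v)
          _         -> error ("bad CSV line: " ++ l)
        splitOn c s = case break (== c) s of
          (w, "")       -> [w]
          (w, _ : rest) -> w : splitOn c rest

    -- The main program only ever reads the binary snapshot.
    main :: IO ()
    main = do
      bytes <- BS.readFile "history.bin"
      case decode bytes of
        Left err              -> error err
        Right (rows :: [Row]) -> print (length rows)  -- real work goes here

Running convert once (from GHCi, say) produces history.bin; after that, every start of the program only pays the cost of decoding the binary file.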

This approach has the added benefit that running your program on new data will be easier and won't require recompiling.

Licensed under: CC-BY-SA with attribution