Question

I have a lazy list of filenames created by find. I'd like to load the metadata of these files lazily too. That is, if I take 10 elements from metadata, it should only load the metadata of those ten files. find happily gives you 10 files if you ask for them without thrashing your disk, whereas my script loads the metadata of all files.

main = do
    files <- find always always "/"
    metadata <- loadMetaList files
    -- only the first 10 entries are needed here
    print (length (take 10 metadata))

loadMetaList :: [String] -> IO [Metadata]
loadMetaList [] = return []
loadMetaList (file:files) = do
    first <- loadMeta file
    rest <- loadMetaList files
    return (first:rest)

loadMeta :: String -> IO Metadata

As you can see, loadMetaList is not lazy. For it to be lazy, it should use tail recursion. Something like return (first:loadMetaList rest).

How do I make loadMetaList lazy?

Solution

The (>>=) of the IO monad is such that in

loadMetaList :: [String] -> IO [Metadata]
loadMetaList [] = return []
loadMetaList (file:files) = do
    first <- loadMeta file
    rest <- loadMetaList files
    return (first:rest)

the action loadMetaList files has to be run before return (first:rest) can be executed.

You can avoid that by deferring the execution of loadMetaList files,

import System.IO.Unsafe

loadMetaList :: [String] -> IO [Metadata]
loadMetaList [] = return []
loadMetaList (file:files) = do
    first <- loadMeta file
    rest <- unsafeInterleaveIO $ loadMetaList files
    return (first:rest)

with unsafeInterleaveIO (which find also uses). That way, loadMetaList files is not executed until its result is needed, and if you require the metadata of only 10 files, only those will be loaded.

It's not quite as unsafe as its cousin unsafePerformIO, but should be handled with care too.
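
To see the difference in action, here is a minimal sketch that counts how many times the loader runs. The names countingLoadMeta, eagerLoadMetaList, lazyLoadMetaList and loadsNeeded, the String stand-in for Metadata, and the 100 made-up file names are all assumptions for illustration, not part of the original code:

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for the real metadata type.
type Metadata = String

-- Hypothetical loader: counts every call so the number of loads is observable.
countingLoadMeta :: IORef Int -> FilePath -> IO Metadata
countingLoadMeta counter file = do
    modifyIORef' counter (+ 1)
    return ("meta:" ++ file)

-- Eager version: every file is loaded before the list is returned.
eagerLoadMetaList :: (FilePath -> IO Metadata) -> [FilePath] -> IO [Metadata]
eagerLoadMetaList = mapM

-- Lazy version: the head is loaded right away, the tail only on demand.
lazyLoadMetaList :: (FilePath -> IO Metadata) -> [FilePath] -> IO [Metadata]
lazyLoadMetaList _ [] = return []
lazyLoadMetaList load (file:files) = do
    first <- load file
    rest <- unsafeInterleaveIO (lazyLoadMetaList load files)
    return (first : rest)

-- Runs a list loader over 100 made-up file names, demands the first n
-- results, and reports how many loads actually happened.
loadsNeeded
    :: ((FilePath -> IO Metadata) -> [FilePath] -> IO [Metadata])
    -> Int
    -> IO Int
loadsNeeded loadList n = do
    counter <- newIORef 0
    metas <- loadList (countingLoadMeta counter)
                      ["file" ++ show i | i <- [1 :: Int .. 100]]
    _ <- evaluate (length (take n metas))  -- force the first n list cells
    readIORef counter
```

In GHCi, loadsNeeded eagerLoadMetaList 10 should report 100 loads, while loadsNeeded lazyLoadMetaList 10 should report 10. Note that the lazy version still loads the first file eagerly, so even asking for zero results costs one load.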

OTHER TIPS

Here's how you do it the pipes way. I don't really know how you implement loadMeta and find, so I just made something up:

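import Pipes
import qualified Pipes.Prelude as Pipes

-- Made-up producer standing in for the real directory search
find :: Producer FilePath IO ()
find = each ["heavy.mp3", "metal.mp3"]

type MetaData = String

-- Made-up loader standing in for the real metadata lookup
loadMeta :: String -> IO MetaData
loadMeta file = return $ "This song is " ++ takeWhile (/= '.') file

-- Pipes.mapM (from Pipes.Prelude) runs loadMeta on each file as it arrives
loadMetaList :: Pipe FilePath MetaData IO r
loadMetaList = Pipes.mapM loadMeta

To run it, we just compose processing stages like a pipeline and run the pipeline using runEffect:

>>> runEffect $ find >-> loadMetaList >-> Pipes.stdoutLn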
This song is heavy
This song is metal

There are a couple of key things to point out:

  • You can make find a Producer so that it only searches the directory tree lazily, too. I know you don't need this feature because your file set is small now, but it's very easy to include later when your directory gets larger.

  • It's lazy, but without unsafeInterleaveIO. It generates each output immediately and doesn't wait to first collect the whole list of results.

For example, it will work even if we use an infinite list of files:

>>> import qualified Pipes.Prelude as Pipes
>>> runEffect $ each (cycle ["heavy.mp3", "metal.mp3"]) >-> loadMetaList >-> Pipes.stdoutLn
This song is heavy
This song is metal
This song is heavy
This song is metal
This song is heavy
This song is metal
...

  • It will only compute as much as necessary. If we specify that we only want three results, it will do the minimum amount of loading necessary to return three results, even if we provide an infinite list of files.

For example, we can cap the number of results using take:

>>> runEffect $ each (cycle ["heavy.mp3", "metal.mp3"]) >-> loadMetaList >-> Pipes.take 3 >-> Pipes.stdoutLn
This song is heavy
This song is metal
This song is heavy
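
To check that only three loads really happen, one option is to count the loader calls. This is a rough sketch that assumes the pipes package is installed; countingLoadMeta and firstThree are hypothetical names introduced here:

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Pipes (each, (>->))
import qualified Pipes.Prelude as Pipes

-- Hypothetical loader: same made-up metadata as above, plus a call counter.
countingLoadMeta :: IORef Int -> FilePath -> IO String
countingLoadMeta counter file = do
    modifyIORef' counter (+ 1)
    return ("This song is " ++ takeWhile (/= '.') file)

-- Collects the first three results from an infinite file list and
-- reports how many times the loader ran.
firstThree :: IO ([String], Int)
firstThree = do
    counter <- newIORef 0
    metas <- Pipes.toListM
        (each (cycle ["heavy.mp3", "metal.mp3"])
            >-> Pipes.mapM (countingLoadMeta counter)
            >-> Pipes.take 3)
    loads <- readIORef counter
    return (metas, loads)
```

Because Pipes.take 3 stops requesting input after its third element, firstThree should return the three strings printed above together with a load count of 3.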

So you asked what is wrong with unsafeInterleaveIO. The main limitation of unsafeInterleaveIO is that you cannot guarantee when the IO actions actually occur, which leads to the following common pitfalls:

  • Handles accidentally being closed before the file is read

  • IO actions occurring late or never

  • Pure code having side effects and throwing IOExceptions
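
The second pitfall can be illustrated with a small sketch using only base. The helper logged and the function orderDemo are hypothetical names; the point is that the effects run in the order the results are demanded, not the order the actions are written:

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical helper: records its name in a log, then returns a value.
logged :: IORef [String] -> String -> Int -> IO Int
logged ref name x = do
    modifyIORef' ref (++ [name])
    return x

-- Three deferred actions are written in the order a, b, never,
-- but b is demanded first and the third result is never demanded.
orderDemo :: IO [String]
orderDemo = do
    ref <- newIORef []
    a <- unsafeInterleaveIO (logged ref "a" 1)
    b <- unsafeInterleaveIO (logged ref "b" 2)
    _unused <- unsafeInterleaveIO (logged ref "never" 3)
    _ <- evaluate b   -- runs the "b" action now
    _ <- evaluate a   -- runs the "a" action afterwards
    readIORef ref
```

Running orderDemo should yield ["b","a"]: the "b" effect happens before "a", and the "never" effect does not happen at all.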

The biggest advantage of Haskell's IO system over other languages is that Haskell completely decouples the evaluation model from the order of side effects. When you use lazy IO, you lose that decoupling, and the order of side effects becomes tightly coupled to Haskell's evaluation model, which is a huge step backwards.

This is why it is generally not wise to use lazy IO, especially now that there are easy and elegant alternatives.

If you want to learn more about how to use pipes to implement lazy IO safely, then you can read the extensive pipes tutorial.

Licensed under: CC-BY-SA with attribution