Question

This is probably a pretty obscure case, but please bear with me.

I'm writing a web crawler that needs to download webpages and save them to local disk. Meanwhile, being a crawler, it needs to find the hyperlinks on each page and download those linked pages as well.

Right now I'm looking at the first sample code given on this page. I'm trying to write a conduit of this type:

processLink :: Conduit S.ByteString IO S.ByteString

And then I will replace this line in the sample code:

responseBody res $$+- sinkFile "foo.txt"

with this line:

responseBody res $$+- processLink =$ sinkFile "foo.txt"

So what I'm trying to do is implement a conduit called processLink that processes each ByteString, extracts the links from it, and then passes the original ByteString along to sinkFile so that the page can be saved. Then I can call functions on those links recursively to download the linked pages.

However, since processLink is a pure function, it seems like I can't do any IO operations inside processLink, and therefore cannot do the things I need like downloading those linked pages. Or can I? What could I do to get around this problem?


Solution

Alright I figured it out.... I think what I need is just liftIO lol. Sorry I'm kinda new to Haskell :P
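
For anyone landing here later, this is roughly what the liftIO approach could look like. It is only a minimal sketch, assuming a recent conduit release (ConduitT and .| rather than the older $= / $$+- operators used in the question). extractLinks is a made-up placeholder, not a real HTML parser, and the IORef stands in for whatever IO you actually want to run (e.g. kicking off the download of each link). The type ConduitT S.ByteString S.ByteString IO () is what the Conduit S.ByteString IO S.ByteString synonym from the question expands to.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad.IO.Class (liftIO)
import qualified Data.ByteString        as S
import qualified Data.ByteString.Char8  as S8
import           Data.Conduit           (ConduitT, awaitForever, runConduit,
                                         yield, (.|))
import qualified Data.Conduit.List      as CL
import           Data.IORef             (IORef, modifyIORef', newIORef,
                                         readIORef)

-- Hypothetical stand-in for real link extraction: keeps every
-- whitespace-separated word that starts with "http". A real crawler would
-- parse the HTML and handle links that are split across chunk boundaries.
extractLinks :: S.ByteString -> [S.ByteString]
extractLinks = filter ("http" `S.isPrefixOf`) . S8.words

-- A conduit is not restricted to pure code: it runs in its base monad
-- (IO here), so any IO action can be performed through liftIO.
processLink :: IORef [S.ByteString]
            -> ConduitT S.ByteString S.ByteString IO ()
processLink seen = awaitForever $ \chunk -> do
    -- Arbitrary IO; in a crawler this is where each link could be fetched.
    liftIO $ modifyIORef' seen (++ extractLinks chunk)
    -- Pass the original chunk downstream unchanged (e.g. to sinkFile).
    yield chunk

-- Small demo with an in-memory source and sink instead of HTTP and a file.
demo :: IO ([S.ByteString], [S.ByteString])
demo = do
    seen  <- newIORef []
    out   <- runConduit $
                 CL.sourceList ["see http://a.example now", "and http://b.example"]
              .| processLink seen
              .| CL.consume
    links <- readIORef seen
    return (out, links)
```

Running demo (from GHCi, or via main = demo >>= print) should give back the two chunks untouched along with the links collected on the side. The same liftIO call works unchanged if the base monad is ResourceT IO, which is what sinkFile needs.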

Licensed under: CC-BY-SA with attribution