Question

I'm getting the hang of Scalding, and I need to fetch a number of URLs from the internet.

It seems that Scala's standard library doesn't provide a dedicated class for making HTTP requests.

Since many of the bare Java solutions I've seen are too verbose, I was wondering whether I could use Scalding's Pipe machinery for this, or whether this is not the kind of task it's intended for.

Also, if I use an external library such as Dispatch or scalaj-http, could I fetch the result into a Pipe directly, or is there more plumbing involved?

Solution

I'm not sure it makes sense to fetch URLs directly inside a Map/Reduce job. I'd rather fetch the URLs using some other mechanism (e.g. Akka), store the content in HDFS (via Kafka, for example), and then run Scalding jobs on top of that data.
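As a rough illustration of the "fetch first, process later" approach, here is a minimal sketch that uses only the Scala standard library (2.13+, for scala.util.Using) instead of Akka or Kafka. The names UrlFetcher, fetchBody, fetchAll and writeLines are made up for this example. It writes one tab-separated "url, body" line per page to a local file. Copying that file into HDFS (for instance with hadoop fs -put) and reading it from a Scalding job are assumed follow-up steps that are not shown.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source
import scala.util.{Try, Using}

// Hypothetical helper object; not part of Scalding or any other library.
object UrlFetcher {
  // Default fetcher: reads the whole response body with the standard library only.
  def fetchBody(url: String): String =
    Using.resource(Source.fromURL(url, "UTF-8"))(_.mkString)

  // Collapse tabs and line breaks so each record stays on one tab-separated line.
  private def flatten(body: String): String =
    body.replaceAll("[\\t\\r\\n]+", " ")

  // Fetch every URL; a failed fetch is skipped instead of aborting the batch.
  // The fetch function is a parameter so another HTTP client can be plugged in.
  def fetchAll(urls: Seq[String], fetch: String => String = fetchBody): Seq[String] =
    urls.flatMap { url =>
      Try(fetch(url)).toOption.map(body => s"$url\t${flatten(body)}")
    }

  // Write the records to a local file, one record per line.
  def writeLines(lines: Seq[String], out: Path): Unit =
    Files.write(out, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
}
```

Because the fetch function is passed in, the same code should work with Dispatch, scalaj-http or Play WS in place of scala.io.Source, and the Scalding job only ever sees a plain tab-separated input file.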

You can use Play Framework's WS library (now available as a stand-alone module) for URL fetching. For more information, see its documentation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow