I'm not sure it makes sense to fetch URLs directly inside Map/Reduce tasks. I'd rather fetch them with a separate mechanism (e.g. Akka), store the content in HDFS (via Kafka, for example), and then run Scalding jobs on top of that data.
For the fetching itself you can use Play Framework's WS library (now available as a stand-alone module). See the documentation for more info.
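
To make the "fetch first, process later" idea concrete, here is a minimal sketch of the fetch-and-stage step, assuming Scala 2.13+ and nothing beyond the standard library. The `UrlStaging` object, the `url<TAB>body` record layout and the `fetchBlocking` helper are my own placeholders, not part of Play WS, Akka or Scalding. `fetchBlocking` is only a stand-in: in practice you would pass a WS-based function as `fetch` and ship the resulting lines to Kafka/HDFS.

```scala
import scala.concurrent.{ExecutionContext, Future, blocking}
import scala.io.Source
import scala.util.Using
import scala.util.control.NonFatal

object UrlStaging {
  // Hypothetical record layout: one "url<TAB>body" line per page.
  // Backslashes, tabs and line breaks in the body are escaped so that
  // every record stays on a single line and can be read back as TSV.
  def toRecord(url: String, body: String): String = {
    val escaped = body
      .replace("\\", "\\\\")
      .replace("\t", "\\t")
      .replace("\r", "\\r")
      .replace("\n", "\\n")
    s"$url\t$escaped"
  }

  // Stand-in fetcher using only the standard library (blocking I/O).
  // Assumption: a real setup would use an async HTTP client such as Play WS.
  def fetchBlocking(url: String)(implicit ec: ExecutionContext): Future[String] =
    Future(blocking(Using.resource(Source.fromURL(url, "UTF-8"))(_.mkString)))

  // Fetch all URLs concurrently with the supplied fetch function and turn
  // each successful response into one record. URLs whose fetch fails are
  // skipped, so a single bad URL does not fail the whole batch.
  def stage(urls: Seq[String], fetch: String => Future[String])(
      implicit ec: ExecutionContext): Future[Seq[String]] =
    Future
      .sequence(urls.map { url =>
        fetch(url)
          .map(body => Option(toRecord(url, body)))
          .recover { case NonFatal(_) => None }
      })
      .map(_.flatten)
}
```

Because `stage` takes the fetch function as a parameter, the crawling side can be swapped or tested without touching the Scalding job, which then only needs to read the staged lines as tab-separated `(url, body)` pairs.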