I'm having a hard time writing a unit test for my Scalding job.

My job expects a file with three fields:

  TextLine(args("input"))
    .map('url -> ('fetchedUrl,'date,'info)){
  ...

Naively, I would have expected the fields to be mapped as an n-tuple without any further setup. But my test shows that this is not the case, and that some further contract needs to be established:

JobTest[com.kohls.crawler.Miner]
  .arg("input", "inputFile")
  .arg("output", "outputFile")
  .source(TextLine("inputFile"), List(("https://en.wikipedia.org/wiki/Test" ,"Mon Apr 14 15:08:11 CDT 2014", "extra info")))
  .sink[(String,Date,Array[Byte])](Tsv("outputFile")){ ... }

This currently fails with cascading.tuple.FieldsResolverException: could not select fields: [{1}:'url'], from: [{2}:'offset', 'line']. So I guess I need to declare the TSV fields in some way before feeding the data in as TextLine's input.

Most of the documentation I've found is spotty in this regard. What is the correct way to define this test?


Solution

You should use Tsv instead of TextLine. Tsv takes the declared fields as its second parameter. Your job would look like this:

// set skipHeader / writeHeader to true if the files carry a header row
Tsv(args("input"), ('fetchedUrl,'date,'info), skipHeader = false).read
  .map(...)
  .write(Tsv(args("output"), writeHeader = false))

And your job test like this:

JobTest[com.kohls.crawler.Miner]
  .arg("input", "inputFile")
  .arg("output", "outputFile")
  .source(Tsv("inputFile"), List(("https://en.wikipedia.org/wiki/Test" ,"Mon Apr 14 15:08:11 CDT 2014", "extra info")))
  .sink[(String,Date,Array[Byte])](Tsv("outputFile")) { ... }
  .run
  .finish

Other tips

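Of course, you can also keep TextLine and mock it in your test. The trick is to supply the hidden 'line field: TextLine exposes the fields 'offset and 'line, so each test record must be an (offset, line) pair. The job itself must then read from 'line rather than 'url.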

    JobTest[com.kohls.crawler.Miner]
      .arg("input", "inputFile")
      .arg("output", "outputFile")
      .source(TextLine("inputFile"), List(
        0 -> "https://en.wikipedia.org/wiki/Test",
        1 -> "Mon Apr 14 15:08:11 CDT 2014",
        2 -> "extra info"))
      .sink[(String,Date,Array[Byte])](Tsv("outputFile")){ ... }
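
Because TextLine hands each record to the job as a single 'line string, the job would have to split that string into its three fields itself. Below is a minimal sketch of that step in plain Scala, with no Scalding dependency. The name LineFields.parse is hypothetical, and the tab delimiter is an assumption based on the question's mention of TSV.

```scala
// Hypothetical helper, not part of Scalding: splits one TextLine-style
// record into the three string fields the job expects.
object LineFields {
  // Assumes tab-separated columns. The -1 limit keeps trailing empty
  // columns, so a line ending in a tab still yields three parts.
  def parse(line: String): Option[(String, String, String)] =
    line.split("\t", -1) match {
      case Array(fetchedUrl, date, info) => Some((fetchedUrl, date, info))
      case _                             => None // wrong number of columns
    }
}
```

Inside the job, a function like this could be applied to the 'line field before any further mapping. Returning an Option leaves the decision of how to treat malformed lines to the caller.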

Licensed under: CC-BY-SA with attribution