Question

I'm having a hard time writing a unit test for my Scalding job.

My job expects a file with three fields:

  TextLine(args("input"))
    .map('url -> ('fetchedUrl,'date,'info)){
  ...

Naively, I would have expected the fields to be mapped as an n-tuple without any further setup. But my test shows that this is not the case, and that some further contract needs to be established:

JobTest[com.kohls.crawler.Miner]
  .arg("input", "inputFile")
  .arg("output", "outputFile")
  .source(TextLine("inputFile"), List(("https://en.wikipedia.org/wiki/Test" ,"Mon Apr 14 15:08:11 CDT 2014", "extra info")))
  .sink[(String,Date,Array[Byte])](Tsv("outputFile")){ ... }

This currently fails with cascading.tuple.FieldsResolverException: could not select fields: [{1}:'url'], from: [{2}:'offset', 'line']. So I guess that I need to declare the TSV fields in some kind of way before feeding it as TextLine's input.

Most documentation I've found is spotty in this regard. What is the correct way to define this test?


Solution

You should use Tsv instead of TextLine. Tsv takes the declared fields as its second parameter, so the field names are available to the rest of the pipe. Your job would look like this:

Tsv(args("input"), ('fetchedUrl,'date,'info), skipHeader = false/true).read
  .map(...)
  .write(Tsv(args("output"), writeHeader = false/true))

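And your job test like this (the source in the test should be declared the same way as the source the job reads, including its fields):

JobTest[com.kohls.crawler.Miner]
  .arg("input", "inputFile")
  .arg("output", "outputFile")
  .source(Tsv("inputFile", ('fetchedUrl,'date,'info)), List(("https://en.wikipedia.org/wiki/Test", "Mon Apr 14 15:08:11 CDT 2014", "extra info")))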
  .sink[(String,Date,Array[Byte])](Tsv("outputFile")) { ... }
  .run
  .finish

Other tips

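Of course you can mock a TextLine in your test. The trick is to supply the hidden 'line field. As the error message in the question shows, TextLine exposes only two fields, 'offset and 'line, so each test record is an (offset, line) pair and the job has to read from 'line rather than 'url.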

    JobTest[com.kohls.crawler.Miner]
      .arg("input", "inputFile")
      .arg("output", "outputFile")
      // each record is an (offset, line) pair
      .source(TextLine("inputFile"), List(
        0 -> "https://en.wikipedia.org/wiki/Test",
        1 -> "Mon Apr 14 15:08:11 CDT 2014",
        2 -> "extra info"))
      .sink[(String,Date,Array[Byte])](Tsv("outputFile")){ ... }
      .run
      .finish
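
Since TextLine hands the job one raw 'line string per record, the job has to split that string into its three fields itself. Below is a minimal sketch of that step in plain Scala, assuming each input line holds three tab-separated values. LineParser and parseLine are hypothetical names used for illustration; they are not part of Scalding or of the original Miner job.

```scala
// Hypothetical helper, not part of Scalding or the original job.
object LineParser {
  // Splits one tab-separated line into (url, date, info).
  // Returns None when the line does not contain exactly three fields.
  // The limit of -1 keeps trailing empty fields instead of dropping them.
  def parseLine(line: String): Option[(String, String, String)] =
    line.split("\t", -1) match {
      case Array(url, date, info) => Some((url, date, info))
      case _                      => None
    }
}
```

A function like this could be called from the closure the job passes to its map over 'line. Under that assumption, a mocked record in the test would be a single (offset, line) pair whose line contains all three values separated by tabs.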
License: CC-BY-SA with attribution