You should use Tsv instead of TextLine. Tsv takes the declared fields as its second parameter. Your job would look like this:
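Tsv(args("input"), ('fetchedUrl, 'date, 'info), skipHeader = true) // skipHeader: true or false, depending on whether the input has a header row
  .read
  .map(...)
  .write(Tsv(args("output"), writeHeader = true)) // writeHeader: true or false, as needed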
And your job test would look like this:
JobTest[com.kohls.crawler.Miner]
.arg("input", "inputFile")
.arg("output", "outputFile")
.source(Tsv("inputFile"), List(("https://en.wikipedia.org/wiki/Test", "Mon Apr 14 15:08:11 CDT 2014", "extra info")))
.sink[(String, Date, Array[Byte])](Tsv("outputFile")) { ... }
.run
.finish
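
To make the difference between the two sources concrete, here is a rough plain-Scala sketch of the idea. It does not use Scalding at all. The object name, the helper names asTextLine and asTsv, and the sample row are all hypothetical and only illustrate the concept: a TextLine-style read hands you each row as one unsplit line, while a Tsv-style read splits on tabs and binds each column to a declared field name.

```scala
// Plain-Scala illustration only; this is NOT Scalding code and not how
// Scalding implements its sources. All names here are hypothetical.
object TsvVersusTextLine {
  // Field names borrowed from the job above.
  val fieldNames: Seq[String] = Seq("fetchedUrl", "date", "info")

  // Hypothetical sample row, tab-separated.
  val sampleLine: String =
    "https://en.wikipedia.org/wiki/Test\tMon Apr 14 15:08:11 CDT 2014\textra info"

  // TextLine-style: the whole row arrives as a single string value.
  def asTextLine(line: String): Map[String, String] =
    Map("line" -> line)

  // Tsv-style: split on tabs and bind each column to a declared field name.
  // The limit -1 keeps trailing empty columns instead of dropping them.
  def asTsv(line: String, names: Seq[String]): Map[String, String] = {
    val columns = line.split("\t", -1).toSeq
    require(
      columns.length == names.length,
      s"expected ${names.length} columns, got ${columns.length}"
    )
    names.zip(columns).toMap
  }

  def main(args: Array[String]): Unit = {
    // One value per row: you would have to split it yourself.
    println(asTextLine(sampleLine).keys.mkString(","))
    // One value per declared field: columns are addressable by name.
    println(asTsv(sampleLine, fieldNames)("date"))
  }
}
```

With named fields in hand, the map step in the job can refer to 'fetchedUrl, 'date and 'info directly instead of parsing the raw line itself.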