Question

What's the neatest way of specifying the following concept in Scala? The concept represents an interface to a generic extractor function that specifies, at compile time, the class being extracted.

Definition of Construct

import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import scala.collection.JavaConversions._

trait ExtractorBase {
  type ExtractedType
  type RetType = Tuple2[Option[Seq[ExtractedType]],Option[Seq[String]]]
  type ExtractorFunction = (Document) => RetType
  def extractor : ExtractorFunction
}
class Extractor[T] (extractor_in: Any) extends ExtractorBase {
  type ExtractedType = T
  val extractor : ExtractorFunction = extractor_in
}

Instance of Construct

//type TFunc = (Document) => Tuple2[Option[Seq[Int]],Option[Seq[String]]]
val ex = new Extractor[Int]( (x: Document)=> {
    (Some(Seq(1)),Some(Seq("hassan")))
})

At the moment the extractor function isn't aliasing correctly either:

[error] /Users/hassan/code/scala/extractor/hon.scala:14: type mismatch;
[error]  found   : Any
[error]  required: org.jsoup.nodes.Document => (Option[Seq[Extractor.this.ExtractedType]], Option[Seq[String]])
[error]     (which expands to)  org.jsoup.nodes.Document => (Option[Seq[T]], Option[Seq[String]])
[error]   val extractor : ExtractorFunction = extractor_in

  • How do I reference generic trait data outside the definition of the classes extending the trait, both at compile time and at runtime? As a compile-time example, I'd rather have:

    trait ExtractorBase[T] {
      type ExtractedType = T
      type RetType = Tuple2[Option[Seq[ExtractedType]],Option[Seq[String]]]
      type ExtractorFunction = (Document) => RetType
      def extractor : ExtractorFunction
    }

and then

class Extractor[T] (extractor_in: ExtractorBase[T].ExtractorFunction) extends ExtractorBase

Kind of similar to C++ traits.

  • At runtime I wouldn't mind getting at the type information in the trait either.

  • There is probably a more idiomatic way of creating instances of the trait; I don't want to have to go through the Extractor[T] class. Is there a way to do it with a companion object of the trait?
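
For the last two points, one possible approach is sketched here; it is an assumption and not something confirmed in this thread. A `ClassTag` context keeps the extracted class available at runtime, and an `apply` method on the trait's companion object builds instances without a named subclass. `Document` is a hypothetical stand-in for `org.jsoup.nodes.Document` so the snippet runs without jsoup; `extractedClass` and `FactoryDemo` are illustrative names.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for org.jsoup.nodes.Document (assumption, keeps the sketch self-contained).
final case class Document(html: String)

trait ExtractorBase[T] {
  type RetType = (Option[Seq[T]], Option[Seq[String]])
  type ExtractorFunction = Document => RetType
  def extractor: ExtractorFunction
  // Runtime handle on the extracted class; illustrative member, not in the original trait.
  def extractedClass: Class[_]
}

object ExtractorBase {
  // Factory on the companion object: ExtractorBase[Int](f) needs no named subclass.
  def apply[T](f: ExtractorBase[T]#ExtractorFunction)(implicit tag: ClassTag[T]): ExtractorBase[T] =
    new ExtractorBase[T] {
      val extractor: ExtractorFunction = f
      val extractedClass: Class[_] = tag.runtimeClass
    }
}

object FactoryDemo {
  val ex = ExtractorBase[Int]((doc: Document) => (Some(Seq(1)), Some(Seq("hassan"))))
  val cls = ex.extractedClass // the runtime class captured by the ClassTag for Int
}
```

The `ClassTag` is resolved by the compiler at the call site, so the caller writes only `ExtractorBase[Int](...)`.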

edit

Ankur's answer led me to the right path. I was missing the # operator from my repertoire. Time to dress this up.

trait ExtractorBase[T] {
  type ExtractedType = T
  type RetType = Tuple2[Option[Seq[ExtractedType]],Option[Seq[String]]]
  type ExtractorFunction = (Document) => RetType
  def extractor : ExtractorFunction
}
class Extractor[T] (extractor_in: ExtractorBase[T]#ExtractorFunction) extends ExtractorBase[T] {
  def extractor : ExtractorBase[T]#ExtractorFunction = extractor_in
}

Leading to :

val ex = new Extractor[Int]( (x: Document)=> {
      (Some(Seq(1)),Some(Seq("hassan")))
})
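
To make the role of `#` concrete, here is a minimal sketch of type projection on its own. `Document` is a stand-in case class (an assumption, so the snippet runs without jsoup) and `ProjectionDemo` is a hypothetical name. The point is that `ExtractorBase[Int]#ExtractorFunction` names the type member through the trait type, with no instance required.

```scala
// Hypothetical stand-in for org.jsoup.nodes.Document (assumption).
final case class Document(html: String)

trait ExtractorBase[T] {
  type RetType = (Option[Seq[T]], Option[Seq[String]])
  type ExtractorFunction = Document => RetType
  def extractor: ExtractorFunction
}

object ProjectionDemo {
  // The projection is an alias for Document => (Option[Seq[Int]], Option[Seq[String]]),
  // so a plain function literal of that shape conforms to it.
  val f: ExtractorBase[Int]#ExtractorFunction =
    (doc: Document) => (Some(Seq(doc.html.length)), Some(Seq("hassan")))

  // Inside a subclass the same alias is reachable as plain ExtractorFunction.
  val ex: ExtractorBase[Int] = new ExtractorBase[Int] {
    val extractor: ExtractorFunction = f
  }

  val result = ex.extractor(Document("<p>hi</p>"))
}
```

Because the alias depends only on `T` and not on any particular instance, the projected type and the member type seen inside the subclass are the same function type.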

Solution

A possible generic specification of an extraction function using jsoup that can retrieve generic URLs and items (both, either, or neither) could be defined as follows. It is functional and, as far as I know, idiomatic. Please comment if it can be improved.

import org.jsoup.nodes.Document

object ExtractorTraits {
  case class UrlPair[T](data: String, payload: Option[T])
  case class ResultPair[T,U](items: Option[Seq[T]],urls:  Option[Seq[UrlPair[U]]] = None)

  trait ItemExtractorTrait[I,C] {
    type ExtractedType = I
    type RetType = Option[Seq[ExtractedType]]
    type ExtractorFunction = (Document,Option[C]) => RetType
    def apply : ExtractorFunction
  }
  class ItemExtractor[I,C](extract_item : ItemExtractorTrait[I,C]#ExtractorFunction = (doc : Document,c: Option[C]) => None)
    extends ItemExtractorTrait[I,C] { val apply = (doc: Document, ctxt: Option[C]) => extract_item(doc,ctxt) }

  trait UrlExtractorTrait[U,C] {
    type UrlPayload = U
    type RetType = Option[Seq[UrlPair[U]]]
    type ExtractorFunction = (Document,Option[C]) => RetType
    def apply : ExtractorFunction
  }
  class UrlExtractor[U,C](extract_url : UrlExtractorTrait[U,C]#ExtractorFunction = (doc:Document,c : Option[C]) => None)
    extends UrlExtractorTrait[U,C] { val apply = (doc: Document, ctxt:  Option[C]) => extract_url(doc,ctxt)}

  trait ExtractorTrait[I,U,C] {
    type RetType = ResultPair[I,U]
    type ExtractorFunction = (Document, Option[C]) => RetType
    def apply : ExtractorFunction
  }
  class Extractor[I,U,C] (item_extractor: ItemExtractorTrait[I,C]#ExtractorFunction = new ItemExtractor[I,C]().apply,
                        url_extractor: UrlExtractorTrait[U,C]#ExtractorFunction = new UrlExtractor[U,C]().apply) extends ExtractorTrait[I,U,C] {
    val apply = (doc: Document, ctxt: Option[C]) => ResultPair[I,U](item_extractor(doc,ctxt),url_extractor(doc,ctxt))
  }
}
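
As a rough illustration of how the composition above behaves, here is a condensed sketch. It is a simplification and not the code above: plain function aliases replace the trait type members, `Document` is a hypothetical stand-in for `org.jsoup.nodes.Document`, and `ComposeDemo`, `ItemFn`, `UrlFn` and `wordsOnly` are illustrative names. It shows that an omitted extractor falls back to a default returning `None`.

```scala
// Hypothetical stand-in for org.jsoup.nodes.Document (assumption).
final case class Document(html: String)

case class UrlPair[U](data: String, payload: Option[U])
case class ResultPair[I, U](items: Option[Seq[I]], urls: Option[Seq[UrlPair[U]]] = None)

object ComposeDemo {
  type ItemFn[I, C] = (Document, Option[C]) => Option[Seq[I]]
  type UrlFn[U, C]  = (Document, Option[C]) => Option[Seq[UrlPair[U]]]

  // Combine an item extractor and a URL extractor; each defaults to "extract nothing".
  def extractor[I, U, C](
      items: ItemFn[I, C] = (doc: Document, ctxt: Option[C]) => None,
      urls: UrlFn[U, C] = (doc: Document, ctxt: Option[C]) => None
  ): (Document, Option[C]) => ResultPair[I, U] =
    (doc: Document, ctxt: Option[C]) => ResultPair[I, U](items(doc, ctxt), urls(doc, ctxt))

  // Only items are supplied; the context is used as a minimum word length.
  val wordsOnly = extractor[String, Unit, Int](
    items = (doc: Document, minLen: Option[Int]) =>
      Some(doc.html.split(" ").toList.filter(_.length >= minLen.getOrElse(0)))
  )

  val result = wordsOnly(Document("a bb ccc"), Some(2))
}
```

Here `result.items` holds the words of length two or more and `result.urls` is `None`, since no URL extractor was passed.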

edit

Split the function traits up so they can be worked with separately.

edit2

Extractors now have a context.

Licensed under: CC-BY-SA with attribution