Question

When I run the method below from a fresh spark-shell REPL session, everything works fine. However, when I try to compile the class containing this method, I get the following errors:

Error:(21, 50) value values is not a member of org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.recommendation.Rating)]
val training = ratings.filter(x => x._1 < 6).values.repartition(numPartitions).persist
                                             ^
Error:(22, 65) value values is not a member of org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.recommendation.Rating)]
val validation = ratings.filter(x => x._1 >= 6 && x._1 < 8).values.repartition(numPartitions).persist
                                                            ^
Error:(23, 47) value values is not a member of org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.recommendation.Rating)]
val test = ratings.filter(x => x._1 >= 8).values.persist
                                          ^

In both cases I'm using Spark 1.0.1. The code itself is as follows:

def createDataset(ratings: RDD[Tuple2[Long, Rating]]): List[RDD[Rating]] = {
  val training = ratings.filter(x => x._1 < 6).values.repartition(numPartitions).persist
  val validation = ratings.filter(x => x._1 >= 6 && x._1 < 8).values.repartition(numPartitions).persist
  val test = ratings.filter(x => x._1 >= 8).values.persist
  val numTraining = training.count
  val numValidation = validation.count
  val numTest = test.count

  println(" Number Of Training ::: " + numTraining + " numValidation ::: " + numValidation + " ::: " + numTest)
  List(training, validation, test)
}

It is taken from the MLlib tutorial (slightly adapted); I have no idea what's going wrong.


Solution

You need to have this line in your code:

import org.apache.spark.SparkContext._

This imports the implicit conversions to PairRDDFunctions, which is what allows you to call values on an RDD of key-value pairs. The Spark REPL performs this import for you automatically, which is why you don't see the error in the interpreter. Specifically, this function in SparkContext does the conversion:

implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null) = {
  new PairRDDFunctions(rdd)
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow