Question

I'm trying to save an RDD as a compressed SequenceFile. I can save an uncompressed file by calling:

counts.saveAsSequenceFile(output)

where counts is my RDD of (IntWritable, Text) pairs. However, I haven't managed to compress the output. I tried several codecs and always got a type mismatch error:

counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
<console>:21: error: type mismatch;
 found   : Class[org.apache.hadoop.io.compress.SnappyCodec](classOf[org.apache.hadoop.io.compress.SnappyCodec])
 required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
              counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])

 counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
<console>:21: error: type mismatch;
 found   : Class[org.apache.spark.io.SnappyCompressionCodec](classOf[org.apache.spark.io.SnappyCompressionCodec])
 required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
              counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])

and it doesn't work even for Gzip:

 counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
<console>:21: error: type mismatch;
 found   : Class[org.apache.hadoop.io.compress.GzipCodec](classOf[org.apache.hadoop.io.compress.GzipCodec])
 required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
              counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])

Could you please suggest a solution? Also, I couldn't find how to specify compression parameters (e.g. the compression type for Snappy).

Solution

The signature of saveAsSequenceFile is def saveAsSequenceFile(path: String, codec: Option[Class[_ <: CompressionCodec]] = None). The codec argument must be an Option[Class[_ <: CompressionCodec]], so wrap the codec class in Some. E.g.,

counts.saveAsSequenceFile(output, Some(classOf[org.apache.hadoop.io.compress.SnappyCodec]))

The type mismatch error points to this directly: the compiler found a Class[...] where an Option[Class[_ <: CompressionCodec]] was required.
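For reference, here is a minimal self-contained sketch of the same call in a standalone application. The object name, app name, sample data, and the use of args(0) as the output path are assumptions made for illustration and are not from the question. It uses DefaultCodec because that codec can typically run without Hadoop native libraries; SnappyCodec or GzipCodec can be swapped in the same way, assuming the required native support is available on the cluster.

```scala
import org.apache.hadoop.io.compress.{CompressionCodec, DefaultCodec}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example application; names and data are illustrative only.
object SaveCompressedSequenceFile {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("compressed-sequence-file")
      .setMaster("local[*]") // assumption: local run for demonstration
    val sc = new SparkContext(conf)
    try {
      // Int and String keys/values are converted to IntWritable and Text on write.
      val counts = sc.parallelize(Seq((1, "one"), (2, "two"), (3, "three")))

      // The codec parameter is an Option, so the class must be wrapped in Some.
      val codec: Option[Class[_ <: CompressionCodec]] = Some(classOf[DefaultCodec])

      // args(0) is assumed to be an output directory that does not exist yet.
      counts.saveAsSequenceFile(args(0), codec)
    } finally {
      sc.stop()
    }
  }
}
```

Passing None, or omitting the second argument, writes the file uncompressed, which matches the one-argument call in the question.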

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow