Question

The method below parses a CSV file whose columns are ID,DATE,NUMBER.

Is there a more functional implementation of this, particularly one that doesn't use a mutable collection and returns an immutable collection? None of the code I have after this requires mutability in the data structures.

I obviously don't want to make the code any more inefficient or ugly than it already is, but I'd be keen to know whether an engineer more fluent in functional programming would have done it differently.

def parseFile() : HashMap[String,ListBuffer[String]] = {

   val userDataSet = scala.collection.mutable.HashMap.empty[String,ListBuffer[String]]

   for ( ln <- io.Source.stdin.getLines ) {

      val cols = ln.split(",")
      var values : ListBuffer[String] = userDataSet.getOrElse( cols(0), null )
      if ( values == null ) {
         values = ListBuffer.empty[String]
         userDataSet( cols( 0 ) ) = values
      }

      values += cols(2)
   }

   HashMap[String,ListBuffer[String]]() ++ userDataSet
}

Solution

First, let's fix up what you've got already:

import scala.collection.mutable.ListBuffer

def parseFile(): Map[String, ListBuffer[String]] = { // always use interfaces (Map, not HashMap)
  val userDataSet = scala.collection.mutable.HashMap.empty[String, ListBuffer[String]]
  for (ln <- io.Source.stdin.getLines()) {
    val Array(id, date, number) = ln.split(",")  // pattern match with names for clarity
    val values = userDataSet.getOrElseUpdate(id, ListBuffer.empty)
    values += number
  }
  userDataSet.toMap
}

Note that even if getOrElseUpdate didn't exist, you'd still want to avoid the return-null-and-check pattern, for example by testing !userDataSet.contains(id). Alternatively, you could use .get(id), which returns an Option.
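As a minimal sketch of the Option-based alternative, assuming getOrElseUpdate were unavailable, the lookup could pattern match on the result of .get(id). The helper name appendValue is hypothetical and exists only for this illustration.

```scala
import scala.collection.mutable

// Hypothetical helper (not part of the original answer): appends `number`
// to the buffer stored under `id`, creating the buffer when the key is absent.
def appendValue(
    data: mutable.HashMap[String, mutable.ListBuffer[String]],
    id: String,
    number: String
): Unit =
  data.get(id) match {
    case Some(values) => values += number                      // key present: append in place
    case None         => data(id) = mutable.ListBuffer(number) // key absent: start a new buffer
  }
```

No null ever enters the picture: the absent case is an ordinary branch of the match.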

Now for the functional solution. When you want to build up a collection by traversing some other collection, you probably want a "fold":

def parseFile(): Map[String, Vector[String]] = {
  val lines = io.Source.stdin.getLines()
  lines.foldLeft(Map.empty[String, Vector[String]]) { (userDataSet, ln) =>
    val Array(id, date, number) = ln.split(",")  // pattern match with names for clarity
    val existingValues = userDataSet.getOrElse(id, Vector.empty)
    val updatedValues = existingValues :+ number
    userDataSet + (id -> updatedValues)         // update the Map with the new key/value
  }
}
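To try the fold without piping data through stdin, one option is to pass the lines in as a parameter. The sketch below assumes that refactoring: the name groupNumbers and the Iterator[String] parameter are not in the original answer, but the fold body is the same.

```scala
// Hypothetical variant of parseFile that takes its input as a parameter.
def groupNumbers(lines: Iterator[String]): Map[String, Vector[String]] =
  lines.foldLeft(Map.empty[String, Vector[String]]) { (acc, ln) =>
    val Array(id, _, number) = ln.split(",") // assumes exactly three columns per line
    acc + (id -> (acc.getOrElse(id, Vector.empty[String]) :+ number))
  }

val sample = Iterator("0,2014-01-01,1", "0,2014-01-02,2", "1,2014-01-01,3")
val grouped = groupNumbers(sample)
// grouped == Map("0" -> Vector("1", "2"), "1" -> Vector("3"))
```

Each step of the fold returns a new immutable Map, so no mutable state is visible outside or inside the function.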

OTHER TIPS

If you want to be "super-functional" you can do it with scalaz-stream (https://github.com/scalaz/scalaz-stream).

Full gist is here: https://gist.github.com/ezhulenev/9966059.

To understand runFoldMap you also need to look at the Monoid concept: http://eed3si9n.com/learning-scalaz/Monoid.html. (Map is a monoid, Vector is a monoid, and a Map of Vectors is a monoid too.)

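// Imports assumed for this snippet; see the gist for the exact set.
import java.io.ByteArrayInputStream
import scalaz.stream.io
import scalaz.std.map._
import scalaz.std.vector._

val csv =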
"""|0,2014-01-01,1
   |0,2014-01-02,2
   |1,2014-01-01,3
   |1,2014-01-01,4""".stripMargin

val is = new ByteArrayInputStream(csv.getBytes)

val process = io.linesR(is).
  map(_.split(",")).               // split to columns
  map(arr => (arr(0), arr(2))).    // pick id & number columns
  runFoldMap { case (id, number) => Map(id -> Vector(number)) }

val output: Map[String, Vector[String]] = process.run
println("Output: ")
output.foreach(println)

Output:

(1,Vector(3, 4))
(0,Vector(1, 2))
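
The Monoid remark can be made concrete without scalaz. The following is a hand-rolled sketch, not scalaz-stream's implementation: the Monoid trait, its two instances, and foldMap are simplified stand-ins defined locally for illustration.

```scala
// Simplified stand-in for a Monoid type class (not the scalaz definition).
trait Monoid[A] {
  def zero: A
  def append(x: A, y: A): A
}

// Vector is a monoid under concatenation.
implicit def vectorMonoid[A]: Monoid[Vector[A]] = new Monoid[Vector[A]] {
  def zero: Vector[A] = Vector.empty
  def append(x: Vector[A], y: Vector[A]): Vector[A] = x ++ y
}

// A Map is a monoid whenever its values are: keys are merged,
// and values are combined with the value monoid on collision.
implicit def mapMonoid[K, V](implicit V: Monoid[V]): Monoid[Map[K, V]] =
  new Monoid[Map[K, V]] {
    def zero: Map[K, V] = Map.empty
    def append(x: Map[K, V], y: Map[K, V]): Map[K, V] =
      y.foldLeft(x) { case (acc, (k, v)) =>
        acc + (k -> acc.get(k).map(V.append(_, v)).getOrElse(v))
      }
  }

// Map every element into the monoid, then combine left to right.
def foldMap[A, B](as: Iterator[A])(f: A => B)(implicit B: Monoid[B]): B =
  as.foldLeft(B.zero)((acc, a) => B.append(acc, f(a)))

val rows = Iterator(("0", "1"), ("0", "2"), ("1", "3"), ("1", "4"))
val result = foldMap(rows) { case (id, number) => Map(id -> Vector(number)) }
// result == Map("0" -> Vector("1", "2"), "1" -> Vector("3", "4"))
```

Because the Map instance is derived from the Vector instance, each row only has to be turned into a one-entry Map; the merging logic lives entirely in the monoid.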
Licensed under: CC-BY-SA with attribution