Question

I have a data structure made of Jobs each containing a set of Tasks. Both Job and Task data are defined in files like these:

jobs.txt:
JA
JB
JC

tasks.txt:
JB  T2
JA  T1
JC  T1
JA  T3
JA  T2
JB  T1 

The process of creating objects is the following:
- read each job, create it and store it by id
- read task, retrieve job by id, create task, store task in the job

Once the files are read this data structure is never modified. So I would like that tasks within jobs would be stored in an immutable set. But I don't know how to do it in an efficient way. (Note: the immutable map storing jobs may be left immutable)

Here is a simplified version of the code:

class Task(val id: String) 

class Job(val id: String) {
    val tasks = collection.mutable.Set[Task]() // This sholud be immutable
}

val jobs = collection.mutable.Map[String, Job]() // This is ok to be mutable

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) { 
    val job = new Job(line.trim)
    jobs += (job.id -> job)
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val job = jobs(tokens(0).trim)
    val task = new Task(job.id + "." + tokens(1).trim)
    job.tasks += task
}

Thanks in advance for every suggestion!

Was it helpful?

Solution

The most efficient way to do this would be to read everything into mutable structures and then convert to immutable ones at the end, but this might require a lot of redundant coding for classes with a lot of fields. So instead, consider using the same pattern that the underlying collection uses: a job with a new task is a new job.

Here's an example that doesn't even bother reading the jobs list--it infers it from the task list. (This is an example that works under 2.7.x; recent versions of 2.8 use "Source.fromPath" instead of "Source.fromFile".)

object Example {
  class Task(val id: String) {
    override def toString = id
  }

  class Job(val id: String, val tasks: Set[Task]) {
    def this(id0: String, old: Option[Job], taskID: String) = {
      this(id0 , old.getOrElse(EmptyJob).tasks + new Task(taskID))
    }
    override def toString = id+" does "+tasks.toString
  }
  object EmptyJob extends Job("",Set.empty[Task]) { }

  def read(fname: String):Map[String,Job] = {
    val map = new scala.collection.mutable.HashMap[String,Job]()
    scala.io.Source.fromFile(fname).getLines.foreach(line => {
      line.split("\t") match {
        case Array(j,t) => {
          val jobID = j.trim
          val taskID = t.trim
          map += (jobID -> new Job(jobID,map.get(jobID),taskID))
        }
        case _ => /* Handle error? */
      }
    })
    new scala.collection.immutable.HashMap() ++ map
  }
}

scala> Example.read("tasks.txt")
res0: Map[String,Example.Job] = Map(JA -> JA does Set(T1, T3, T2), JB -> JB does Set(T2, T1), JC -> JC does Set(T1))

An alternate approach would read the job list (creating jobs as new Job(jobID,Set.empty[Task])), and then handle the error condition of when the task list contained an entry that wasn't in the job list. (You would still need to update the job list map every time you read in a new task.)

OTHER TIPS

I did a feel changes for it to run on Scala 2.8 (mostly, fromPath instead of fromFile, and () after getLines). It may be using a few Scala 2.8 features, most notably groupBy. Probably toSet as well, but that one is easy to adapt on 2.7.

I don't have the files to test it, but I changed this stuff from val to def, and the type signatures, at least, match.

class Task(val id: String)  
class Job(val id: String, val tasks: Set[Task])

// read tasks 
val tasks = (
  for {
    line <- io.Source.fromPath("tasks.txt").getLines().toStream
    tokens = line.split("\t") 
    jobId = tokens(0).trim
    task = new Task(jobId + "." + tokens(1).trim) 
  } yield jobId -> task
).groupBy(_._1).map { case (key, value) => key -> value.map(_._2).toSet }

// read jobs 
val jobs = Map() ++ (
  for {
    line <- io.Source.fromPath("jobs.txt").getLines()
    job = new Job(line.trim, tasks(line.trim))
  } yield job.id -> job
)

You could always delay the object creation until you have all the data read in from the file, like:

case class Task(id: String) 
case class Job(id: String, tasks: Set[Task])

import scala.collection.mutable.{Map,ListBuffer}
val jobIds = Map[String, ListBuffer[String]]()

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) { 
    val job = line.trim
    jobIds += (job.id -> new ListBuffer[String]())
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val job = tokens(0).trim
    val task = job.id + "." + tokens(1).trim
    jobIds(job) += task
}

// create objects
val jobs = jobIds.map { j =>
    Job(j._1, Set() ++ j._2.map { Task(_) })
}

To deal with more fields, you could (with some effort) make a mutable version of your immutable classes, used for building. Then, convert as needed:

case class Task(id: String)
case class Job(val id: String, val tasks: Set[Task])
object Job {
    class MutableJob {
        var id: String = ""
        var tasks = collection.mutable.Set[Task]()
        def immutable = Job(id, Set() ++ tasks)
    }
    def mutable(id: String) = {
        val ret = new MutableJob
        ret.id = id
        ret
    }
}

val mutableJobs = collection.mutable.Map[String, Job.MutableJob]() 

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) { 
    val job = Job.mutable(line.trim)
    jobs += (job.id -> job)
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val job = jobs(tokens(0).trim)
    val task = Task(job.id + "." + tokens(1).trim)
    job.tasks += task
}

val jobs = for ((k,v) <- mutableJobs) yield (k, v.immutable)

One option here is to have some mutable but transient configurer class along the lines of the MutableMap above but then pass this through in some immutable form to your actual class:

val jobs: immutable.Map[String, Job] = {
  val mJobs = readMutableJobs
  immutable.Map(mJobs.toSeq: _*)
}

Then of course you can implement readMutableJobs along the lines you have already coded

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top