Question

Hello, I am looking for the fastest, but preferably high-level, way to work with a large data collection. My task consists of two parts: read a lot of large files into memory, and then do some statistical calculations (the easiest representation for this task is a random-access array).

My first approach was to use java.io.ByteArrayOutputStream, because it can resize its internal storage.

def packTo(buf: java.io.ByteArrayOutputStream, f: File): Unit = {
  try {
    val fs = new java.io.FileInputStream(f)
    try IOUtils.copy(fs, buf)  // org.apache.commons.io.IOUtils
    finally fs.close()
  } catch {
    case e: java.io.FileNotFoundException => // skip missing files
  }
}

    val buf = new java.io.ByteArrayOutputStream()
    files foreach { f: File => packTo(buf, f) }
    println(buf.size())

    for (i <- 0 until buf.size()) {
      for (j <- 0 until buf.size()) {
        for (k <- 0 until buf.size()) {
          // Calculate something amazing using the bytes at positions i, j and k
        }
      }
    }

    println("amazing = " + ???)

But ByteArrayOutputStream will not hand me its internal byte[], only a copy of it (via toByteArray), and I cannot afford to hold two copies of the data.
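For what it's worth, one way to avoid that second copy with plain JDK classes is to pre-size a single Array[Byte] from the summed file lengths and read each file directly into its slice (a sketch, assuming the total size fits in an Int; `readAllInto` is a hypothetical helper name):

```scala
import java.io.{File, FileInputStream}

// Pre-size one array from the total length, then read each file
// into its own slice -- no intermediate buffer, no extra copy.
def readAllInto(files: Seq[File]): Array[Byte] = {
  val total = files.map(_.length).sum.toInt // assumes total < 2 GB
  val buf = new Array[Byte](total)
  var offset = 0
  for (f <- files) {
    val in = new FileInputStream(f)
    try {
      val len = f.length.toInt
      var read = 0
      while (read < len) {
        val n = in.read(buf, offset + read, len - read)
        if (n < 0) throw new java.io.EOFException(f.getPath)
        read += n
      }
      offset += read
    } finally in.close()
  }
  buf
}
```

The resulting array can then be indexed directly in the triple loop above.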

Was it helpful?

Solution

Have you tried scala-io? With it, this should be as simple as Resource.fromFile(f).byteArray.
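If adding a dependency is a concern, the JDK (7+) offers much the same one-liner through NIO; a minimal sketch (`fileBytes` is just an illustrative wrapper name):

```scala
import java.nio.file.{Files, Paths}

// Reads the whole file into a freshly allocated byte array in one call.
def fileBytes(path: String): Array[Byte] =
  Files.readAllBytes(Paths.get(path))
```

Note that, like scala-io's byteArray, this still allocates one array per file; it simply skips the intermediate ByteArrayOutputStream copy.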

Other tips

Scala's built-in library already provides a nice API to do this:

io.Source.fromFile("/file/path").mkString.getBytes

However, it is often not a good idea to load a whole file into memory as a byte array. Do make sure the largest possible file can still fit into your JVM heap.
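To act on that caveat, you can compare a file's size against the heap the JVM could still grow to before loading it. A rough sketch (`fileFits` is a hypothetical helper; the figure is an estimate, not an exact bound):

```scala
import java.io.File

// Heap still available = max heap - (currently allocated - currently free).
// Refuse to load files larger than that.
def fileFits(f: File): Boolean = {
  val rt = Runtime.getRuntime
  val available = rt.maxMemory - (rt.totalMemory - rt.freeMemory)
  f.length < available
}
```

In practice you would also leave headroom for the statistics pass itself, not just the raw bytes.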

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow