Splitting an array at random in Scala
Question
I'm building a decision tree system in Scala, but some of the entries in my data have identical attributes. I've gotten around this by implementing a "random" node type, allowing the query to randomly select which branch to traverse, but I'm getting a "MatchError" when trying to split the remaining examples at random. My current code:
def splitRandom(examples: Array[String]): Array[String] = {
  examples.collect { case x if (r.nextInt(100) < 50) => x }
}
"examples" is an array of strings, with each string being a line containing a single data entry with all of its attributes.
Solution
collect isn't a good choice for random behavior because the same condition can be evaluated twice for one element: first in an isDefinedAt check, and then a second time to compute the value. If the guard is true the first time and false the second time on the same input, no case matches and a MatchError is thrown. Use filter instead, which evaluates its predicate once per element:
examples.filter(_ => r.nextInt(100) < 50)
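As a minimal sketch of how this fix might sit in the original function: `r` mirrors the generator referenced in the question, while the fixed seed, the wrapper object `SplitRandomSketch`, and the `splitRandomBoth` variant (which uses `partition` to keep the rejected examples too) are assumptions added for illustration.

```scala
import scala.util.Random

object SplitRandomSketch {
  // `r` stands in for the generator used in the question; the seed is an
  // assumption so that runs are repeatable.
  val r = new Random(42)

  // Keeps each example with probability of about 0.5. The predicate is
  // evaluated once per element, so no MatchError can occur.
  def splitRandom(examples: Array[String]): Array[String] =
    examples.filter(_ => r.nextInt(100) < 50)

  // Hypothetical variant: returns both the selected and the rejected examples.
  def splitRandomBoth(examples: Array[String]): (Array[String], Array[String]) =
    examples.partition(_ => r.nextInt(100) < 50)
}
```

With `partition`, every example lands in exactly one of the two arrays, which may be closer to what a two-branch random node needs.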
Other tips
Here is a solution that fits your issue:
import util.Random
val shuffled = Random.shuffle(your_array)
val (first, second) = shuffled.splitAt(your_position)
I found this trick when I wanted a counterpart to rdd.randomSplit for a Scala List or Array.
You can convert between collection types if needed.
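The shuffle-then-split idea can be sketched as a small helper. The names `ShuffleSplitSketch`, `shuffleSplit`, `fraction`, and `rng` are hypothetical and not part of the answer above. Converting to a `List` before shuffling is one way to handle the type conversion, assuming the caller wants arrays back.

```scala
import scala.util.Random

object ShuffleSplitSketch {
  // Hypothetical helper: `fraction` is the share of examples that goes into
  // the first part, and `rng` is the random generator to shuffle with.
  def shuffleSplit(
      examples: Array[String],
      fraction: Double,
      rng: Random
  ): (Array[String], Array[String]) = {
    // Shuffle a List copy so the result type is a List on any Scala version.
    val shuffled = rng.shuffle(examples.toList)
    // Index at which to cut the shuffled list.
    val cut = (examples.length * fraction).toInt
    val (first, second) = shuffled.splitAt(cut)
    (first.toArray, second.toArray)
  }
}
```

Unlike the per-element coin flip with filter, this approach gives parts of a fixed size, for example `ShuffleSplitSketch.shuffleSplit(examples, 0.5, new Random())` for an even split.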