Question

I have a data type:

counted: org.apache.spark.rdd.RDD[(String, Seq[(String, Int)])] = MapPartitionsRDD[24] at groupByKey at <console>:28

And I'm trying to apply the following operation to this type:

def func = 2

counted.flatMap { x => counted.map { y => ((x._1+","+y._1),func) } }

So each sequence is compared to every other sequence and a function is applied to the pair. For simplicity, the function just returns 2. When I attempt the above, I receive this error:

scala> counted.flatMap { x => counted.map { y => ((x._1+","+y._1),func) } }
<console>:33: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(String, Int)]
 required: TraversableOnce[?]
              counted.flatMap { x => counted.map { y => ((x._1+","+y._1),func) } }

How can this function be applied using Spark?

I have tried

val dataArray = counted.collect
dataArray.flatMap { x => dataArray.map { y => ((x._1+","+y._1),func) } }

which converts the collection to an Array and applies the same function, but I run out of memory when I try this method. I assume using an RDD is more efficient than using an Array? The maximum amount of memory I can allocate is 7 GB. Is there a mechanism in Spark that lets me use hard drive space to augment the available RAM?

The collection I'm running this function on contains 20,000 entries, so that's 20,000² = 400,000,000 comparisons, but in Spark terms this is quite small?


Solution 2

@RexKerr pointed out to me that I was somewhat incorrect in the comment section, so I deleted my comments. But while doing that, I had the chance to read the post again and came up with an idea that might be of some use to you.

Since what you are trying to implement is actually an operation over a Cartesian product, you might want to try just calling RDD#cartesian. Here is a dumb example, but if you can give some real code, maybe I can do something like this in that case as well:

// build a collection with a type matching the one in question:
val v1 = sc.parallelize(List("q" -> (".", 0), "s" -> (".", 1), "f" -> (".", 2))).groupByKey
// pair every key with every other key and apply the function:
v1.cartesian(v1).map { x => (x._1._1 + "," + x._2._1, 2) }.foreach(println)

OTHER TIPS

Short answer:

counted.cartesian(counted).map {
  case ((x, _), (y, _)) => (x + "," + y, func)
}

Please use pattern matching to extract the elements of nested tuples; this avoids the unreadable chained underscore notation. Using _ for the second elements shows the reader that those values are being ignored.

If func doesn't use the second elements, it would be even more readable (and possibly more efficient) to project them away first:

val projected = counted.map(_._1)
projected.cartesian(projected).map(x => (x._1 + "," + x._2, func))

Note that you do not need curly braces when your lambda fits on a single semantic line; using them anyway is a very common mistake in Scala.
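
Since the keys alone are small (20,000 short strings), another option worth sketching is to collect and broadcast them, then pair each key against the broadcast copy locally; this avoids the shuffle that cartesian performs. A rough sketch under that assumption (localKeys, broadcastKeys and result are names introduced here for illustration):

val localKeys = projected.collect()          // ~20,000 strings fit easily in driver memory
val broadcastKeys = sc.broadcast(localKeys)  // shipped once to each executor
val result = projected.flatMap { x =>
  broadcastKeys.value.map(y => (x + "," + y, func))
}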

I would also like to know why you want this Cartesian product; there are often ways to avoid it that are significantly more scalable. Please say what you're going to do with the Cartesian product and I will try to find a scalable way of doing what you want.
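
For instance (an illustrative sketch only, since the real func isn't shown): if your real function is symmetric in its two arguments, you can skip roughly half of the 400 million comparisons by keeping each unordered pair only once:

projected.cartesian(projected)
  .filter { case (x, y) => x < y }             // keep each unordered pair once
  .map { case (x, y) => (x + "," + y, func) }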

One final point: please put spaces around your operators.
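
On the memory question in the post: keeping the data as an RDD rather than collecting it to an Array on the driver is indeed the right move, and Spark can additionally spill cached partitions to local disk if you persist with a storage level that allows it. A minimal sketch, assuming the standard Spark shell setup:

import org.apache.spark.storage.StorageLevel

// cached partitions that do not fit in RAM spill to local disk
// rather than having to be held entirely in memory:
counted.persist(StorageLevel.MEMORY_AND_DISK)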

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow