Question

If I filter a sorted set or map based on a certain maximum or minimum, will Java 8 mindlessly check the condition on each element or does it employ an optimization using the "sortedness"? If not, is there a better way that still uses the Java 8 parallelism?

SortedSet<Integer> numbers = getNumbers();
numbers.parallelStream().filter(n -> n <= 100).forEach(...);

P.S.: To clarify as requested: Let's suppose "numbers" is very large. If we check for each element, if it is <= 100 (or any other number), we waste a big amount of time. An efficient implementation would binary search over the sorted set and identify the cutoff in log time instead of linear time and then use some internal set feature to create a subset at this cutoff.

Was it helpful?

Solution

Use the method headSet in combination with parallel streams. In your example, it looks as follows:

SortedSet<Integer> numbers = getNumbers();
numbers.headSet(100 + 1)
    .parallelStream()
    .filter(n -> n <= 100) // no longer required
    .forEach(...);

First the code does a binary search with logarithmic complexity. Afterwards, all elements smaller than 100 + 1 are handled in parallel.

Edit: Unfortunately, neither TreeSet nor ConcurrentSkipListSet supports parallel execution on subsets. The above code will work, but it will always be executed sequentially. This can be checked using the following code. I see no reason, why this could not be implemented. I guess nobody thought it might be important.

SortedSet<Integer> numbers = ...;
System.out.printf("Full Set:    %s\nPartial Set: %s\n",
    numbers.spliterator().trySplit(),
    numbers.headSet(1_000_000).spliterator().trySplit());

OTHER TIPS

Can you elaborate on what you mean by "certain maximum or minimum"? Your example code uses a hard-coded value of 100. Are you implying the 100 is supposed to be the result of maximum or minimum function applied to the SoretedRet?? Or are just asking how the filter is applied?

The filter is applied first. The result is all elements that meet the filter criteria (<= 100).

ForEach is the applied to the result of the filter result (I.E. not applied to any elements that did not meet the filter criteria).

SortedSet<Integer> numbers = new TreeSet<Integer>();

    for (int index = 0; index < 100; index++) {
        numbers.add(index);
    }

    Stream<Integer> filterResult = numbers.parallelStream().filter(n -> n <= 9);

    System.out.println(filterResult.count());

However, I am not sure what you are asking? Perhaps you can elaborate.

Update: See this page: http://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html

Executing Streams in Parallel

You can execute streams in serial or in parallel. When a stream executes in parallel, the Java runtime partitions the stream into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.

The operation divides the collection in to parallel tasks and thus subsections of the set.

In addition, you can always peruse the source code and verify the implementation matches your expectation. The JSR is a specification right:) Not an implementation. Thus, technically to answer your question we need to look at a specific implementation. Just something to consider.

Hope that helps :) Good Luck

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top