Question

Assume

  1. You have some POJO Animal like public class Animal { // some fields }
  2. You have some enum AnimalType like public enum AnimalType { // some animal types }

  3. You have some method that given an Animal returns an AnimalType with signature public AnimalType getAnimalType(Animal animal);

  4. Finally, you have some infinite stream of animals Stream<Animal>

You want to collect into some structure the first of each occurrence of each animal type. Given that you're iterating over an infinite stream, you also want to return the first moment you've satisfied the structure requirements.

How would you solve this / could you solve this without using state outside of the stream? This is not a homework question. I came across something similar and wasn't able to come up with a stream-only (no outside stream state) solution.


Solution

could you solve this without using state outside of the stream

No, you can't. But that doesn't matter. Let me show you why:

Assume

  1. You have some POJO Animal like public class Animal { // some fields }

Since requirement 3 insists on being able to get the AnimalType from an Animal, the type becomes a field. I'm also adding a tag field to distinguish instances when displaying them.

import java.util.Objects;

class Animal {

    public Animal( String tag, AnimalType type ) { 
        this.tag = tag;
        this.type = type;
    }

    public AnimalType getAnimalType() { return type; }

    @Override
    public String toString() { return tag + " " + type + " animal"; }

    // Equality and hashing use only the type, so a Set keeps one Animal per type.
    @Override
    public int hashCode() { return Objects.hashCode( type ); }

    @Override
    public boolean equals( Object that ) {
        return that != null
            && that.getClass() == this.getClass()
            && ( (Animal) that ).type == this.type
        ;
    }

    private final String tag;
    private final AnimalType type;
}

Note that an animal's identity is tied to its type, not its tag.
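To see what that choice buys you, here is a minimal, self-contained sketch. The names IdentityDemo and dedupe are made up for illustration, and the Animal class is a condensed copy of the one above. It shows that a LinkedHashSet keeps only the first Animal of each type, because add() leaves the set unchanged when an equal element is already present:

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

public class IdentityDemo {
    enum AnimalType { HUMAN, DOG, CAT }

    // Condensed copy of the Animal class: equality depends on type only.
    static final class Animal {
        private final String tag;
        private final AnimalType type;

        Animal(String tag, AnimalType type) {
            this.tag = tag;
            this.type = type;
        }

        @Override
        public String toString() { return tag + " " + type + " animal"; }

        @Override
        public int hashCode() { return Objects.hashCode(type); }

        @Override
        public boolean equals(Object that) {
            return that != null
                && that.getClass() == this.getClass()
                && ((Animal) that).type == this.type;
        }
    }

    // Hypothetical helper: adds the animals in order to an insertion-ordered set.
    static Set<Animal> dedupe(Animal... animals) {
        Set<Animal> set = new LinkedHashSet<>();
        for (Animal animal : animals) {
            set.add(animal); // no effect if an equal (same-type) animal is already in the set
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Animal> set = dedupe(
            new Animal("first", AnimalType.HUMAN),
            new Animal("first", AnimalType.DOG),
            new Animal("second", AnimalType.HUMAN));
        System.out.println(set); // prints [first HUMAN animal, first DOG animal]
    }
}
```

The second HUMAN never makes it into the set, which is exactly the "first of each type" behaviour the question asks for.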

See full listing.

  2. You have some enum AnimalType like public enum AnimalType { // some animal types }

enum AnimalType { HUMAN, DOG, CAT }

  3. You have some method that given an Animal returns an AnimalType with signature public AnimalType getAnimalType(Animal animal);

You don't really need this, since Animal already exposes getAnimalType(), but OK.

public AnimalType getAnimalType(Animal animal) { return animal.getAnimalType(); }

  4. Finally, you have some infinite stream of animals Stream<Animal>

This produces an infinite stream by cycling through these four animals over and over. The second HUMAN animal should be dropped, since it is not "the first of each occurrence of each animal type".

List<Animal> animals = Arrays
    .asList(
        new Animal( "first", AnimalType.HUMAN ), 
        new Animal( "first", AnimalType.DOG ), 
        new Animal( "second", AnimalType.HUMAN ), 
        new Animal( "first", AnimalType.CAT ) 
    )
;

// Wrap the index at animals.size() so it cycles 0..3 and can never overflow int.
Stream<Integer> infiniteStreamOfInts = Stream.iterate( 0, i->( i+1 ) % animals.size() );

Stream<Animal> infiniteStreamOfAnimals = infiniteStreamOfInts
    .map( 
        i->animals.get( i ) 
    )
;

You want to collect into some structure the first of each occurrence of each animal type. Given that you're iterating over an infinite stream, you also want to return the first moment you've satisfied the structure requirements.

This means that you need to be able to test the structure to see when you have each and every animal type.

How would you solve this / could you solve this without using state outside of the stream? This is not a homework question. I came across something similar and wasn't able to come up with a stream-only (no outside stream state) solution.

Well, you can't. A stream has no memory of the elements that have already passed through it. But without "outside stream state" you can't collect anyway.

@CandiedOrange For example, when I map over a list, I can collect the results into a list via .collect(Collectors.toList()). This allows me to refrain from declaring a list outside of the scope of the stream and performing a forEach modifying that list – geofflittle

This is not "inside stream state". This is just providing a generator to the stream so it can request a new collection to collect elements into. Declaring a collection outside and passing it here only has one effect: it allows you to reference the collection before it's returned to you. That's it. And that is something you absolutely need if you're going to stop based on collection state.
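To make that concrete, here is a minimal sketch (the class name SupplierDemo and the helper collectInto are made up for illustration). It shows that, for a sequential stream, the collection returned by collect() is the very object the supplier handed over, so declaring it outside only gives you an earlier reference to it:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SupplierDemo {
    // Hypothetical helper: collects into a set the caller created.
    // The supplier passed to toCollection() simply hands that set to the stream.
    static Set<String> collectInto(Set<String> target, Stream<String> source) {
        return source.collect(Collectors.toCollection(() -> target));
    }

    public static void main(String[] args) {
        Set<String> outside = new LinkedHashSet<>();
        Set<String> returned = collectInto(outside, Stream.of("a", "b", "a"));

        System.out.println(returned == outside); // prints true
        System.out.println(outside);             // prints [a, b]
    }
}
```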

(Note: takeWhile() requires Java 9 or later. Let me know if you need to stick to 8.)

Set<Animal> setOfAnimals = new LinkedHashSet<>();

infiniteStreamOfAnimals
    .takeWhile( x->setOfAnimals.size() < AnimalType.values().length )
    .collect(
        Collectors.toCollection(
            ()->setOfAnimals
        )
    )
;

System.out.println( setOfAnimals );

Displays:

[first HUMAN animal, first DOG animal, first CAT animal]

Instead, you're trying to avoid holding that reference by writing the collection supplier like this:

.collect(
    Collectors.toCollection(
        LinkedHashSet::new
    )
)

Do it this way and, when you try to stop the infinite stream, you'll have no access to the structure (collection) you need to test. Even if you could do this, the new LinkedHashSet is also "outside stream state". The stream really doesn't care what it's talking to, so long as you don't mess with it. All you've done is hide the reference to the collection from yourself. To test, you need that reference.

Maybe you've got some concurrency concerns here. Let me assure you they're baseless: this stream can't be parallelized anyway, because of your sequential requirement of storing the first occurrence of each type.

Maybe you just want to be functional. You want to create something pure that is both deterministic and referentially transparent. Well that we can do. Just shove it in a function:

class InfiniteStream {
    public static <T> Set<T> firstFrom( Stream<T> infiniteStream, int size ) {
        Set<T> set = new LinkedHashSet<>();

        infiniteStream
            .takeWhile( x->set.size() < size )
            .collect(
                Collectors.toCollection(
                    ()->set
                )
            )
        ;

        return set;
    }
}

When you call it like this:

Set<Animal> setOfAnimals = InfiniteStream
    .firstFrom( 
        infiniteStreamOfAnimals, 
        AnimalType.values().length 
    )
;

hey presto, the call is deterministic and referentially transparent. No "outside" state to worry about. What more could you want?
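If you do need to stay on Java 8, where takeWhile() is not available, one possible sketch of the same helper pulls elements through the stream's iterator instead. This is a swapped-in technique, not the code above, and the class name InfiniteStreamJava8 is made up for illustration:

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

public class InfiniteStreamJava8 {
    // Hypothetical Java 8 counterpart of firstFrom(): pulls elements lazily
    // through the stream's iterator until `size` distinct ones are collected
    // (or the stream runs out, if it happens to be finite).
    public static <T> Set<T> firstFrom(Stream<T> infiniteStream, int size) {
        Set<T> set = new LinkedHashSet<>();
        Iterator<T> iterator = infiniteStream.iterator();

        while (set.size() < size && iterator.hasNext()) {
            set.add(iterator.next()); // duplicates are ignored by the set
        }

        return set;
    }

    public static void main(String[] args) {
        // Endless cycle 0, 1, 2, 0, 1, 2, ...
        Stream<Integer> cycle = Stream.iterate(0, i -> (i + 1) % 3);
        System.out.println(firstFrom(cycle, 3)); // prints [0, 1, 2]
    }
}
```

As with the takeWhile() version, the set is local to the method, so callers never see the mutable state.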

For analysis of this code and an alternative approach, see this Code Review question of mine.

Licensed under: CC-BY-SA with attribution