Question

Could you point me in the right direction on the following question? (Even a link to documentation containing the required info would be appreciated.)

Is there a way to merge multiple streams of data into a single stream of tuples?

E.g. we have stream A with elements (A1, t1), (A2, t2), ..., (An, tn) and stream B with elements (B1, t1'), (B2, t2'), ..., (Bn, tn'),

where t is the time of the value (the values are actually time series).

I would like to receive stream C with values

(A1", B1", t1"), ..., (An", Bn", tn")

Timestamps in streams A and B can differ (that's why I am using ' and "). Metrics can be consumed at different times and at different rates. In that case, the most recent value available at the required timestamp must be taken from each stream when merging.
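
To make the requirement concrete, here is a minimal plain-Python sketch of the merge described above (no Spark involved). It assumes that each stream is a list of (value, time) pairs sorted by time, and that "latest" means the last element whose time is not after the required timestamp; that reading of the rule is an assumption. The names `latest_at` and `merge_as_of` are hypothetical helpers introduced only for illustration.

```python
from bisect import bisect_right

def latest_at(series, t):
    """Return the value of the last (value, time) element with time <= t, or None."""
    times = [ts for _, ts in series]      # assumes series is sorted by time
    i = bisect_right(times, t)            # number of elements with time <= t
    return series[i - 1][0] if i > 0 else None

def merge_as_of(a, b, required_times):
    """Build (A, B, t) tuples from the latest known value of each stream at t."""
    return [(latest_at(a, t), latest_at(b, t), t) for t in required_times]

# Two streams sampled at different times and rates.
stream_a = [("A1", 1), ("A2", 4), ("A3", 7)]
stream_b = [("B1", 2), ("B2", 3), ("B3", 8)]

merged = merge_as_of(stream_a, stream_b, [3, 6, 9])
print(merged)
```

With these sample inputs, the tuple for required time 6 reuses "B2" because stream B has no newer value by then.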

Solution

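You can use DStream.join. When called on two DStreams of (K, V) and (K, W) pairs, it returns a new DStream of (K, (V, W)) pairs containing all pairs of elements for each key. The join is applied batch by batch and matches on key equality only, so both streams need a shared key (for example, a time bucket) for values to line up by time.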
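
As a rough illustration of those join semantics, the sketch below models what a pair join produces for a single batch, in plain Python rather than Spark. `join_batch` and `bucket` are hypothetical helpers, and keying each element by a fixed-width time bucket is an assumption about how the two streams could be given a common key; it is not something the join does for you.

```python
from collections import defaultdict

def join_batch(left, right):
    """Model of a pair join for one batch: (K, V) join (K, W) -> (K, (V, W))."""
    right_by_key = defaultdict(list)
    for k, w in right:
        right_by_key[k].append(w)
    # Every left value is paired with every right value sharing its key.
    return [(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])]

def bucket(t, width):
    """Map a timestamp to the start of its time bucket (assumed keying scheme)."""
    return t - t % width

# Key each (value, time) element by a 5-unit time bucket.
batch_a = [(bucket(t, 5), v) for v, t in [("A1", 1), ("A2", 6)]]
batch_b = [(bucket(t, 5), v) for v, t in [("B1", 2), ("B2", 3), ("B3", 8)]]

joined = join_batch(batch_a, batch_b)
print(joined)
```

Note that "A1" appears twice in the result, once with "B1" and once with "B2", because the join emits all pairs per key. To get a single tuple per time bucket, each stream would first have to be reduced to its latest value per key before joining.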

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow