Question

According to the documentation here, this feature is experimental, but I would like to know whether anyone is using it successfully. I already have some data, so I am trying use case 4.
I tried to run an update Hive query with the @Incremental annotation, but with it nothing goes into my RDB anymore.
If I remove the annotation, everything works, but I want to take advantage of this feature because the large amount of stored data makes query execution very slow.
Any suggestion or help is greatly appreciated.

Solution

The incremental analysis feature works fine in a partially distributed setup, but it was not thoroughly tested on an external Hadoop cluster, hence it was marked as 'experimental'. If you find any bugs in it, you can report them in JIRA.

To answer your question: you need to enable incremental processing for your stream first, and then add the incremental annotation. The detailed steps are as follows.

1) Add the property 'streams.definitions.defn1.enableIncrementalIndex=true' to the streams.properties file, as explained here, and create a toolbox that consists only of the stream definition artefact, as explained here.

2) Install the toolbox. This registers the stream definition contained in the toolbox for incremental analysis. From this point onwards, incoming data will be processed incrementally.

3) Now add the @Incremental annotation to the query. Because you enabled incremental analysis in the middle of processing, the first iteration will consider all of the available data, but from the next iteration onwards it will consider only the newly arrived data.
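
As a rough illustration of step 1, a streams.properties fragment might look like the sketch below. Only the enableIncrementalIndex line is taken from the steps above. The other two keys and the file name my_stream are assumptions about the usual toolbox layout, so check them against the toolbox documentation for your BAM version.

```properties
# Hypothetical streams.properties for a toolbox that holds only a stream definition.
# 'defn1' is the definition key used in step 1; 'my_stream' is a placeholder name.
streams.definitions=defn1
streams.definitions.defn1.filename=my_stream

# The property from step 1: turns on incremental indexing for this stream.
streams.definitions.defn1.enableIncrementalIndex=true
```

Once a toolbox with such a file is installed (step 2), queries that carry the @Incremental annotation should only pick up data that arrived after the previous run, apart from the first iteration described in step 3.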

OTHER TIPS

This feature is described as experimental because there may be some critical bugs. We will release a more stable version of BAM with this feature in the next release.

Licensed under: CC-BY-SA with attribution