Question

I am implementing an ETL process with Pentaho Kettle (Spoon). Everything works, except that I can't insert the generated data into my remote Elasticsearch server. I tried Kettle's "Elastic Search Bulk Insert" step, but Kettle can't find my Elasticsearch nodes (as can be seen here). Is there a reliable way to load a large amount of data into my ES server? Solutions using Kettle or independent scripts, plugins, etc. are all acceptable; the only constraint is that the ETL process will run on a different machine from Elasticsearch. Kettle also has a custom Java script step that could be used.

EDIT: I found out that Pentaho is using a very old version of Elasticsearch (0.16.3). I am trying to find a way to update it, with no luck so far.

Solution

Elasticsearch is a RESTful search engine, so I use Kettle's REST Client step. All you have to do is follow the REST conventions for inserting rows into your remote Elasticsearch server. It works well.
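For anyone going the independent-script route instead of the REST Client step, here is a minimal Python sketch of the same idea against the `_bulk` REST endpoint. It is only an illustration, not what the Kettle step does internally. The function names, the index name and the host in the usage note are hypothetical, and whether `_type` is required depends on your Elasticsearch version (older releases expect it, recent ones reject it), so it is left optional.

```python
import json
import urllib.request


def build_bulk_body(index, rows, doc_type=None):
    """Build the newline-delimited JSON payload expected by the _bulk endpoint.

    Each row becomes two lines: an "index" action line and the document itself.
    The payload must end with a newline.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index}}
        if doc_type is not None:
            # Assumption: only older Elasticsearch versions need a type here.
            action["index"]["_type"] = doc_type
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


def send_bulk(base_url, body):
    """POST a bulk payload to <base_url>/_bulk and return the parsed JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `send_bulk("http://es-host:9200", build_bulk_body("sales", rows))`, where `es-host` and `sales` are placeholders. The reply contains an `errors` flag that is worth checking, since a bulk request can succeed overall while individual rows fail.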

OTHER TIPS

I changed the dependency from elasticsearch-0.16.3.jar to elasticsearch-1.6.0.jar (which also needs lucene-core-4.10.4.jar) and then, with some help, copied 'ElasticSearchBulk' as a new plugin (alternatively, you can modify the source code in place). Source changes are needed because some of the Elasticsearch packages have moved: remove the outdated package imports and add the correct ones. With that, it works well with Elasticsearch 1.6.

First, you need to know your Elasticsearch server configuration. Open the elasticsearch.yml file on your Elasticsearch server and copy the IP address, transport.tcp.port and cluster.name values.

Back in Kettle, open the "ElasticSearch Bulk Insert" step. Add "cluster.name" in the [Settings] tab, and the IP address and transport.tcp.port in the [Servers] tab. Then try "Test Connection". It should work.
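If you would rather not pick the values out of elasticsearch.yml by eye, a small Python sketch like the one below can do it. It is a rough helper under stated assumptions: it only understands flat `key: value` lines (not nested YAML), it skips commented-out lines, and it assumes the IP address is stored under `network.host`, which may not match your file. The function name is hypothetical.

```python
def read_flat_settings(text, wanted=("cluster.name", "network.host", "transport.tcp.port")):
    """Extract selected flat `key: value` settings from elasticsearch.yml text.

    Assumption: settings are written in the flat dotted form; nested YAML
    and trailing comments are not handled.
    """
    found = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out defaults such as "#cluster.name: ...".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip()
        if key in wanted:
            # Drop surrounding whitespace and optional quotes around the value.
            found[key] = value.strip().strip("\"'")
    return found
```

Read the file with `open(path).read()` and pass the text in. Any setting missing from the result is either commented out in the file (so the server uses its default) or written in a form this sketch does not parse.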

One common mistake in this context is copying elasticsearch-6.4.2.jar to \data-integration\lib. This is unnecessary and counterproductive.

Steps:

  1. Servers tab: localhost, port 9300
  2. Settings tab: cluster.name = my_cluster_name (taken from elasticsearch.yml)
  3. PDI version: 8.2, 8.3 or 9.0
  4. Elasticsearch version: 6.4.2
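Since the ETL machine and the Elasticsearch server are usually different hosts, it can help to confirm that the address and port entered in the Servers tab are reachable at all before blaming the plugin. The following Python sketch only checks that a TCP connection can be opened; it does not speak the Elasticsearch transport protocol, so a success does not prove that the cluster name or versions match. The function name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run from the ETL machine, `can_connect("localhost", 9300)` corresponds to the Servers entry above; replace the host with the remote server's address when Elasticsearch runs elsewhere. A `False` result points at firewall, binding or address problems rather than at Kettle.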

The current PDI release (6.0.1) supports Elasticsearch 1.5.4.

If you need a working plugin for the more recent Elasticsearch 2.2 on PDI 6.*, you can download it from the link below. I tested it and it works with 2.2.

https://drive.google.com/file/d/0B0hgGtBdLOBMbWtfVVFnTE1uVmM/view?usp=sharing

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow