Question

I have a requirement to query a column in Hive and based on the output I want to query it in MySQL DB.

The flow is something like below:

  1. Query Hive table, get the list of IDs
  2. Use the list of ID's to query MySQL table. The query would be like select a,b,c from x where id in (list from hive)

How do I do this?

Was it helpful?

Solution

You just need to make your query using a tHiveInput component to retrieve the Id's and then you've got two options, the easy way or the hard (and better) way:

For the easy way you can then run from the tHiveInput component to a tFlowToIterate and then iterate out to a tMySqlInput component. Your query in the tMySqlInput component would then look something like:

"SELECT a, b, c
FROM x
WHERE id = '" + ((Integer)globalMap.get("row1.id")) + "'"

row1 here represents the row running into your tFlowToIterate. You can also get this variable by hitting ctrl+space and selecting the tFlowToIterate_x.id option.

You could then connect a tBufferOutput component to your tMySqlInput component to collect all of the iterations and then read it back in with a tBufferInput component for further processing.

This is obviously a simple concatenation that is then not the best way for efficiency and of course open to SQL injection. But it's a rough and ready way.

Your other (harder but potentially better) option is to use a parameterised query using a tMySqlRow component (not currently supported in the tMySqlInput component as of Talend 5.4).

This answer goes into some detail about parameterised queries in Talend as part of a solution to another problem (upserting efficiently). As mentioned in that answer, there's more detail on parameterised queries on Gabriele Baldassarre's blog.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top