Question

I would like to write a MERGE statement in Vertica database. I know it can't be used directly, and insert/update has to be combined to get the desired effect.

The merge statement looks like this:

MERGE INTO table_c c
USING (SELECT b.field1, MAX(b.field2) AS field2
         FROM table_a a, table_b b
        WHERE a.field3 = 'Y'
          AND a.field4 = b.field4
        GROUP BY b.field1) t
   ON (c.field1 = t.field1)
 WHEN MATCHED THEN
   UPDATE SET c.UUS_NAIT = t.field2;

Would just like to see an example of MERGE being used as insert/update.


Solution

You really don't want to do an update in Vertica. Inserting is fine. Selects are fine. But I would highly recommend staying away from anything that updates or deletes.

The system is optimized for reading large amounts of data and for inserting large amounts of data. An UPDATE is neither of those (under the hood it is a delete plus an insert), so I would advise against it.

As you stated, you can break apart the statement into an insert and an update.

What I would recommend (not knowing the details of what you want to do, so this is subject to change):

1) Insert the data from the outside source into a staging table.
2) Perform an INSERT...SELECT from the staging table into the table you desire, using the criteria you have in mind. Either use a join, or two statements with subqueries against the table you want to test.
3) Truncate the staging table.
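
The three steps above might look something like this (the table and column names are made up for illustration):

    -- 1) Load the incoming rows into a staging table
    COPY stage_t FROM '/data/feed.csv' DELIMITER ',';

    -- 2) Insert only the rows that are not already in the target
    INSERT INTO target_t (id, val)
    SELECT s.id, s.val
      FROM stage_t s
      LEFT JOIN target_t t ON t.id = s.id
     WHERE t.id IS NULL;

    -- 3) Clear the staging table for the next batch
    TRUNCATE TABLE stage_t;

This avoids UPDATEs entirely by only ever appending rows that do not yet exist in the target.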

It seems convoluted, I guess, but you really don't want to do UPDATEs. And if you think that is a hassle, remember that what causes the hassle is exactly what gives you your gains on SELECT statements.

OTHER TIPS

If you want an example of a MERGE statement, follow the link to the Vertica documentation. Remember to follow the instructions closely: you cannot write a MERGE with WHEN NOT MATCHED followed by WHEN MATCHED. The clauses have to appear in the sequence given in the usage description in the documentation (WHEN MATCHED first, then WHEN NOT MATCHED), though you can omit either one completely.
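
A minimal sketch of the required clause order (table and column names are hypothetical):

    MERGE INTO target_t t
    USING stage_t s
       ON t.id = s.id
     WHEN MATCHED THEN                           -- WHEN MATCHED must come first
       UPDATE SET val = s.val
     WHEN NOT MATCHED THEN                       -- then WHEN NOT MATCHED
       INSERT (id, val) VALUES (s.id, s.val);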

I'm not sure if you are aware that in Vertica, data which is updated or deleted is not really removed from the table, but just marked as 'deleted'. This data can be removed manually by running:

    SELECT PURGE_TABLE('schemaName.tableName');

You might need superuser permissions to do that on that schema. More about this can be read in the Vertica documentation under Purge Data. Vertica's website also has an example: Update and Insert Simultaneously using MERGE.

I agree that MERGE is supported as of Vertica version 6.0. But if Vertica's AHM (Ancient History Mark) or epoch-management settings are set to retain a lot of history (deleted) data, it will slow down your updates. The update speeds can go from bad, to worse, to horrible.

What I generally do to get rid of the deleted (old) data is run a purge on the table after updating it. This has helped maintain the speed of the updates. MERGE is useful where you definitely need to run updates, especially incremental daily updates that might touch millions of rows.
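
That routine, sketched with hypothetical table names, is simply the merge followed by a purge:

    MERGE INTO target_t t
    USING stage_t s
       ON t.id = s.id
     WHEN MATCHED THEN
       UPDATE SET val = s.val;

    -- reclaim the delete vectors the merge just created
    SELECT PURGE_TABLE('myschema.target_t');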

Getting to your answer: I don't think Vertica supports a subquery in MERGE. You would get the following error:

ERROR 0:  Subquery in MERGE is not supported

When I had a similar use case, I created a view from the subquery and merged into the destination table using the newly created view as my source table. That lets you keep using MERGE in Vertica, and regular PURGEs should keep your updates fast.
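
Applied to the query in the question, that workaround might look like this (using made-up names table_a, table_b, table_c in place of the question's placeholders, and aggregating field2 so the GROUP BY is valid):

    CREATE VIEW merge_src AS
    SELECT b.field1, MAX(b.field2) AS field2
      FROM table_a a, table_b b
     WHERE a.field3 = 'Y'
       AND a.field4 = b.field4
     GROUP BY b.field1;

    MERGE INTO table_c c
    USING merge_src t
       ON (c.field1 = t.field1)
     WHEN MATCHED THEN
       UPDATE SET c.UUS_NAIT = t.field2;

The view replaces the subquery as the USING source, which sidesteps the "Subquery in MERGE is not supported" error.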

In fact, MERGE also helps avoid duplicate entries during inserts or updates if you use the correct combination of fields in the ON clause, which should ideally be a join on the primary keys.

I like geoff's answer in general. It seems counterintuitive, but you'll have better results creating a new table with the rows you want in it versus modifying an existing one.

That said, doing so would only be worth it once the table gets past a certain size, or past a certain number of UPDATEs. If you're talking about a table under 1 million rows, I might chance it and do the updates in place, then purge to get rid of the tombstoned rows.

To be clear, Vertica is not well suited to single-row updates, but large bulk updates are much less of an issue. I would not recommend re-creating the entire table; I would look into strategies such as recreating partitions or bulk updates from staging tables.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow