Question

I have a question about two databases (in Oracle 10g), let's call them A and B. A has some information (in various tables), and I want B to get a partial copy of some of those tables, constantly checking for changes in A and syncing them into B.

I would like to hear about methods, techniques, or ideas, bearing in mind that I can't make any changes in A (only selects, no triggers).

Thanks in advance for your help and patience (with possible edits).

Additional information:

Thanks for the answers. I don't know if it's relevant, but I found the MINUS operator, though I'm not sure whether it works with a "sub table" (a subquery).

Was it helpful?

Solution

Your options are rather limited because of your requirements to "constantly check ... sync" and "can't make any change in A". Things such as materialized view logs, dbms_alert, streams, and a standby database are all off the table.

If the tables in A are constantly having all of their rows updated then (as Jack Douglas said) a materialized view would be the easiest to set up. In the more likely event that most of the records in A don't change from moment to moment, you will probably want to set up a package (or packages) on B that selects from A to merge and delete as necessary on B. This will only be as up to date as the frequency with which it is run, but given your requirements it may be the best you can do.

Specifically, your package should do the following:

  • Delete from B rows that do not exist in A.
  • Merge A into B updating when matched and inserting when not matched.

If you want to avoid hitting the table in A multiple times you could insert the entirety of the table into a global temporary table on B and then do your Delete/Merge from there.
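As a rough sketch of the delete/merge pair over a database link, assuming a table `table1` with a single-column key `id` and a link named `db_A` (all table, column, and link names here are placeholders, not from the original answer):

```sql
-- Remove rows from B that no longer exist in A.
DELETE FROM table1 b
 WHERE NOT EXISTS (SELECT 1
                     FROM table1@db_A a
                    WHERE a.id = b.id);

-- Upsert the current state of A into B.
MERGE INTO table1 b
USING (SELECT id, col1, col2 FROM table1@db_A) a
   ON (b.id = a.id)
 WHEN MATCHED THEN
   UPDATE SET b.col1 = a.col1,
              b.col2 = a.col2
 WHEN NOT MATCHED THEN
   INSERT (id, col1, col2)
   VALUES (a.id, a.col1, a.col2);

COMMIT;
```

Replace the direct `table1@db_A` references with the global temporary table if you take the single-pass approach described above.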

Concerning MINUS: MINUS can tell you all the rows from a query of A that are not in B. By UNION-ing this with the reverse (the query of B MINUS the query of A) you can get all rows that differ, but this would probably take longer to process even before adding the insert/update step. If A never gets updates or deletes, then you could insert the results of the first MINUS, but an INSERT INTO B ... WHERE NOT EXISTS (... A ...) would still be faster and simpler.
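To illustrate (and to answer the "sub table" question: MINUS operates on any two queries, including subqueries), a symmetric-difference sketch with placeholder column names might look like:

```sql
-- Rows present in A but not in B (candidate inserts/updates) ...
(SELECT id, col1, col2 FROM table1@db_A
 MINUS
 SELECT id, col1, col2 FROM table1)
UNION ALL
-- ... plus rows present in B but not in A (candidate deletes/stale versions).
(SELECT id, col1, col2 FROM table1
 MINUS
 SELECT id, col1, col2 FROM table1@db_A);
```

Both MINUS branches must select the same number of columns with compatible types.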

Other tips

There is no way of knowing that a table in A has changed except by polling. You could consider Materialized Views, refreshing periodically, which can work over a dblink - but only a complete refresh is possible so this may only be practical if the tables are small.
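A minimal sketch of such a periodically refreshed materialized view, assuming a link named `db_A` and a 5-minute interval (the view name, columns, and interval are illustrative choices, not part of the original answer):

```sql
CREATE MATERIALIZED VIEW mv_table1
  REFRESH COMPLETE                 -- no MV logs on A, so only COMPLETE refresh works
  START WITH SYSDATE
  NEXT SYSDATE + 5/1440            -- refresh every 5 minutes
AS
SELECT id, col1, col2
  FROM table1@db_A;
```

Every refresh re-pulls the whole table across the link, which is why this is only practical for small tables.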

Tough one, given that you have no access beyond a SELECT in db_A. So here's a thought, but it requires some pretty strict assumptions that may (or may not) be met:

Requirements:

  • All tables being sync'd have either:
    • A timestamp (the more resolution the better)
    • A unique, sequential ID
  • All table rows, once sync'd, don't change.
    • Alternatively, if a change does occur AND updates a timestamp on the record, you might be able to work it out that way.

Now, on db_B:

CREATE TABLE table1...
CREATE TABLE table2...,
etc.

PROCEDURE SYNC_TABLE1 IS
    MAX_ACTIVITY_DATE DATE;
    MAX_SEQUENCE_NO   NUMBER;
BEGIN
    -- MAX() always returns a row (NULL on an empty table), so default
    -- the NULLs rather than trapping NO_DATA_FOUND, which never fires here.
    SELECT NVL(MAX(SEQUENCE_NO), 0),
           NVL(MAX(ACTIVITY_DATE), TO_DATE('01/01/1980', 'MM/DD/YYYY'))
      INTO MAX_SEQUENCE_NO, MAX_ACTIVITY_DATE
      FROM table1;

    -- Bring over recent entries from db_A.table1 to db_B.table1.
    INSERT INTO table1
    SELECT *
      FROM table1@db_A
     WHERE
           -- choose ONE predicate:
           -- if using timestamps as your criteria:
           activity_date > MAX_ACTIVITY_DATE
           -- if using sequence nos as your criteria:
           -- sequence_no > MAX_SEQUENCE_NO
    ;

    -- Consider adding limiters to decrease the bandwidth necessary
    -- for large transactions. For example, activity_date < MAX_ACTIVITY_DATE + 30
    -- would load a month's worth of transactions at a time; sequence_no <
    -- MAX_SEQUENCE_NO + 500 would load 500 transactions at a time.

    COMMIT;
EXCEPTION WHEN OTHERS THEN
    ROLLBACK;
    -- Consider logging the error!
    RAISE;
END;
(lather, rinse, repeat.)

Again, this only works if you have either a sequential unique ID OR an activity date that is always updated on db_A (and that date should be of sufficient resolution to detect one transaction inserted a millisecond after the previous one, so timestamps are best.)

The way I synchronize data between Oracle instances (and non-Oracle instances, e.g., Oracle to MySQL) is to make sure I have a sync_date column on all my sync'able tables. When a request is made to sync data, that sync_date column is filled in with the date of the sync. Therefore the actual sync process is simple:

FOR r IN ( SELECT * FROM table1
            WHERE sync_date IS NULL ) LOOP
    send_sync_data_somewhere;  -- placeholder for the actual transfer
    UPDATE table1
       SET sync_date = current_timestamp
     WHERE rowid = r.rowid;
END LOOP;

Usually a limiter goes into effect, but you get the idea. Furthermore, if data changes on a record, the sync_date column is NULLed, at which point the sync process will pick it back up again.

Note: no matter the situation, you will need some sort of de-duplication handling if data can change once a row has been sync'd. You could try a MERGE, or an INSERT with a WHERE NOT EXISTS in the SELECT, coupled with an UPDATE ... WHERE EXISTS.
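The second variant might be sketched as follows, assuming incoming rows land in a hypothetical staging table `staging_table1` with the same shape as the target (names are placeholders):

```sql
-- Update rows that already exist on the target ...
UPDATE table1 b
   SET (col1, col2) = (SELECT a.col1, a.col2
                         FROM staging_table1 a
                        WHERE a.id = b.id)
 WHERE EXISTS (SELECT 1 FROM staging_table1 a WHERE a.id = b.id);

-- ... then insert only the rows that don't exist yet.
INSERT INTO table1 (id, col1, col2)
SELECT a.id, a.col1, a.col2
  FROM staging_table1 a
 WHERE NOT EXISTS (SELECT 1 FROM table1 b WHERE b.id = a.id);
```

A single MERGE does both steps in one statement and one pass over the staging data, which is usually preferable.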

Hopefully that helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange