Question

In CouchDB, I need to replicate data from one database to another, but I want to alter the documents as they are replicated:

  1. Mostly by stripping out particular fields (other applications are mentioned in the comments).
  2. The replication would always be 100% one-way (though the other applications mentioned in the comments could use bi-directional replication and sync).
  3. I would prefer that this process not increment the documents' revision IDs, but that might be asking for too much.

But I don't see any design document function that does what I am trying to do.

Since CouchDB doesn't seem to do this, what plans are there for adding it? And meanwhile, what workarounds are there?


Solution

No, there is no out-of-the-box solution, as this would defeat the whole purpose and logic of multi-master replication and MVCC.

The only option I can see here is to build your own solution, though I would not call it replication but rather ETL (Extract, Transform, Load). For ETL there are tools available that will do the trick, like (mixing open source and commercial here):

There are plenty more ETL tools on the market.
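
To make the ETL idea concrete, here is a minimal Python sketch of the three stages. It is only an illustration under stated assumptions: the field names in `FIELDS_TO_STRIP` and the helpers `transform` and `etl` are hypothetical, and the extract and load steps are plain Python objects standing in for whatever reads from the source database and writes to the target.

```python
from copy import deepcopy

# Hypothetical field names, used purely for illustration.
FIELDS_TO_STRIP = {"ssn", "internal_notes"}


def transform(doc, strip=FIELDS_TO_STRIP):
    """Return a copy of doc without the fields named in strip.

    Underscore-prefixed fields such as _id and _rev are always kept.
    """
    return {
        key: deepcopy(value)
        for key, value in doc.items()
        if key.startswith("_") or key not in strip
    }


def etl(source_docs, load, strip=FIELDS_TO_STRIP):
    """Extract docs from an iterable, transform each one, pass it to load().

    Returns the number of documents processed.
    """
    count = 0
    for doc in source_docs:
        load(transform(doc, strip))
        count += 1
    return count
```

For example, `etl(docs, target_list.append)` would copy the stripped documents into a list. In a real job `load` would write to the target database instead.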

Other tips

I believe the best approach here would be to move the fields you want to exclude into a separate document and then filter that document out during replication.
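
A rough Python sketch of that idea follows. The helper names, the `private:` ID prefix, the `type` and `parent` fields, and the filter name `app/public_only` are all assumptions made for illustration. In CouchDB itself the filter would live in a design document as a JavaScript function, so `should_replicate` below only mirrors the logic such a filter might apply. The request body uses the `filter` key that CouchDB's `_replicate` endpoint accepts.

```python
def split_document(doc, private_fields):
    """Split doc into a public part and a separate private document.

    The private document gets its own _id and no _rev, since it is new.
    """
    public = {k: v for k, v in doc.items() if k not in private_fields}
    private = {k: v for k, v in doc.items() if k in private_fields}
    private.update({
        "_id": "private:" + doc["_id"],  # hypothetical naming scheme
        "type": "private",               # marker the filter looks for
        "parent": doc["_id"],            # link back to the public part
    })
    return public, private


def should_replicate(doc):
    """Mirror of the filter logic: skip documents marked as private."""
    return doc.get("type") != "private"


def replication_request(source, target, filter_name="app/public_only"):
    """Build a body for POST /_replicate that names a filter function."""
    return {"source": source, "target": target, "filter": filter_name}
```

With this layout the public documents replicate unchanged, revisions included, and only the private companions stay behind.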

Of course the best way would be to have built-in support for this. A workaround that occurs to me is, instead of using the built-in replication, to code and use a custom replication that performs the additional alterations/transformations while still using the other built-ins rather than going beneath them. With good coding, in many situations (especially if each master can push to its slaves), I feel this could be nearly as efficient.

  1. This requires efficient triggers on each source/master to detect any changes, which I believe CouchDB offers (or at least PouchDB appears to). The triggers would then copy the changes to another location, applying the full alterations along the way.
  2. If the source of the change is unable to push the change to the final destination, this intermediate store may have to be local to the source so that the destination can pull from it. That could get pretty expensive, especially in multi-master, as each location has to store and maintain not only its own data but also the data it sends to everyone else.
  3. This replication would also place each source document's revision ID in the document's copy...
    1. ...ideally, that is; it is essential if the copy is also to be updated (i.e. act as a master).
    2. ...in the form of either:
      1. ideally, the normal "_rev" property. This looks quite possible, since the normal replication algorithm already does it ("preserve their revisions ID") using the built-in "Bulk Docs API", which our variant would seemingly use too;
      2. otherwise, a new copy object (with its own _rev) plus another field such as "_rev_original" holding the original rev. But would that work?
      3. Clearly such a copy could be created without any problem.
      4. Probably no big deal if the destination is just reading the data.
      5. It seems hairy if the destination is also writing the data, as we would then have to merge with these non-standard revisions. But doable.
    3. Relevant to this (coding a custom/improved replication to provide this apparently missing functionality, ideally without altering Pouch and especially Couch source code), as starting material, here is the normal Couch replication algorithm, which unfortunately doesn't clearly say that it only uses built-in operations, though it looks like it, and also the official overview of what it does. I suspect Pouch implements this, likely in Pouch's replicate.js (latest release as of 2014.07).
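
The steps above could be sketched roughly as follows in Python. This is not the actual Couch or Pouch replication algorithm, only a simplified one-way pass under stated assumptions. `replicate_once`, `fetch_changes`, `bulk_write` and `transform` are hypothetical names: `fetch_changes(since)` stands in for `GET /{db}/_changes?include_docs=true&since=...` on the source, and `bulk_write(payload)` stands in for `POST /{db}/_bulk_docs` on the target. Sending `"new_edits": false` to the Bulk Docs API is what lets the copy keep the source `_rev`. For the alternative, the sketch stores the original revision in `rev_original`, without a leading underscore, because CouchDB reserves underscore-prefixed top-level field names.

```python
def replicate_once(fetch_changes, bulk_write, transform, since=0, keep_revs=True):
    """Run one pass: read changes, alter each doc, write the copies.

    Returns the last sequence seen, to be passed as `since` next time.
    """
    feed = fetch_changes(since)
    docs = []
    for row in feed["results"]:
        doc = row.get("doc")
        if doc is None:  # feed was requested without include_docs
            continue
        copy = transform(dict(doc))
        if not keep_revs:
            # Alternative 2: let the target assign its own _rev and
            # remember the source revision in an ordinary field.
            # Updating an existing target doc would additionally need
            # the target's current _rev, which this sketch omits.
            copy["rev_original"] = copy.pop("_rev", None)
        docs.append(copy)
    if docs:
        payload = {"docs": docs}
        if keep_revs:
            # Alternative 1: store the docs under their source revisions.
            payload["new_edits"] = False
        bulk_write(payload)
    return feed["last_seq"]
```

A caller would loop, feeding the returned sequence back in as `since`, with `transform` set to whatever strips or rewrites fields. Note that with `keep_revs=True` the source and the copy share a revision ID while differing in content, so this only seems safe for the strictly one-way case.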

Further implementation particulars? Those who know, please add them here.

This is a "community wiki" answer so please extend it.

Also, please comment with links and details of anyone or any system already doing, or trying to do, this or something similar.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow