This is a design/planning question, not specific to any particular software/deployment. The stuff I'm talking about doesn't exist yet, but I am hoping to avoid mistakes early in the process.

Here's the situation that I'd like advice on:

  1. We have 7 business "verticals" (e.g. Sales, Marketing, etc.), with each vertical separately maintaining its data and exposing a RESTful API that grants access to that data.
  2. The raw data is maintained separately by the respective vertical. This gives each vertical more freedom to define its own architecture, data pipeline, and processing.
  3. The API for each vertical is used to define a higher-level "contract" that is supposed to remain invariant (syntactically and semantically) regardless of changes to the underlying data architecture.

Here's the problem:

We like the idea above because it decouples each business unit. However, the decoupling is also a problem: since we are part of a single business, a large subset of our data follows a common data model.

For example: potential project sites are tracked by the marketing team, then pursued by the commercial team as opportunities, then won or lost by sales, then designed by engineering, and finally maintained by services.

Each vertical tracks different things about these entities, and the relationships may not be one-to-one (e.g. multiple opportunities per potential project site).

Another example is enforcing naming constraints: we have a common set of names for competitors, models, countries, etc., and we want to enforce these across our datasets so that, for example, "Acme X-35" is the only way to refer to that make and model anywhere.

My ideas

Addressing connections between data models (meta model)

Either we enforce the cross-database data model "by convention" (which seems brittle), or we create a "meta database" that pulls from each vertical's API into a relational database composed entirely of views: we create and relate a set of materialized views (no raw tables). This database would contain the normalized fields and tables that implement the common model.
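To make that concrete, here is a rough sketch of a refresh job that pulls from each vertical's API and rebuilds a normalized table in the meta database. The endpoint URLs, field names, and table layout are assumptions, and a production version would likely use real materialized views in your RDBMS (e.g. Postgres) rather than plain SQLite tables:

```python
# Rough sketch of a "meta database" refresh job. The endpoint URLs, field
# names, and table layout are hypothetical -- substitute whatever your
# verticals actually expose.
import sqlite3
import requests

VERTICAL_APIS = {
    "marketing": "https://marketing.example.com/api/project-sites",  # hypothetical
    "sales":     "https://sales.example.com/api/opportunities",      # hypothetical
}

def refresh(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE IF NOT EXISTS project_site (
                        site_id TEXT PRIMARY KEY,
                        name    TEXT,
                        source  TEXT)""")
    conn.execute("DELETE FROM project_site")  # rebuild on every refresh
    for vertical, url in VERTICAL_APIS.items():
        for record in requests.get(url, timeout=30).json():
            # Normalize each vertical's payload into the shared model.
            conn.execute(
                "INSERT OR REPLACE INTO project_site VALUES (?, ?, ?)",
                (record["id"], record["name"], vertical),
            )
    conn.commit()

if __name__ == "__main__":
    refresh(sqlite3.connect("meta.db"))
```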

Enforcing naming constraints

The key assumption is that names are immutable once we decide upon them. If that holds, we can provide a simple "validation server" that serves the allowable names for each field (essentially a RESTful lookup table), and each vertical can incorporate it into its validation workflow.
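A minimal sketch of such a validation server, assuming Flask (any HTTP framework would do); the field names and allowed values below are placeholders:

```python
# Minimal sketch of the "validation server" idea using Flask (an assumption --
# any HTTP framework works). The field names and allowed values are examples.
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Immutable, centrally agreed-upon names, keyed by field.
ALLOWED_NAMES = {
    "competitor": ["Acme", "Globex"],
    "model":      ["Acme X-35"],
}

@app.route("/names/<field>")
def names(field):
    """Return the allowable names for a field, or 404 if the field is unknown."""
    if field not in ALLOWED_NAMES:
        abort(404)
    return jsonify(ALLOWED_NAMES[field])

@app.route("/names/<field>/<value>")
def validate(field, value):
    """Cheap yes/no check a vertical can call from its ingest pipeline."""
    return jsonify({"valid": value in ALLOWED_NAMES.get(field, [])})

if __name__ == "__main__":
    app.run(port=8000)
```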

So, as you can see, I've given this some thought, but I'm not sure whether there is a more standard way to coordinate and synchronize a cross-database data model and its constraints.


Solution

In general it's a bad idea to try to enforce this across databases. You already said that each business unit has its own architecture. As long as everyone knows who owns which piece of data, they should be "forced" to follow that system.

Which department decides what products you have? That department should own an API that serves the list of current products. If that department later wants to add more fields, doing so should be little problem.
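As an illustration, a consuming vertical could check incoming records against the owning department's product API rather than keeping its own copy of the list; the URL and response shape here are assumptions:

```python
# Sketch of a consumer-side check: before accepting a record, another vertical
# asks the owning department's API whether the product name exists. The URL
# and response shape are assumptions.
import requests

PRODUCTS_API = "https://products.example.com/api/products"  # hypothetical owner API

def is_known_product(name: str) -> bool:
    products = requests.get(PRODUCTS_API, timeout=10).json()
    return any(p["name"] == name for p in products)

if not is_known_product("Acme X-35"):
    raise ValueError("Unknown product name; check with the owning department")
```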

Also remember that an engineering system doesn't track opportunities, especially not opportunities that were lost. Engineering only cares about bids you actually won. So the insight here is that an opportunity is not the same thing as a signed contract (even if they are both called "Acme X-35").

Of course each system can publish its own API (possibly formalized with, for example, a WSDL). That way each department can extend it in the way that's relevant to them.

And of course nothing beats just talking to each other on a regular basis about how you want to change things.

Other tips

RDF is a better match for this than SQL, because it allows links to another database by design. There are some performance hits with triple stores, especially large ones, but if you can afford that it is an option.

An RDF schema is expressed in RDF and can naturally import definitions from another schema. So you can have the benefits of a data warehouse without a separate store.
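A small sketch of that idea using the rdflib library (an assumption; the namespaces are hypothetical): one vertical's data references entities defined by another vertical simply by reusing its URIs, and the sales schema pulls in the marketing schema with owl:imports rather than redefining its terms.

```python
# Sketch of cross-database links in RDF using rdflib (an assumption). The
# namespaces below are hypothetical; the point is that sales data can point
# at a project site defined by marketing just by reusing its URI.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS

MARKETING = Namespace("https://marketing.example.com/ns#")  # hypothetical
SALES     = Namespace("https://sales.example.com/ns#")      # hypothetical

# The sales schema imports the marketing schema instead of redefining its terms.
schema = Graph()
schema.add((URIRef(str(SALES)), RDF.type, OWL.Ontology))
schema.add((URIRef(str(SALES)), OWL.imports, URIRef(str(MARKETING))))

# A sales opportunity links directly to a project site defined by marketing.
data = Graph()
data.bind("sales", SALES)
opportunity = SALES["opportunity/42"]
data.add((opportunity, RDF.type, SALES.Opportunity))
data.add((opportunity, RDFS.label, Literal("Acme X-35 retrofit")))
data.add((opportunity, SALES.forSite, MARKETING["site/123"]))  # cross-database link

print(data.serialize(format="turtle"))
```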
