I am working on a project where I am importing data from third-party sources. Often certain data is missing (usually older data), which is detectable from the data that I do have. Obviously I cannot reproduce all of the missing data, but I can reproduce some of it, specifically delta values.

So my plan is to create an entity to represent this missing data, storing the few values I can derive and keeping it on record until it can later be reconciled through another import. This "missing" entity would stand in for at least one, and possibly several, of the actual entities.
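A minimal sketch of that plan, assuming (hypothetically) that each imported record carries a sequence number, an amount, and the running balance after it. Under that assumption the combined delta of a gap can be derived even though the individual missing records cannot. All names here (`Record`, `Placeholder`, `fill_gaps`, `reconcile`) are illustrative, not an established API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Record:
    """Hypothetical imported row: sequence number, amount, running balance after it."""
    seq: int
    amount: int
    balance: int


@dataclass
class Placeholder:
    """Stands in for one or more missing records between two known ones.

    Only the combined delta across the gap is known; how many real records
    there are (and what they contain) stays open until a later import.
    """
    after_seq: int   # last known record before the gap
    before_seq: int  # first known record after the gap
    delta: int       # derived: balance just before `before_seq` minus balance at `after_seq`


def fill_gaps(records: List[Record]) -> List[Union[Record, Placeholder]]:
    """Return the records in order, with a Placeholder wherever sequence numbers skip."""
    ordered = sorted(records, key=lambda r: r.seq)
    out: List[Union[Record, Placeholder]] = []
    for prev, curr in zip(ordered, ordered[1:]):
        out.append(prev)
        if curr.seq - prev.seq > 1:
            # Balance before `curr` was applied, minus balance after `prev`.
            missing_delta = (curr.balance - curr.amount) - prev.balance
            out.append(Placeholder(prev.seq, curr.seq, missing_delta))
    if ordered:
        out.append(ordered[-1])
    return out


def reconcile(ph: Placeholder, arrived: List[Record]) -> bool:
    """True if late-arriving records exactly fill the gap and match the derived delta."""
    expected_seqs = list(range(ph.after_seq + 1, ph.before_seq))
    arrived_seqs = sorted(r.seq for r in arrived)
    return arrived_seqs == expected_seqs and sum(r.amount for r in arrived) == ph.delta
```

For example, `fill_gaps([Record(1, 10, 10), Record(4, 5, 40)])` yields a `Placeholder(1, 4, 25)` between the two known records, and a later import of `Record(2, 20, 30)` and `Record(3, 5, 35)` would reconcile it.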

My question is: is there a common term that is already established to represent this kind of known missing data?


Solution

Placeholder seems appropriate, although strictly speaking a placeholder holds a place for data that are to come later.

Estimated may be a good term to use if the values you are substituting might be slightly off.

Inferred, Calculated, or Derived would also properly communicate the meaning. A calculated value is one derived numerically; derived is more general, in that there may be a deterministic function whose inputs are non-numeric, e.g. combining strings. Inferred is more general still and may require inputs that include external data, e.g. inferring a position by looking it up on a map.

Dummy would be a good term if you expect your inferred values to be significantly off.

And of course mocked is the usual term if the data are provided for unit testing.

Other answers

It is not quite clear to me what you mean by entity. Do you mean the value to use in place of the missing data, or a stereotype?

As a stereotype you could use the term placeholder or dummy. For actual data it could be "unknown" (you would want it to be a unique value that cannot clash with whatever you ultimately receive).
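In Python, one common way to get such a clash-proof "unknown" value is a sentinel object compared by identity. The sketch below is illustrative only: `UNKNOWN`, `_Unknown`, and `describe` are hypothetical names, and reading `None` as "not applicable" is an assumption made here to keep it distinct from "unknown".

```python
from typing import Union


class _Unknown:
    """Type of the UNKNOWN sentinel; only the single instance below is meant to exist."""
    __slots__ = ()

    def __repr__(self) -> str:
        return "UNKNOWN"


# Compared with `is`, so no value that eventually arrives from an import
# (0, "", -1, or even None) can be mistaken for it.
UNKNOWN = _Unknown()


def describe(value: Union[int, None, _Unknown]) -> str:
    """Hypothetical helper distinguishing the three cases."""
    if value is UNKNOWN:
        return "unknown (expected later)"
    if value is None:
        return "not applicable"  # assumption: None reserved for "does not apply"
    return f"known: {value}"
```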

The missing information can be called "unspecified".

If the missing information is known to exist but isn't communicated, it can be called "unreported".

The data structure that has some info but is missing other info can be called "incomplete".

My opinion is that rather than representing a missing entity as an alternate type, we would do better to describe the provenance of the data in all entities in some way, such as tagging with attributes or relations. This can be something simple, such as known/given vs. derived/computed/inferred vs. assumed, or something more complex that captures who/what/when.
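A minimal sketch of the simple variant: one entity type for every row, with a per-field provenance tag instead of a separate "missing" type. The three tag values mirror the categories above; `Entry`, its fields, and `fully_given` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Dict, Optional


class Provenance(Enum):
    GIVEN = "given"      # supplied directly by the third-party source
    DERIVED = "derived"  # computed from values we do hold (e.g. a delta)
    ASSUMED = "assumed"  # filled in without direct evidence


@dataclass
class Entry:
    # Hypothetical entity: complete and incomplete rows share this one type.
    entry_id: Optional[str]
    amount: Optional[int]
    # Maps field name -> where its value came from; an absent key means "not known".
    provenance: Dict[str, Provenance] = field(default_factory=dict)

    def fully_given(self) -> bool:
        """True when every data field came straight from the source."""
        return all(
            self.provenance.get(f.name) is Provenance.GIVEN
            for f in fields(self)
            if f.name != "provenance"
        )


imported = Entry("A-17", 40, {"entry_id": Provenance.GIVEN, "amount": Provenance.GIVEN})
gap = Entry(None, 25, {"amount": Provenance.DERIVED})  # only the delta is known
```

The richer who/what/when variant would replace the enum value with a small record (source, timestamp, method), without changing the entity's type.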

In a relational model, different types of entities mean separate tables, which burdens queries: anything spanning both kinds has to combine the tables. In OOP, different types impose similar burdens unless inheritance is used to unify the concepts, and to that I would say: prefer composition over inheritance, in this case composition of provenance information over inheritance of (provenance) types.
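To make the relational point concrete, here is a small sketch using Python's built-in `sqlite3` with a single table and a provenance column; the table and column names are assumptions for illustration. Totals and filters need only one query, whereas separate "real" and "missing" tables would need a `UNION ALL` for the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table for every entry; provenance is just another column.
conn.execute(
    """
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY,
        amount INTEGER NOT NULL,
        provenance TEXT NOT NULL
            CHECK (provenance IN ('given', 'derived', 'assumed'))
    )
    """
)
conn.executemany(
    "INSERT INTO entries (amount, provenance) VALUES (?, ?)",
    [(10, "given"), (25, "derived"), (5, "given")],
)

# Aggregate over everything in one query...
total = conn.execute("SELECT SUM(amount) FROM entries").fetchone()[0]
# ...or filter by provenance without touching another table.
not_given = conn.execute(
    "SELECT SUM(amount) FROM entries WHERE provenance <> 'given'"
).fetchone()[0]
```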

"Is there a common term that is already established to represent this kind of known missing data?"

Not that I'm aware of, at least not in the way you describe it, but there are notions of the provenance of information, and these can range from simple to complex.

Otherwise, in the relational model, NULL is used to represent two sadly conflated notions: (1) missing and not applicable, and (2) missing yet applicable (or simply missing data). Your description corresponds to the latter use of NULL in SQL.

(The former, missing and not applicable, means there are really different types in the same table. For example, a CEO does not report to any one individual and never will: the data is not missing or unknown, the column is simply not applicable to that row. So the CEO's reports-to column is NULL, unlike all the other employees, who do and must report to someone.)
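The conflation can be seen directly in SQL. This sketch uses Python's built-in `sqlite3`; the `employees` table and the names in it are hypothetical. The CEO's NULL means "not applicable" and Bob's means "applicable but not yet imported", yet no query can tell them apart. It also shows that `= NULL` never matches, because a comparison with NULL yields NULL rather than true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT PRIMARY KEY, reports_to TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Carol", None),    # CEO: reports to no one (not applicable)
        ("Alice", "Carol"), # known manager
        ("Bob", None),      # has a manager, but it was not in the import (missing)
    ],
)

# Both meanings of NULL come back together; nothing distinguishes them.
null_rows = conn.execute(
    "SELECT name FROM employees WHERE reports_to IS NULL ORDER BY name"
).fetchall()

# `= NULL` evaluates to NULL (not true) for every row, so it matches nothing.
eq_null_count = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE reports_to = NULL"
).fetchone()[0]
```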


FYI, there are other concepts, such as futures or promises, that are effectively proxies for not-yet-available information, though these are tied closely to programming models (threads, asynchronous activity, other behaviors) rather than to the storage of domain objects.
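For contrast, a minimal sketch of a future using Python's standard `concurrent.futures`: the `Future` stands in for a value that will exist later, but it lives only for the duration of the running program and is not something you would persist alongside domain data. `fetch_backfill` is a hypothetical stand-in for a slow lookup.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def fetch_backfill(entry_id: str) -> int:
    """Hypothetical slow lookup standing in for data that arrives later."""
    time.sleep(0.05)
    return 25


with ThreadPoolExecutor(max_workers=1) as pool:
    fut: Future = pool.submit(fetch_backfill, "A-18")
    # `fut` is a proxy for a value not yet available; result() blocks until it is.
    amount = fut.result()
```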

Licensed under: CC-BY-SA with attribution