Question

Straight to the point: is it possible to keep a normalized, two-dimensional (tabular) model in the Google App Engine Datastore, where each relation is a kind of its own and its entities are the instances (rows) of that relation?

I already know that the Datastore (with its underlying Bigtable technology) works differently from an RDBMS, but my question is: what prevents one from still laying out a model relationally (with all the advantages that brings from a theoretical and planning point of view) within the Datastore?

An example to clarify. Couldn't I still plan entities of the following kinds:

  • Person (Name:str, Company:Company)
  • Company (Name:str)
  • Project (Notes:text)
  • PersonProjects (Person:Person, Project:Project)

The properties which refer to other entities (e.g. Person.Company, PersonProjects.Project) would store those entities' ids. What would the major drawbacks (if any) be, performance-wise? Note that I could have normalized the model further, e.g. introducing new kinds for PersonName, CompanyName etc, but I decided here to keep one-value properties within the same kind they refer to.
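
To make the layout concrete, here is a minimal sketch of it. It uses plain Python dataclasses and dicts as an in-memory stand-in for kinds and entities, not the Datastore API, and the sample names and ids are made up for illustration. It shows the kind of manual "join" that would have to happen in code:

```python
from dataclasses import dataclass

# One class per kind; references to other kinds are stored as ids.
@dataclass
class Company:
    name: str

@dataclass
class Person:
    name: str
    company_id: int

@dataclass
class Project:
    notes: str

@dataclass
class PersonProject:
    person_id: int
    project_id: int

# In-memory stand-in for the store: one dict per kind, keyed by entity id.
companies = {1: Company("Acme")}
people = {1: Person("Ada", company_id=1)}
projects = {1: Project("Compiler"), 2: Project("Website")}
person_projects = [PersonProject(1, 1), PersonProject(1, 2)]

def projects_of(person_id):
    """Manual join: find the link entities, then fetch each project by id."""
    links = [pp for pp in person_projects if pp.person_id == person_id]
    return [projects[pp.project_id] for pp in links]

notes = [p.notes for p in projects_of(1)]
```

Every hop across a reference (person to links, links to projects) becomes a separate lookup that the application has to perform itself.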

I remember watching, some time ago, a Google I/O video in which normalization techniques were used to keep entities of a certain kind from growing too large, i.e. having too many properties (the underlying problem was exploding indexes). One property of the planned kind was "detached" from it into a new kind, only to be joined back to it afterwards in code.

Well, couldn't I do that for all of a kind's properties? I can't see any major issue apart from the extra client-side (or server-side) work needed to assemble an object on retrieval. So, is the switch to an "entity-based" model really necessary? Can't we simulate relations through kinds and entities?

I hope I've been clear enough.

Solution

Nothing prevents you from normalising your model in Datastore. The problem is that Datastore has a very limited query language: inequality filters on only one property per query, no multi-kind queries, no JOINs, etc. This forces you to organise data around your access patterns (access-oriented modelling), which often means storing data in illogical places just to get to it fast (that is, with the minimum number of DB operations).
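
As a rough illustration of that trade-off, the sketch below counts read operations for the same lookup against a normalized and a denormalized layout. `CountingStore` is a hypothetical in-memory stand-in written for this example, not the Datastore API, and the counts are only illustrative:

```python
class CountingStore:
    """Hypothetical in-memory store that counts read operations."""

    def __init__(self):
        self.rows = {}   # (kind, id) -> dict of properties
        self.reads = 0

    def put(self, kind, key, value):
        self.rows[(kind, key)] = value

    def get(self, kind, key):
        self.reads += 1
        return self.rows[(kind, key)]

    def query(self, kind, prop, value):
        self.reads += 1  # one single-kind equality query = one operation
        return [v for (k, _), v in self.rows.items()
                if k == kind and v.get(prop) == value]

store = CountingStore()
store.put("Project", 1, {"notes": "Compiler"})
store.put("Project", 2, {"notes": "Website"})
store.put("PersonProjects", 1, {"person": 1, "project": 1})
store.put("PersonProjects", 2, {"person": 1, "project": 2})
# Denormalized copy of the notes, kept on the person at write time.
store.put("Person", 1, {"name": "Ada", "project_notes": ["Compiler", "Website"]})

def notes_normalized(store, person_id):
    # No JOIN available: query the link kind, then fetch each project.
    links = store.query("PersonProjects", "person", person_id)
    return [store.get("Project", link["project"])["notes"] for link in links]

def notes_denormalized(store, person_id):
    # Access-oriented layout: the data lives where it is read.
    return store.get("Person", person_id)["project_notes"]

store.reads = 0
a = notes_normalized(store, 1)
normalized_reads = store.reads

store.reads = 0
b = notes_denormalized(store, 1)
denormalized_reads = store.reads
```

The normalized path costs one query plus one get per linked project, while the denormalized path costs a single get. In a real application batched gets would narrow the gap, and the denormalized copy has to be kept in sync on every write.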

Additionally, transactions are quite limited, forcing you to organise data in a certain way (entity groups). If you use cross-group (XG) transactions instead, you are limited to 25 entity groups per transaction.

Also note that the Datastore does not enforce referential integrity, as SQL RDBMSs usually do.
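
In practice this means the application has to check references itself. A minimal sketch, again using plain dicts as a stand-in for kinds (the helper `put_person` and the sample data are hypothetical):

```python
# In-memory stand-in: one dict per kind, keyed by entity id.
companies = {1: {"name": "Acme"}}
people = {}

def put_person(person_id, name, company_id):
    """Reject a Person whose Company reference does not resolve."""
    if company_id not in companies:
        raise ValueError(f"unknown company id: {company_id}")
    people[person_id] = {"name": name, "company": company_id}

put_person(1, "Ada", 1)          # valid reference, stored

try:
    put_person(2, "Bob", 99)     # dangling reference
    rejected = False
except ValueError:
    rejected = True
```

Against a real store, the existence check and the write would presumably need to run in the same transaction to avoid races, which runs into the transaction limits mentioned above. Deleting a Company would likewise require the application to find and fix the Persons that point at it.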

Licensed under: CC-BY-SA with attribution