Changing “schema” in RavenDB

https://stackoverflow.com/questions/4776873

23-10-2019

Question

Just for the interest of expanding my knowledge I have started looking at various NoSQL options. The first one I visited is RavenDB and it looks interesting. I am still trying to break my deep-seated relational thinking, along with the typical RDBMS maintenance routines.

In my day-to-day dealings with Entity Framework we go through the routine of scripting DB changes, refreshing the EF mapping model, etc. How does this work in NoSQL, especially RavenDB? Once an app has gone live, how does one make changes to the various POCO classes and deploy them to production? What happens to data stored in the shape of the old POCO classes?

I haven't delved deep or used RavenDB in anger yet. This may be obvious once I do, but I would love to know beforehand so I don't code myself into a corner.

Thanks, D.

Solution

They stay as they are. Properties that no longer exist on the class will be ignored when loading (and lost when the document is saved again), and properties missing from the stored document will come back as null.

I recommend using set-based operations to keep the data in sync with the object model.

Oh, look at me, I'm on a computer now!

Right, so basically: in moving to a document store you are right to recognise that you lose some functionality and gain some freedom. In a relational database you have a schema defined up front, and trying to store data that doesn't match that schema results in an error; a document store makes no such check.

It is important to recognise however, that there is a difference between schema-less and structure-less, in that your documents all contain their own structure (key/value pairs denoting property name and property value).

This makes it useful for the whole "just getting on" factor of writing some code and having your data persisted. But because it is so easy to change your code structure, it can be harder to reconcile those changes with data you have already persisted.

A few strategies present themselves at this point:

1. Make your structure immutable once you have persisted data, and version your classes
2. Allow modification of structure, but use set-based operations to update data to match the new structure
3. Allow modification of structure, and write code to deal with inconsistencies when loading data

The third one is clearly a bad idea, as it will lead to unmaintainable code. Versioning your classes can work if you're just storing events or other such data, but it isn't really appropriate for most scenarios. So you're left with the middle option.

I'd recommend doing just that, and following a few simple rules along the same lines as you'd follow when dealing with an up-front schema in a relational database.

- Use your VCS to determine the changes between deployed versions
- Write migration scripts that upgrade from one version to another
- Be careful when renaming or removing properties: loading a document and saving it again will lose the data in any property that no longer exists on the new class

Etc.
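
As a rough illustration of the "migration script" rule, here is a minimal in-memory sketch. It is not RavenDB API: `StoredDocument`, `Migrations` and the version numbering are all hypothetical stand-ins, and a real migration would run as a set-based operation against the database. The step shown renames "Email" to "CustomerEmail" and copies the value across first, so the rename does not lose data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a stored document: a bag of
// key/value pairs plus a schema version. None of this is RavenDB API.
public sealed class StoredDocument
{
    public int SchemaVersion { get; set; } = 1;
    public Dictionary<string, object> Fields { get; } = new Dictionary<string, object>();
}

public static class Migrations
{
    // One step per version bump: the step stored under key N upgrades a
    // document from version N to version N + 1.
    private static readonly Dictionary<int, Action<StoredDocument>> Steps =
        new Dictionary<int, Action<StoredDocument>>
        {
            // v1 -> v2: rename "Email" to "CustomerEmail", copying the value
            // to the new name before the old property is removed.
            [1] = doc =>
            {
                if (doc.Fields.TryGetValue("Email", out var email))
                {
                    doc.Fields["CustomerEmail"] = email;
                    doc.Fields.Remove("Email");
                }
            },
        };

    // Set-based upgrade: apply every pending step to every document.
    public static void UpgradeAll(IEnumerable<StoredDocument> documents)
    {
        foreach (var doc in documents)
        {
            while (Steps.TryGetValue(doc.SchemaVersion, out var step))
            {
                step(doc);
                doc.SchemaVersion++;
            }
        }
    }
}
```

Because each step is keyed by the version it upgrades from, a document that has missed several deployments is walked through every intermediate step in order.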

I hope this is more helpful :-)

Other tips

RavenDB serializes your .NET objects to JSON format. There is no schema.

If you add some objects to your database, they will get serialized. If you add some properties to the type you are serializing, the objects you have already stored will be missing those properties.

This article by Ayende describes how to perform a migration from version 1 to version 2 (in this case, changing a "Name" property to "FirstName" and "LastName" properties).

http://ayende.com/blog/66563/ravendb-migrations-rolling-updates

Basically a listener is registered in the DocumentStore:

documentStore.RegisterListener(new CustomerVersion1ToVersion2Converter());

Sample implementation taken from the article mentioned above:

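using System.Linq;            // First() and Last()
using Raven.Client.Listeners; // IDocumentConversionListener (classic RavenDB client)
using Raven.Json.Linq;        // RavenJObject

public class CustomerVersion1ToVersion2Converter : IDocumentConversionListener
{
    // Called when an entity is being saved: stamp the document with the new
    // schema version and keep writing the old-format properties as well.
    public void EntityToDocument(object entity, RavenJObject document, RavenJObject metadata)
    {
        Customer c = entity as Customer;
        if (c == null)
            return;

        metadata["Customer-Schema-Version"] = 2;
        // preserve the old Name property, for now.
        document["Name"] = c.FirstName + " " + c.LastName;
        document["Email"] = c.CustomerEmail;
    }

    // Called when a document is being loaded: documents already at version 2
    // need no work; older ones get the new properties filled from the old ones.
    public void DocumentToEntity(object entity, RavenJObject document, RavenJObject metadata)
    {
        Customer c = entity as Customer;
        if (c == null)
            return;
        if (metadata.Value<int>("Customer-Schema-Version") >= 2)
            return;

        c.FirstName = document.Value<string>("Name").Split().First();
        c.LastName = document.Value<string>("Name").Split().Last();
        c.CustomerEmail = document.Value<string>("Email");
    }
}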

It's not so much that you have no schema management as that you move it into your code, so there is never a mismatch between the objects in your code and those in your database.

The first part of handling changes is to make sure that you use a serializer that can handle missing/extra values - if a field isn't defined in the data, set it to null. If a field in the data doesn't match a property on your object, ignore it.

Most changes can be handled without any more than that - either there is a new field and you need to have a default value for existing records anyway, or there is an old field you don't care about any more.
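
To make the missing/extra-value behaviour concrete, here is a small sketch that uses System.Text.Json purely as a stand-in serializer (it is not what RavenDB uses internally, and the `Customer` and `TolerantLoading` names are hypothetical). With its default options, a JSON property that has no matching class member is skipped, and a class member with no value in the JSON keeps its default, which is null for a string.

```csharp
using System;
using System.Text.Json;

// Hypothetical POCO: "Email" was added after the first documents were stored,
// and an older "Nickname" property has since been removed from the class.
public class Customer
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class TolerantLoading
{
    // Deserialize with default options: unknown JSON properties are ignored
    // and class properties absent from the JSON are left at their defaults.
    public static Customer Load(string json)
    {
        return JsonSerializer.Deserialize<Customer>(json);
    }
}
```

Loading an old document such as `{"Name":"Ada","Nickname":"Countess"}` should then give a `Customer` whose `Name` is "Ada" and whose `Email` is null, with "Nickname" silently dropped.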

For more complex changes such as renaming/combining fields or changing data format, add a new field to your object without removing the old ones and have your load method transfer data from the old fields. When you save the record it will be in the new format. This code can either be left in place permanently, updating data as needed, or you can set up a one-time process to call the same code for all existing objects. Note that unlike a SQL script, this type of update requires no downtime even if it takes a long time to run on a large dataset, because the code can handle both old and new formats.
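
A minimal sketch of that upgrade-on-load approach follows, again with System.Text.Json standing in for the real persistence layer. `CustomerRecord` and `CustomerStore` are hypothetical names, and the split on the first space is a deliberately naive assumption about how names are formatted.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical class mid-migration: the legacy "Name" property is kept so
// that old documents still deserialize, alongside the new properties.
public class CustomerRecord
{
    public string Name { get; set; }       // legacy, old format only
    public string FirstName { get; set; }  // new format
    public string LastName { get; set; }   // new format
}

public static class CustomerStore
{
    // Load a document in either format and normalize it to the new one.
    public static CustomerRecord Load(string json)
    {
        var c = JsonSerializer.Deserialize<CustomerRecord>(json);
        if (c.Name != null && c.FirstName == null && c.LastName == null)
        {
            // Old format: transfer the data from the legacy field.
            var parts = c.Name.Split(' ', 2);
            c.FirstName = parts[0];
            c.LastName = parts.Length > 1 ? parts[1] : "";
            c.Name = null; // the next save writes only the new fields
        }
        return c;
    }

    // Saving omits null properties, so the legacy field disappears from
    // any document that has been loaded and upgraded.
    public static string Save(CustomerRecord c)
    {
        var options = new JsonSerializerOptions
        {
            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
        };
        return JsonSerializer.Serialize(c, options);
    }
}
```

The one-time process mentioned above would simply call `Load` followed by `Save` for every stored document; until it finishes, any document read by the application is upgraded in passing.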

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow