Question

I am required to use a PostgreSQL database, and it will replace my current use of Berkeley DB. I realize this is not an ideal situation, but it is beyond my control.

So the question is: if you were required to turn PostgreSQL into a key-value store, how would you go about it while making it as efficient as possible?

My values are byte arrays and my keys are strings; I could impose some restrictions on the lengths of these strings.

I assume I should use a blob column for the value and a primary key column holding the key string, but as I am just venturing into this, I am curious whether anyone in the Stack Overflow community has done this, or whether there are any specific 'gotchas' I should look out for.


Solution

The PostgreSQL extension built for exactly this is hstore. It works much the way you would expect from other key-value store systems: just load the extension. The syntax is its own, but if you have ever used Redis or Mongo you will pick it up quickly. Don't make it harder than it is; I understand, we often don't get to pick our tools and have to make do.
Here is the documentation page:

http://www.postgresql.org/docs/9.1/static/hstore.html
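
A minimal sketch of how this might look (table and column names here are invented for illustration); note that hstore keys and values are text, so byte-array values would need to be encoded, e.g. as hex or base64:

-- load the extension once per database (PostgreSQL 9.1+)
CREATE EXTENSION hstore;

-- one hstore per "bucket"; names are illustrative
CREATE TABLE kv (
    bucket text PRIMARY KEY,
    data   hstore NOT NULL
);

INSERT INTO kv (bucket, data) VALUES ('session', 'user=>alice');

-- add or overwrite keys with ||, read a single key with ->
UPDATE kv SET data = data || 'lang=>en'::hstore WHERE bucket = 'session';
SELECT data -> 'user' FROM kv WHERE bucket = 'session';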

OTHER TIPS

If you are forced to use a relational database, I would suggest trying to find structure in your data and taking advantage of it, since you forgo the speed advantage you had with unstructured data in a key-value store. The more structure you find, the better you come out of your predicament, even if you only find structure in the keys.

Also consider whether you will mostly need sequential or random access to your data, and in what ratio, and structure your database around that requirement. Are you going to query your values by type, for example? Each of these questions can affect how you structure your database.

One specific consideration about blobs in PostgreSQL: large objects are internally stored in pg_largeobject (loid oid, pageno int4, data bytea). The chunk size is defined by LOBLKSIZE, typically 2 kB. So if you can use byte arrays (bytea) in your table instead of large objects and keep the size of your key/value pair under the block size, you avoid the indirection through that second table. You could also increase the block size if you can rebuild the database with a different configuration.
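
As a rough sketch of that approach (table and column names invented for illustration), a plain table with a string key and a bytea value is usually all that is needed:

-- keys are strings, values are raw byte arrays; small rows stay inline in the heap
CREATE TABLE kv_store (
    k varchar(255) PRIMARY KEY,
    v bytea NOT NULL
);

INSERT INTO kv_store (k, v) VALUES ('user:42', decode('deadbeef', 'hex'));
SELECT v FROM kv_store WHERE k = 'user:42';
DELETE FROM kv_store WHERE k = 'user:42';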

I'd suggest looking for structure in the data and patterns in the data access, and then asking your question again with more detail.

It really depends on what the key will be. If it will always be a string under 255 characters, then use a varchar as your PK and a blob (assuming a large value) for the value. If it will always be a number, use int, and so on.

In other words, need more info to really give you a good answer :)

What do you need to store as a value? Strings? Ints? Objects (e.g. serialized Java objects)? A simple implementation would work with a 3-column table looking like:

NAME(VARCHAR)   TYPE(VARCHAR)   VALUE(VARCHAR)

(perhaps TYPE is some enumeration). The above wouldn't work for binary data like serialized objects, though, and you would probably need a BLOB there.
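
One possible way to render that table in PostgreSQL (names and types here are only an illustration; bytea is used for VALUE so the binary case is covered as well):

CREATE TABLE config_entries (
    name  varchar(255) PRIMARY KEY,
    type  varchar(32)  NOT NULL,  -- e.g. 'string', 'int', 'double', 'object'
    value bytea        NOT NULL
);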

Alternatively (and probably a much better idea), have you seen Apache Commons Configuration? You can back it with a database (via JDBC) and store properties so that you retrieve them like this:

// get a property called 'number' in different numeric types
double asDouble = config.getDouble("number");
int asInt = config.getInt("number");

That may save you a lot of grief in terms of implementation. You may have a problem with saving binary data, in that you'd have to serialize it prior to insertion and deserialize it after retrieval. But I've used this in the past for storing ints, doubles and serialized Java objects via XStream, so I can confirm it works well.

Licensed under: CC-BY-SA with attribution