Question

I am designing my first MongoDB (and first NoSQL) database and would like to store information about files in a collection. As part of each file document, I would like to store a log of file accesses (both reads and writes).

I was considering creating an array of log messages as part of the document:

{
    "filename": "some_file_name",
    "logs" : [
        { "timestamp": "2012-08-27 11:40:45", "user": "joe", "access": "read" },
        { "timestamp": "2012-08-27 11:41:01", "user": "mary", "access": "write" },
        { "timestamp": "2012-08-27 11:43:23", "user": "joe", "access": "read" }
    ]
}

Each log message will contain a timestamp, the type of access, and the username of the person accessing the file. I figured that this would allow very quick access to the logs for a particular file, probably the most common operation that will be performed with the logs.

I know that MongoDB has a 16 MB document size limit. I imagine that files that are accessed very frequently could push against this limit.

Is there a better way to design the NoSQL schema for this type of logging?


Solution

Let's first estimate the average size of one log record:

The field name "timestamp" costs about 18 bytes and its value 8 bytes; "user" about 8 bytes of name plus ~20 bytes of value (assuming usernames of at most, or on average, around 10 characters); "access" about 12 bytes of name plus ~10 bytes of value. That comes to roughly 76 bytes per record, so you can fit about 220,000 log records in one document.

Notice that roughly half of that space is taken up by the field names themselves. If you shorten them (timestamp = t, user = u, access = a), you will be able to store about 440,000 log items.
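The arithmetic behind those two estimates can be checked with a quick sketch. The per-field byte counts are the rough approximations used above, not exact BSON sizes, and one-letter field names are treated as negligible overhead:

```python
# Back-of-envelope check of the estimate above, using the same rough
# per-field byte counts (approximations, not exact BSON sizes;
# one-letter field names are treated as costing ~0 bytes here).

DOC_LIMIT = 16 * 1024 * 1024  # MongoDB's 16 MB document size limit

value_bytes = {"timestamp": 8, "user": 20, "access": 10}  # 38 bytes of values
name_bytes = {"timestamp": 18, "user": 8, "access": 12}   # 38 bytes of name overhead

full = sum(value_bytes.values()) + sum(name_bytes.values())  # 76 bytes per record
short = sum(value_bytes.values())                            # ~38 bytes with t/u/a names

print(DOC_LIMIT // full)   # ~220,000 embedded log records per document
print(DOC_LIMIT // short)  # ~440,000 with one-letter field names
```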

So I think that is enough for most systems. In my projects I always try to embed rather than create a separate collection, because embedding is a good way to achieve strong performance with MongoDB.

In the future you can move your log records into a separate collection. Also, for performance, you can keep something like the 30 most recent log records (simply denormalize them) in the file document for fast retrieval, in addition to the logs collection.
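MongoDB can maintain such a "most recent N" array atomically using $push with the $slice modifier; as a sketch, here is the same bookkeeping modeled in plain Python (the field names are shortened as suggested above):

```python
# Sketch of keeping only the N most recent log entries embedded in the
# file document. In MongoDB this maps to an update like:
#   {"$push": {"recent_logs": {"$each": [entry], "$slice": -30}}}
# Here we just model that behavior on a plain dict.

RECENT_LIMIT = 30

def append_log(file_doc, entry, limit=RECENT_LIMIT):
    logs = file_doc.setdefault("recent_logs", [])
    logs.append(entry)
    file_doc["recent_logs"] = logs[-limit:]  # keep only the newest `limit` entries

file_doc = {"filename": "some_file_name"}
for i in range(50):
    append_log(file_doc, {"t": f"2012-08-27 11:40:{i:02d}", "u": "joe", "a": "read"})

print(len(file_doc["recent_logs"]))  # 30 -- the oldest 20 entries were dropped
```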

Also, if you go with one collection, make sure that you are not loading the logs when you don't need them (MongoDB lets you include/exclude fields in a query projection). And use $slice to do paging through the array.
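As a sketch of how those two projections behave (the field names come from the example above; `apply_slice` is only an illustration of $slice's semantics, not a driver API):

```python
# The projection documents you would pass to a driver such as pymongo:

no_logs_projection = {"logs": 0}                   # exclude the embedded array
page_projection = {"logs": {"$slice": [20, 10]}}   # skip 20 entries, return 10

def apply_slice(array, spec):
    # Simulate MongoDB's $slice projection on a plain list:
    # an int takes the first N (or last N if negative);
    # a [skip, limit] pair takes `limit` elements starting at `skip`.
    if isinstance(spec, int):
        return array[:spec] if spec >= 0 else array[spec:]
    skip, limit = spec
    return array[skip:skip + limit]

logs = list(range(100))  # stand-in for 100 embedded log entries
print(apply_slice(logs, page_projection["logs"]["$slice"]))  # entries 20..29
```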

And one last thing: Enjoy mongo!

OTHER TIPS

If you think the document size limit will become an issue, there are a few alternatives.

The obvious one is to simply create a new document for each log entry.

You would then have a collection "logs" with this schema:

{
    "filename": "some_file_name",
    "timestamp": "2012-08-27 11:40:45", 
    "user": "joe", 
    "access": "read"
}

A query to find which files "joe" read would look something like this:

db.logs.find({user: "joe", access: "read"})
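That query uses simple equality matching; here is a sketch of its semantics over plain Python dicts (with a real collection of this size you would also want an index, e.g. on {user: 1, access: 1}, or {filename: 1, timestamp: 1} for per-file history):

```python
# Model of what db.logs.find({user: "joe", access: "read"}) matches,
# applied to plain dicts instead of a live collection.

logs = [
    {"filename": "a.txt", "timestamp": "2012-08-27 11:40:45", "user": "joe", "access": "read"},
    {"filename": "a.txt", "timestamp": "2012-08-27 11:41:01", "user": "mary", "access": "write"},
    {"filename": "b.txt", "timestamp": "2012-08-27 11:43:23", "user": "joe", "access": "read"},
]

def find(collection, query):
    # Equality-match semantics of a simple MongoDB query document.
    return [doc for doc in collection if all(doc.get(k) == v for k, v in query.items())]

matches = find(logs, {"user": "joe", "access": "read"})
print([m["filename"] for m in matches])  # ['a.txt', 'b.txt']
```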
Licensed under: CC-BY-SA with attribution