Question

Pardon me for the woolly question - I'm not really that familiar with OLAP & cubes. Let me explain my situation...

I'd like to build a database to store questionnaire results, where there might be a few dozen questions per questionnaire. Having gathered a few thousand completed questionnaires, I'd like to analyze the results, and that sounds like a good candidate for OLAP type stuff (of which I know very little). I need to be able to run queries on "all male respondents age 20-30 who own a dog" - i.e. combining the answers to "how old are you", "do you own a dog", etc.

I also want to be able to store the results of next month's survey, and the month after that, etc., and run queries showing this month versus last, etc. So far, so good, I assume.

Here's the nub of my question: whereas this month my questionnaire might have questions about sex, age & dog ownership, next month's questionnaire might include a question about (say) eye color. It might (or might not) also drop some questions. Is that do-able in the OLAP world, or do you need to know all the "dimensions" (if I'm using the right term) in advance when you design your cube?

Also, if I'm running several different surveys with different-but-overlapping questions, can I store them all in the same cube and run queries across surveys? Each survey might have a few dozen questions, with a couple of dozen overlapping with other surveys. Do OLAP systems cater for this sort of thing? I just don't know how rigid they are, and whether they are in fact appropriate for this kind of usage.

Any help greatly appreciated.

PS. Before someone suggests it, I did just buy Kimball's Data Warehouse Toolkit but haven't had a chance to read it yet. (I suspect it may not directly answer this question anyway).

Was it helpful?

Solution

There is a white paper here which has a section covering modelling survey data. This maybe the sort of thing that you are looking for.

OTHER TIPS

I'll start by saying that I'm an OLAP newbie too but I think I have a handle on what you are looking to achieve.

In effect your questions are one of your dimensions, the answer to that question being part of the fact table, i.e. the fact table has the answer and has dimensions associated with it for age, sex, locality (perhaps), questions. It may feel a bit back to front but that's something that I'm coming to terms with for OLAP.

You might also want another dimension related to question that groups them into questionnaires but that might just be a value in the question dimension itself, i.e. Question { QuestionnaireID = 1, QuestionNumber = 4, QuestionText = "Do you own a dog?" }.

Not sure if that helps but hopefully will give you some ideas if nothing else.

Another OLAP newbie here as well...

1) I only have experience creating OLAP cubes with Mondrian (Pentaho), which does allow you to revise the cube's schema, which is just an XML file, and rebuild them (or in Pentaho-speak, publish). So for that platform, anyway, there's no such requirements for knowing all your dimensions ahead of time

2) I agree with Lazurus' recommendation about creating a dimension of questions. It's not a requirement that each of your "facts" has a value that's present in all dimensions, so if you were to look across the dimension for "Question n", then I believe it should only give you data for the questionnaires where "Question n" is a relevant dimension.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top