Algorithm for impelemnting a survey-like program

https://stackoverflow.com/questions/16081280

04-04-2022
|

Domanda

I wish to create a program in Java which will ask the user a number of questions and report some results. It is pretty much like a survey. In order to explain the problem better consider the following example: Let’s say that there are currently 4 questions available eg Qa, Qb, Qc and Qd. Each question has a number of possible options:

=> Question A has 4 possible options a1, a2, a3 and a4.

=> Question B has 3 possible options b1, b2 and b3

=> Question C has 5 possible options c1, c2, c3, c4 and c5

=> Question D has 2 possible options d1 and d2

Moreover there are some results available which will be reported based on the user’s answers in the above questions. Let’s assume that there are 5 such results called R1, R2, R3, R4 and R5. Each result has a number of characteristics. These characteristics are really answers to the above questions. More precisely:

=> The characteristics of R1 is the set of {Qa = a4, Qb = b2, Qc = c2, Qd = d1} This says that R1 is related with Qa via the a1 option, with Qb via the b2 option and so on

=> R2: {Qa = a3, Qb = b3, Qc = c3, Qd = d2}

=> R3: {Qa = a4, Qb = b1, Qc = c1, Qd = d2}

=> R4: {Qa = a2, Qb = b2, Qc = c5, Qd = d1}

=> R5: {Qa = a1, Qb = b3, Qc = c4, Qd = d2}

Let’s say that a user U provides the following answers to the questions {Qa = a4, Qb = b1, Qc = c1, Qd = d1}

The purpose of the program is to report the result which is closer to the user answers along with a percentage of how close it is. For instance since there is no any result which matches 100% the user answers the program should report the results which match as more answers as possible (above a certain threshold eg 50%). In that specific case the program should report the follow results:

=> R3 with 75% (since there are 3 matches on the 4 questions)

=> R1 with 50% (since there are 2 matches on the 4 questions)

Notice that R4 has one match (so 25%) whereas R2 and R5 have no matches at all (so 0%). The main issue on implementing the above program is that there are a lot of questions (approximately 30) with a number of answers each (3-4 answers each). I am not aware of an efficient algorithm which can retrieve the results which are closer to the user answers. Notice that the way that these results are stored is not important at all. You can assume that the results are stored in a relational database and that SQL query is used to retrieve them.

The only solution I can think of is to perform an exhaustive search but this not efficient at all. In other words I am thinking to do the following:

=> First try to retrieve results which match exactly the user answers: {Qa = a4, Qb = b1, Qc = c1, Qd = d1}

=> If no results exist then change the option of a question (eg Qa) and try again. For example try: {Qa = a1, Qb = b1, Qc = c1, Qd = d1}

=> If there is still nothing then try the rest possible options for Qa eg a2, a3

=> If there is still nothing then give Qa the initial user answer (that is a4) and move to Qb to do something similar. For example try something like: {Qa = a4, Qb = b2, Qc = c1, Qd = d1}

=> If after trying all the possible options for all questions one by one there are any results then try changing the options of COMBINATIONS of questions. For example try change the options of two questions at the same time (eg Qa and Qb): {Qa = a1, Qb = b2, Qc = c1, Qd = d1}

=> Then try combinations of three questions and so on...

Clearly the above algorithm would be extremely slow on a large number of questions. Is there any known algorithm or heuristic which is more efficient than the above algorithm?

Thanks in advance

Soluzione

"Only" 30 Questions?
Then the following "stupid" algorithm will probably be faster than any highly "intelligent" and complicated algorithm.

iterate over characteristics
    score = 0
    iterate over questions
        if questions's answer is right in current characteristic
            score++

Then add a variable which keeps track of the maximum value and matching characteristic and you are set.

Runtime is size of characteristics * size of questions, whereas the algorithm you are describing can have exponential runtime, and on top of that is much more complicated both for programming and for executing (due to effects as branch misprediction)

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow