Efficient algorithm for determining values not as frequent in a list

Question 1

This is what I came up with:

from collections import Counter
import random

# the number of question ids I need returned to
# assign to the exam
needed = 3

# the "pool" of possible question ids the user has access to
possible = [1,2,3,4,5]

# examples of lists of question ids I might see that represent
# questions a user has already answered
answered1 = []
answered2 = [1,3]
answered3 = [5,4,3,2]
answered4 = [5,4,3,2,1,1,2]
answered5 = [5,4,3,2,1,1,2,3,4,5,1]
answered6 = [5,4,3,2,1]

def getdiff(answered):
    diff = set(possible) - set(answered)
    still_needed = needed - len(diff)
    if still_needed > 0:
        not_already_selected = list(set(possible) - diff)
        random.shuffle(not_already_selected)
        diff = list(diff) + not_already_selected[0:still_needed]
        random.shuffle(diff)
        return diff
    diff = list(diff)
    random.shuffle(diff)
    if still_needed == 0:
        return diff
    return diff[0:needed]

def workit(answered):
    """ based on frequency, reduce the list down to only
        those questions we want to consider "answered"
    """
    have_count = 0
    if len(possible) > len(answered):
        return getdiff(answered)
    counted = Counter(answered)
    max_count = max(counted.values())
    # the key here is to think of "answered" questions as
    # only those that have been seen with max frequency
    new_answered = []
    for value, count in counted.iteritems():
        if count == max_count:
            new_answered.append(value)
    return getdiff(new_answered)

print 1, workit(answered1)
print 2, workit(answered2)
print 3, workit(answered3)
print 4, workit(answered4)
print 5, workit(answered5)
print 6, workit(answered6)

"""
>>> 
1 [2, 4, 3]
2 [2, 5, 4]
3 [5, 2, 1]
4 [5, 3, 4]
5 [2, 4, 3]
6 [2, 3, 5]
>>> ================================ RESTART ================================
>>> 
1 [3, 1, 4]
2 [5, 2, 4]
3 [2, 4, 1]
4 [5, 4, 3]
5 [4, 5, 3]
6 [1, 5, 3]
>>> ================================ RESTART ================================
>>> 
1 [1, 2, 3]
2 [4, 2, 5]
3 [4, 1, 5]
4 [5, 4, 3]
5 [2, 5, 4]
6 [2, 1, 4]
"""

Question 2

Rewritten with the database stuff in fill-in pseudocode.

If I understand the problem correctly, I'd treat the questions (or their IDs as proxies) like a physical deck of cards: for each user, shuffle the deck and deal them a question at a time; if they want more than len(deck) questions, just start over: shuffle the deck into a new order and do it again. When a question is seen for the nth time, all other questions will have been seen n or n-1 times.

To keep track of what questions the user has had available, we put the unused question IDs back in the database and increment the "how many times through" counter whenever we need a new deal.

Something like:

from random import shuffle

def deal():
    question_IDs = get_all_questions(dbconn) # all questions
    shuffle(question_IDs)
    increment_deal_count(dbconn, userID) # how often this student has gotten questions
    return question_IDs


count_deals = get_stored_deals(dbconn, userID) # specific to this user
if count_deals: 
    question_IDs = get_stored_questions(dbconn, userID) # questions stored for this user 
else: # If 0 or missing, this is the first time for this student
    question_IDs = deal()


while need_another_question(): #based on exam requirements
    try:
        id = question_IDs.pop()
    except IndexError:
        question_IDs = deal()
        id = question_IDs.pop() # Trouble if db is ever empty. 

    use_question(id) # query db with the ID, then put question in print, CMS, whatever

# When we leave that while loop, we have used at least some of the questions
# question_IDs lists the *unused* ones for this deal
# and we know how many times we've dealt.

store_in_db(dbconn, userinfo, question_IDs)
# If you want to know how many times a question has been available, it's
# count_deals - (ID in question_IDs)
# because True evaluates to 1 if you try to subtract it from an integer.

Question 3

Why not have two lists, one for questions not yet picked and one for questions that have been picked. Initially, the not-yet-picked list will be full, and you will pick elements from it which will be removed and added to the picked list. Once the not-yet-picked list is empty, repeat the same process as above, this time using the full picked list as the not-yet-picked list and vice versa.

Question 4

To implement your algorithm, you only need to shuffle the list and pass through it, and when done, repeat.

No need to copy the list or juggle items between two lists, just use the following control flow, for example:

import random

def ask_questions(list_of_questions):
    while True:
        random.shuffle(list_of_questions)
        for question in list_of_questions:
            print(question)
            # Python 3 use input not raw_input
            cont = raw_input('Another question?') 
            if not cont:
                break
        if not cont:
            break

Question 5

Let's define a "pivot" that separates a list into two sections. The pivot partitions the array such that all numbers before the pivot has been picked one more than the numbers after the pivot (or more generally, all numbers before pivot are ineligible for picking, while all numbers after the pivot are eligible for picking).

You simply need to pick a random item from the list of numbers after the pivot, swap it with the number on the pivot, and then increment the pivot. When the pivot reaches the end of the list, you can reset it back to the start.

Alternatively, you can also use two lists, which is much easier to implement, but is slightly less efficient as it needs to expand/shrink the list. Most of the time, the ease of implementation would trump the inefficiency so the two-list would usually be my first choice.