Question

I am writing a program to simulate the daily polling data that companies like Gallup or Rasmussen publish: www.gallup.com and www.rassmussenreports.com

I'm using a brute-force method: the computer generates some random daily polling data and then calculates three-day averages to see whether the average of the random data matches the pollsters' numbers. (Most companies' poll numbers are three-day averages.)
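For reference, a three-day average here means each published figure is the mean of one day and the two days before it. A minimal sketch in Python 3; the function name `three_day_averages` is hypothetical and not part of any polling library:

```python
def three_day_averages(daily):
    """Return the mean of each consecutive 3-day window in `daily`."""
    # n daily values yield n - 2 three-day averages
    return [sum(daily[i:i + 3]) / 3 for i in range(len(daily) - 2)]


print(three_day_averages([40, 42, 44, 46, 48]))  # [42.0, 44.0, 46.0]
```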

Currently, it works well for one iteration, but my goal is to have it produce the most common simulation that matches the average polling data. I could then change the code to run anywhere from 1 to 1000 iterations.

And this is my problem. At the end of the test I have an array in a single variable that looks something like this:

[40.1, 39.4, 56.7, 60.0, 20.0 ..... 19.0]

The program currently produces one array for each correct simulation. I can store each array in a single variable, but then I would need a program that creates anywhere from 1 to 1000 variables, depending on how many iterations I request!?

How do I avoid this? I know there is an intelligent way of doing this that doesn't require the program to generate variables to store arrays depending on how many simulations I want.

Code testing for McCain:

import random

mctest = []
x = 0

while x < 5:
    test = round(100 * random.random())
    mctest.append(test)
    x = x + 1

mctestavg = (mctest[0] + mctest[1] + mctest[2]) / 3.0

# mcavg is real data
if mctestavg == mcavg[2]:
    mcwork = mctest

How do I repeat this without creating multiple mcwork variables?


Solution

Would something like this work?

from random import randint

NUM_ITERATIONS = 1000  # however many simulations you want
mcworks = []

for n in range(NUM_ITERATIONS):
    mctest = [randint(0, 100) for i in range(5)]
    # mcavg is real data
    if sum(mctest[:3]) / 3.0 == mcavg[2]:
        mcworks.append(mctest)

In the end, you are left with a list of valid mctest lists.

What I changed:

  • Used a list comprehension to build the data instead of a while loop
  • Used random.randint to get random integers
  • Used slices and sum to calculate the average of the first three items
  • (To answer your actual question :-) ) Put the results in a list mcworks, instead of creating a new variable for every iteration
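Since the stated goal is the most common simulation that matches, the single list above also makes that a counting problem. A minimal Python 3 sketch, assuming hypothetical values for `mcavg`, `NUM_ITERATIONS`, and a `TOLERANCE` (exact `==` on float averages is fragile); runs are converted to tuples because `collections.Counter` needs hashable keys:

```python
import random
from collections import Counter

random.seed(0)                 # seeded only so the run is reproducible
NUM_ITERATIONS = 20000         # hypothetical number of simulations
mcavg = [45.0, 46.3, 47.0]     # hypothetical published 3-day averages
TOLERANCE = 0.05               # hypothetical: polls are reported to one decimal

mcworks = []
for n in range(NUM_ITERATIONS):
    mctest = [random.randint(0, 100) for i in range(5)]
    # Compare with a tolerance rather than ==, since the averages are floats
    if abs(sum(mctest[:3]) / 3 - mcavg[2]) < TOLERANCE:
        mcworks.append(mctest)

# Lists are not hashable, so count each run as a tuple
counts = Counter(tuple(run) for run in mcworks)
if counts:
    most_common_run, times_seen = counts.most_common(1)[0]
    print(len(mcworks), "matching runs; most common:", most_common_run, times_seen)
```

With five independent values between 0 and 100, exact repeats are rare, so you may want to count a coarser summary of each run instead of the whole run.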

OTHER TIPS

Are you talking about doing this?

>>> a = [ ['a', 'b'], ['c', 'd'] ]
>>> a[1]
['c', 'd']
>>> a[1][1]
'd'

Lists in Python can contain any type of object, including other lists. If I understand the question correctly, will a list of lists do the job? Something like this (assuming you have a function generate_poll_data() that creates your data):

data = []

for _ in range(num_iterations):
    data.append(generate_poll_data())

Then data[n] will be the list of data from the (n+1)th run, since list indices start at 0.
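A runnable Python 3 version of that idea, where `generate_poll_data()` is a hypothetical stand-in that returns one random value per day; all runs end up in the single variable `data`:

```python
import random


def generate_poll_data(days=5):
    # Hypothetical stand-in for your own simulation: one random value per day
    return [random.randint(0, 100) for _ in range(days)]


num_iterations = 1000
data = []
for _ in range(num_iterations):
    data.append(generate_poll_data())

print(len(data))   # 1000
print(data[0])     # the data from the first run
print(data[0][2])  # day 3 of the first run
```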

Since you are thinking in terms of variables, you might prefer a dictionary over a list of lists:

data = {}
data['a'] = generate_poll_data()
data['b'] = generate_poll_data()

etc.
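The keys do not have to be typed by hand; they can be generated in the loop. A sketch reusing a hypothetical `generate_poll_data()`; the `run_1`, `run_2`, ... naming scheme is only an illustration:

```python
import random


def generate_poll_data(days=5):
    # Hypothetical stand-in for your own simulation
    return [random.randint(0, 100) for _ in range(days)]


num_iterations = 3
# One key per run, created programmatically instead of one variable per run
data = {"run_%d" % (i + 1): generate_poll_data() for i in range(num_iterations)}

print(sorted(data))   # ['run_1', 'run_2', 'run_3']
print(data["run_2"])  # the list from the second run
```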

I would strongly consider using NumPy to do this. You get efficient N-dimensional arrays that you can quickly and easily process.
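A minimal NumPy sketch, assuming hypothetical values for `mcavg` and the number of iterations: every simulation is generated at once as a 2-D array (one row per run, one column per day), the average of the first three days is computed for all rows in one step, and a tolerance replaces exact float equality:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the run is reproducible
num_iterations = 100000          # hypothetical
mcavg = [45.0, 46.3, 47.0]       # hypothetical published 3-day averages

# One row per simulation, one column per day; integers 0..100 inclusive
runs = rng.integers(0, 101, size=(num_iterations, 5))

# Mean of the first three days for every run at once
avg3 = runs[:, :3].mean(axis=1)

# Boolean mask of runs whose average matches, within a float tolerance
matches = np.isclose(avg3, mcavg[2], atol=0.05)
mcworks = runs[matches]          # 2-D array holding only the matching runs

print(mcworks.shape)
```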

A neat way to do this is to combine a list of lists with pandas. pandas can compute a 3-day rolling average directly, and it makes searching the results easy: add the real averages as another column and use the loc indexer to find the rows that match.

import pandas as pd
from random import randint

rand_vals = [randint(0, 100) for i in range(5)]
df = pd.DataFrame(data=rand_vals, columns=['generated data'])
df['3 day avg'] = df['generated data'].rolling(3).mean()
df['mcavg'] = mcavg  # the list of real data, same length as rand_vals
# Extract the resulting list of values
res = df.loc[df['3 day avg'] == df['mcavg'], '3 day avg'].values

This is also handy if you want to test the same random values against different polls or candidates: just add another column with their real values and run the same search for each.
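A sketch of that multi-column search, assuming hypothetical real averages for two candidates (`mcavg` and `obavg` are illustrative values, not real polling data) and a 0.05 tolerance instead of exact equality. The first two days have no 3-day average yet, so they hold NaN and never match:

```python
import random

import pandas as pd

random.seed(1)  # seeded only so the run is reproducible
nan = float('nan')

rand_vals = [random.randint(0, 100) for i in range(5)]
df = pd.DataFrame({'generated data': rand_vals})
df['3 day avg'] = df['generated data'].rolling(3).mean()

# Hypothetical real 3-day averages for two candidates
df['mcavg'] = [nan, nan, 45.0, 46.3, 47.0]
df['obavg'] = [nan, nan, 50.0, 49.7, 49.0]

results = {}
for col in ['mcavg', 'obavg']:
    # Rows where the simulated average is within 0.05 of that poll's value
    mask = (df['3 day avg'] - df[col]).abs() < 0.05
    results[col] = df.loc[mask, '3 day avg'].values

print(results)
```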

Licensed under: CC-BY-SA with attribution