R: Anova Data - 2^5 Design, with replicates

Question 1

Your matrix should have 6 columns and 2^5 * 500 = 16000 rows. The first column is your response variable, which I'm assuming is a continuous numerical variable. The next five columns each represent a treatment.

As a simpler two-factor example, say you are measuring plant height response to fertilizer and light. Fertilizer is either added or is not added, and light is either high or low. Your response is plant height. The first treatment column is "Fertilizer", and contains 1's and 0's. The second treatment column is "Light", and contains 1's and 0's.

Note that by "treatment" I do not mean "High Light" and "Low Light" --- the column is just for "Light".

To do the anova, you would do something like:

aov(Height~Fertilizer*Light, data=Data)
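
To make that call concrete, here is a minimal sketch on made-up data. The 10 replicates per cell, the effect sizes, and the simulated heights are all assumptions for illustration, not anything from your experiment. It also converts the 0/1 codes to factors, which is the usual way to tell aov() they are categorical treatments.

```r
# Hypothetical toy data: 2 x 2 design, 10 replicates per cell (simulated values).
set.seed(1)
Data <- expand.grid(rep = 1:10, Fertilizer = 0:1, Light = 0:1)[, -1]
Data$Height <- 20 + 3 * Data$Fertilizer + 2 * Data$Light + rnorm(nrow(Data))

# Treat the 0/1 codes as categorical treatments rather than numbers.
Data$Fertilizer <- factor(Data$Fertilizer)
Data$Light <- factor(Data$Light)

# Main effects plus the Fertilizer:Light interaction.
fit <- aov(Height ~ Fertilizer * Light, data = Data)
summary(fit)
```

For your five-factor design the formula would grow in the same way, with one term per treatment column.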

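Basically, the first column in Data would be a single vector holding your "32 vectors of 500 response values" stacked end to end (i.e., c() those 32 vectors; rbind() would give you a 32 x 500 matrix instead), and the next 5 columns should be your "design matrix/ vectors saving all of the +/- values used as the 5 different factors", with each row of +/- settings repeated 500 times so that it lines up with its responses.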
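
One way to do that stacking is sketched below. The names `design`, `responses`, and `n_rep`, the factor names A to E, and the random responses are all hypothetical stand-ins; I am assuming your 32 response vectors sit in a list in the same order as the rows of your design table.

```r
# Hypothetical inputs: `design` holds the 32 combinations of -1/+1 settings,
# and `responses` is a list of 32 vectors with 500 (simulated) values each.
set.seed(1)
n_rep <- 500
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1),
                      D = c(-1, 1), E = c(-1, 1))
responses <- lapply(seq_len(nrow(design)), function(i) rnorm(n_rep))

# Repeat each design row 500 times so it lines up with the stacked responses.
idx <- rep(seq_len(nrow(design)), each = n_rep)
Data <- data.frame(y = unlist(responses), design[idx, ], row.names = NULL)

# Treat the +/- settings as categorical factors.
Data[-1] <- lapply(Data[-1], factor)

dim(Data)
```

The `each = n_rep` part is what keeps the i-th block of 500 responses next to the i-th row of settings; if your vectors are stored in a different order, the index needs to change accordingly.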

I hope that this description helps to organize your thoughts.

Question 2

You almost certainly want a data frame with 32x500 = 16000 rows, and 6 columns (5 for the covariates, and 1 for the response).

The data frame would be laid out like this:

x1 x2 x3 x4 x5 y
 0  0  0  0  0 *
 0  0  0  0  0 *
 ...

 1  0  0  0  0 *
 1  0  0  0  0 *
 ...

 0  1  0  0  0 *
 0  1  0  0  0 *
 ...

where each covariate pattern is replicated 500 times. You can generate this with

df <- expand.grid(1:500, x1=0:1, x2=0:1, x3=0:1, x4=0:1, x5=0:1)[, -1]

and then tack your response on the side as a sixth column, making sure its values are in the same order as the rows of df.
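
Put together, a rough sketch of the whole thing could look like this. The response here is just random noise standing in for your 16000 measured values, and the full-interaction formula is my assumption about the model you want.

```r
# Build the 2^5 grid with 500 replicates per covariate pattern.
set.seed(1)
df <- expand.grid(1:500, x1 = 0:1, x2 = 0:1, x3 = 0:1, x4 = 0:1, x5 = 0:1)[, -1]

# Sanity check: there should be 32 distinct covariate patterns.
n_patterns <- nrow(unique(df))

# Placeholder response; replace with your measured values in matching row order.
df$y <- rnorm(nrow(df))

# Treat the 0/1 codes as factors, then fit main effects plus all interactions.
xs <- c("x1", "x2", "x3", "x4", "x5")
df[xs] <- lapply(df[xs], factor)
fit <- aov(y ~ x1 * x2 * x3 * x4 * x5, data = df)
summary(fit)
```

Because expand.grid() varies its first argument fastest, the 500 replicates of each pattern end up in consecutive rows, which matches the layout shown above.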