Epanechnikov multivariate density

Question

1. Multiplicative method: calculate the kernel for each dimension and then multiply them.

2. Calculate the norm of the vector and evaluate the kernel at that value.

Method 1 (the product of 1D kernels) assumes that your x and y variables are statistically independent, which does not hold for method 2 (the kernel of the norm). On the other hand, method 2 is a radially symmetric kernel.
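To make the difference concrete, here is a minimal sketch of both forms in Python with NumPy (the language and the function names `epan_1d`, `epan_product` and `epan_radial` are my own choice for illustration, they are not from the question). The arguments u and v are the x and y distances to a data point, already divided by the kernel width.

```python
import numpy as np

def epan_1d(u):
    """Normalized 1D Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def epan_product(u, v):
    """Method 1: evaluate the 1D kernel in each dimension and multiply."""
    return epan_1d(u) * epan_1d(v)

def epan_radial(u, v):
    """Method 2: evaluate the kernel at the norm r = sqrt(u^2 + v^2).

    The factor 2/pi normalizes (1 - r^2) over the unit disk.
    """
    r2 = np.asarray(u, dtype=float) ** 2 + np.asarray(v, dtype=float) ** 2
    return np.where(r2 <= 1.0, (2.0 / np.pi) * (1.0 - r2), 0.0)

# The supports differ: method 1 is non-zero on a square, method 2 on a disk.
corner_product = epan_product(0.9, 0.9)  # inside the square, so positive
corner_radial = epan_radial(0.9, 0.9)    # outside the unit disk, so zero
```

The point (0.9, 0.9) lies inside the square but outside the unit disk, so only the product kernel gives it any weight.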

How exactly would each of the two methods work with my data?

I would try both and see which one gives the better result, e.g. which one gives the higher likelihood on the data, while taking care not to overfit (for example by using cross validation).

In its most basic form this means that you split your sample, use one part to calculate the density estimate (i.e. place kernels around the data points) and evaluate the likelihood on the other part: take the product of the values of the density estimate at the test points or, better, the sum of their logarithms. Then see which method gives the higher value on the 'other' sample (the one NOT used for calculating the estimate).
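A rough sketch of such a split, assuming Python with NumPy; the synthetic data, the 300/100 split, the width h = 1.0 and all function names are placeholders of mine, so substitute your own sample. Each kernel takes the differences between test and training points and a width h, and is divided by h^2 so that it stays normalized.

```python
import numpy as np

def epan_product(d, h):
    """Product kernel of width h; d holds (x, y) differences in its last axis."""
    u = d / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return k.prod(axis=-1) / h**2

def epan_radial(d, h):
    """Radially symmetric kernel of width h, normalized with 2/pi."""
    r2 = np.sum((d / h) ** 2, axis=-1)
    return np.where(r2 <= 1.0, (2.0 / np.pi) * (1.0 - r2), 0.0) / h**2

def holdout_log_likelihood(train, test, kernel, h):
    """Sum of log densities at the test points, kernels placed on train."""
    diffs = test[:, None, :] - train[None, :, :]   # shape (n_test, n_train, 2)
    dens = kernel(diffs, h).mean(axis=1)           # average over training points
    # A test point outside every kernel has density 0, so its log is -inf.
    with np.errstate(divide="ignore"):
        return float(np.sum(np.log(dens)))

# Placeholder data: correlated 2D Gaussian sample, split 300 / 100.
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=400)
train, test = data[:300], data[300:]

for name, kernel in [("product", epan_product), ("radial", epan_radial)]:
    print(name, holdout_log_likelihood(train, test, kernel, h=1.0))
```

Because the Epanechnikov kernel has finite support, a single test point that is far from all training points drives the log-likelihood to minus infinity. That is a legitimate signal that the width is too small for the data, but it is worth keeping in mind when you compare numbers.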

The same argument (cross validation) also applies to the choice of the width of the kernel ('scaling factor', make the kernel narrow or broad).

You can of course just select a kernel width by hand to start with. Choosing the kernel width too small will give a 'spiky' density estimate; choosing it too large will 'wash out' the important features of your data.
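If you want to automate the choice of width, one possible sketch is a scan over a few candidate widths with k-fold cross validation, again in Python with NumPy. The candidate widths, the number of folds, the synthetic data and the function names are all assumptions of mine, and only the radially symmetric kernel is shown.

```python
import numpy as np

def epan_radial(d, h):
    """Radially symmetric Epanechnikov kernel of width h (normalized)."""
    r2 = np.sum((d / h) ** 2, axis=-1)
    return np.where(r2 <= 1.0, (2.0 / np.pi) * (1.0 - r2), 0.0) / h**2

def cv_log_likelihood(data, h, n_folds=5):
    """Total held-out log-likelihood over n_folds folds for width h."""
    folds = np.array_split(np.arange(len(data)), n_folds)
    total = 0.0
    for test_idx in folds:
        mask = np.ones(len(data), dtype=bool)
        mask[test_idx] = False                      # everything else is training
        train, test = data[mask], data[test_idx]
        diffs = test[:, None, :] - train[None, :, :]
        dens = epan_radial(diffs, h).mean(axis=1)
        with np.errstate(divide="ignore"):          # log(0) = -inf is expected
            total += np.sum(np.log(dens))
    return float(total)

# Placeholder data and candidate widths.
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 2))
widths = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
scores = np.array([cv_log_likelihood(data, h) for h in widths])
best_h = widths[np.argmax(scores)]
print(dict(zip(widths.tolist(), scores.tolist())), "best:", best_h)
```

Widths that are too small typically score minus infinity here, because some held-out point ends up outside every kernel, while very large widths score poorly because the estimate is flattened.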

What do I need to normalize, knowing that the Epanechnikov kernel yields 0 for normalized values > 1 or < -1?

The feature you mention is not related to the normalization. You should use a normalized expression for the kernel itself, i.e. the integral over the range where the kernel is non-zero should be one. For your case 1., if the 1D kernels are normalized (which is the case, for example, for 3/4*(1-u^2) on [-1..1]), the 2D product will also be normalized. For case 2. one has to calculate the 2D integral.
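For case 2. the integral of (1 - r^2) over the unit disk is 2*pi times the integral of (1 - r^2)*r from 0 to 1, which is pi/2, so the normalized radial kernel is 2/pi*(1 - r^2) for r <= 1. If you prefer to check such constants numerically, a simple midpoint-rule sketch in Python with NumPy could look like this (the grid size is an arbitrary choice of mine):

```python
import numpy as np

# Midpoint grid over the square [-1, 1] x [-1, 1], which contains both supports.
n = 2000
edges = np.linspace(-1.0, 1.0, n + 1)
mid = 0.5 * (edges[:-1] + edges[1:])
u, v = np.meshgrid(mid, mid)
cell_area = (2.0 / n) ** 2

# Case 1: product of normalized 1D kernels, should integrate to 1.
product = 0.75 * (1.0 - u**2) * 0.75 * (1.0 - v**2)
integral_product = product.sum() * cell_area

# Case 2: unnormalized radial form, should integrate to pi/2.
r2 = u**2 + v**2
radial_unnormalized = np.where(r2 <= 1.0, 1.0 - r2, 0.0)
integral_radial = radial_unnormalized.sum() * cell_area

print(integral_product, integral_radial, np.pi / 2)
```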

Assuming the kernel is normalized, you then can normalize the density estimate as follows:

p(x, y) = (1/N) * sum_{i=1..N} K(x - x_i, y - y_i)

where N is the number of data points. This will be normalized, i.e. the integral of p(x,y) over the 2D plane is one.
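As an illustration, here is a small sketch of such an estimate in Python with NumPy, using the radially symmetric kernel. The function name `kde_2d` and the explicit width parameter h are my additions; with a width h the kernel has to be divided by h^2 to stay normalized, which is where the extra factor in the last line comes from.

```python
import numpy as np

def kde_2d(points, data, h=1.0):
    """Density estimate at `points` from kernels of width h placed on `data`.

    points: array of shape (m, 2) or a single (x, y) pair
    data:   array of shape (N, 2)
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    data = np.asarray(data, dtype=float)
    diffs = (points[:, None, :] - data[None, :, :]) / h   # shape (m, N, 2)
    r2 = np.sum(diffs**2, axis=-1)
    k = np.where(r2 <= 1.0, (2.0 / np.pi) * (1.0 - r2), 0.0)
    # Average over the N data points; 1/h^2 keeps each kernel normalized.
    return k.sum(axis=1) / (len(data) * h**2)

# Tiny example: two well separated data points, evaluated on top of the first.
data = np.array([[0.0, 0.0], [10.0, 10.0]])
value = kde_2d([0.0, 0.0], data, h=1.0)   # only the first kernel contributes
```

In this example only one of the N = 2 kernels covers the evaluation point, so the estimate there is half the kernel's peak value.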

Note that neither of the functional forms you mentioned allows arbitrary covariance matrices. One way to work around this is to first 'decorrelate' the dataset (i.e. apply a matrix transformation such that the covariance matrix of the dataset becomes the unit matrix), then perform the density estimate and then apply the inverse transformation.
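One way to sketch the decorrelation step, assuming Python with NumPy, is via a Cholesky factor of the sample covariance matrix (other choices such as an eigendecomposition work as well). The function names and the synthetic data are hypothetical. Note that when you map the estimate back to the original coordinates, the density picks up the determinant of the transformation as a factor.

```python
import numpy as np

def whiten(data):
    """Shift and transform data so that its sample covariance is the unit matrix."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    chol = np.linalg.cholesky(cov)        # cov = chol @ chol.T
    w = np.linalg.inv(chol)               # w @ cov @ w.T = identity
    return (data - mean) @ w.T, mean, w

def kde_decorrelated(points, data, h=1.0):
    """Radial Epanechnikov estimate computed in the decorrelated coordinates."""
    z_data, mean, w = whiten(data)
    z_pts = (np.atleast_2d(points) - mean) @ w.T
    diffs = (z_pts[:, None, :] - z_data[None, :, :]) / h
    r2 = np.sum(diffs**2, axis=-1)
    k = np.where(r2 <= 1.0, (2.0 / np.pi) * (1.0 - r2), 0.0)
    dens_z = k.sum(axis=1) / (len(z_data) * h**2)
    # Change of variables back to (x, y): multiply by |det w|.
    return dens_z * abs(np.linalg.det(w))

# Placeholder data with strong correlation between x and y.
rng = np.random.default_rng(2)
data = rng.multivariate_normal([1.0, -2.0], [[4.0, 1.8], [1.8, 1.0]], size=500)
z, _, _ = whiten(data)
print(np.cov(z, rowvar=False))            # close to the unit matrix
print(kde_decorrelated(data[:3], data, h=0.5))
```

In the original coordinates each kernel then becomes an ellipse aligned with the covariance of the data instead of a circle.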

There are also extensions such as adaptive kernel density estimation, where the width of the kernel itself varies as a function of x and y, should you want to refine your estimate at some point.