Question

My problem is as follows:

I am given a number of chi-squared values for the same collection of data sets, each fitted with different models. For example, for 5 collections of points, each fitted either with a single binomial distribution or with both binomial and normal distributions, I would have 10 chi-squared values.

I would like to use machine learning to categorize the data sets into "models":

e.g. data sets 1, 2, 5 and 7 are best fitted using only binomial distributions, whereas sets 3, 4, 6, 8, 9 and 10 also need a normal distribution.

Notably, the number of degrees of freedom is likely to differ between the two chi-squared distributions; it is always known, as is the number of models.
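To make the setup concrete, here is a sketch under the assumption that each category j is described by a chi-squared distribution with a known number of degrees of freedom k_j and holds an unknown share w_j of the values (the symbols k_j, w_j and M are introduced here for illustration). A chi-squared value x from category j has the density below, the pooled values then follow a finite mixture of such densities, and Bayes' rule gives the probability that a given value x_i belongs to category j:

```latex
f(x; k_j) = \frac{x^{k_j/2 - 1}\, e^{-x/2}}{2^{k_j/2}\,\Gamma(k_j/2)}, \qquad x > 0

p(x) = \sum_{j=1}^{M} w_j\, f(x; k_j), \qquad w_j \ge 0,\ \sum_{j=1}^{M} w_j = 1

P(\text{category } j \mid x_i) = \frac{w_j\, f(x_i; k_j)}{\sum_{l=1}^{M} w_l\, f(x_i; k_l)}
```

Because every k_j is known, the densities themselves have no free parameters; the only unknowns are the shares w_j and the category of each value.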

My (probably) naive guess for a solution would be as follows:

  1. Randomly distribute the points (the 10 chi-squared values in this case) among the categories (2 in this case).

  2. Fit each category with its own chi-squared distribution (in this case, with a different number of degrees of freedom per category).

  3. Move outlying points from one distribution to the other.

  4. Repeat steps 2 and 3 until satisfied with the result.
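The steps above are close to what is usually called classification EM, a hard-assignment variant of the expectation-maximization algorithm for mixture models. Below is a minimal Python sketch, assuming each category j is a chi-squared distribution with known degrees of freedom dofs[j]. The function names chi2_logpdf and classify_chi2, and the add-one smoothing of the category proportions, are illustrative choices rather than an established API. Here a point counts as "outlying" when another category gives it a higher weighted log-likelihood, and it is moved there.

```python
import math
import random


def chi2_logpdf(x, k):
    """Log-density of a chi-squared distribution with k degrees of freedom (x > 0)."""
    return (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)


def classify_chi2(values, dofs, max_iter=100, seed=0):
    """Hard-assignment ("classification EM") version of steps 1-4.

    values: observed chi-squared statistics (all > 0)
    dofs:   known degrees of freedom, one per category/model
    Returns a list giving the category index of each value.
    """
    rng = random.Random(seed)
    m = len(dofs)
    # Step 1: random initial assignment of values to categories.
    labels = [rng.randrange(m) for _ in values]
    for _ in range(max_iter):
        # Step 2: with known dofs the only free parameters are the category
        # proportions; add-one smoothing keeps an emptied category usable.
        counts = [labels.count(j) for j in range(m)]
        log_w = [math.log((c + 1) / (len(values) + m)) for c in counts]
        # Step 3: a value is "outlying" if another category explains it better,
        # so move every value to the category with the highest weighted log-density.
        new_labels = [
            max(range(m), key=lambda j: log_w[j] + chi2_logpdf(x, dofs[j]))
            for x in values
        ]
        # Step 4: stop once no value changes category.
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

For example, `classify_chi2([0.8, 1.5, 2.1, 0.4, 11.3, 9.7, 14.2, 12.8], [1, 12])` returns `[0, 0, 0, 0, 1, 1, 1, 1]`: the small values go to the 1-degree-of-freedom category and the large ones to the 12-degree-of-freedom category. As with k-means, hard assignment can settle in a local optimum that depends on the random start, so on harder data it is worth running it from several seeds.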

However, I don't know how to select the outlying points, or, for that matter, whether an algorithm that does this already exists.

I am extremely new to machine learning and fairly new to statistics, so any relevant keywords would be appreciated too.

Solution

The principled approach is to assign probabilities to the different model types and to the different parameter values within each model type. Look for "Bayesian model estimation".
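For the setup in the question, this idea can be sketched as follows, again assuming known degrees of freedom per model. Instead of hard reassignment, Bayes' rule gives each chi-squared value a posterior probability of coming from each model, and the expectation-maximization (EM) algorithm estimates the models' overall shares. Strictly, this is a maximum-likelihood estimate of the shares rather than a fully Bayesian treatment, which would also place a prior (for example a Dirichlet distribution) on them. The function name model_posteriors is hypothetical.

```python
import math


def chi2_logpdf(x, k):
    """Log-density of a chi-squared distribution with k degrees of freedom (x > 0)."""
    return (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)


def model_posteriors(values, dofs, n_iter=200, tol=1e-10):
    """EM for a mixture of chi-squared distributions with known dofs.

    Returns (weights, resp): weights[j] is the estimated share of model j and
    resp[i][j] is the posterior probability that value i comes from model j.
    """
    m = len(dofs)
    weights = [1.0 / m] * m  # start from equal shares for every model
    for _ in range(n_iter):
        # E-step: Bayes' rule in log space, normalized with log-sum-exp.
        resp = []
        for x in values:
            # The floor avoids log(0) if a share ever underflows to zero.
            logs = [math.log(max(weights[j], 1e-300)) + chi2_logpdf(x, dofs[j])
                    for j in range(m)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: each share becomes the average posterior of its model.
        new_weights = [sum(r[j] for r in resp) / len(values) for j in range(m)]
        converged = max(abs(a - b) for a, b in zip(new_weights, weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights, resp
```

Values whose largest posterior is close to 1 are confidently assigned to one model; values with no dominant posterior are the ambiguous cases, which is more informative than a forced hard split.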

Licensed under: CC-BY-SA with attribution