The algorithm that you describe is a feature selection algorithm, similar to the forward selection technique. At each step, we find a new feature Fi that minimizes this criterion :
weight_1 * ErrorProbability(Fi) + weight_2 * Acc(Fi)
ACC(Fi) represents the mean correlation between the feature Fi and other features already selected. You want to minimize this in order to have all your features not correlated, thus have a well conditionned problem.
ErrorProbability(Fi) represents if the feature correctly describes the variable you want to predict. For example, lets say you want to predict if tommorow will be rainy depending on temperature (continuous feature)
The Bayes error rate is (http://en.wikipedia.org/wiki/Bayes_error_rate) :
P = Sum_Ci { Integral_xeHi { P(x|Ci)*P(Ci) } }
In our example
Ci belong to {rainy ; not rainy}
x are instances of temperatures
Hi represent all temperatures that would lead to a Ci prediction.
What is interesting is that you can take any predictor you like.
Now, suppose you have all temperatures in one vector, all states rainy/not rainy in another vector :
In order to have P(x|Rainy), consider the following values :
temperaturesWhenRainy <- temperatures[which(state=='rainy')]
What you should do next is to plot an histogram of these values. Then you should try to fit a distribution on it. You will havea parametric formula of P(x|Rainy).
If your distribution is gaussian, you can do it simply :
m <- mean(temperaturesWhenRainy)
s <- sd(temperaturesWhenRainy)
Given some x value, you have the density of probability of P(x|Rainy) :
p <- dnorm(x, mean = m, sd = s)
You can do the same procedure for P(x|Not Rainy). Then P(Rainy) and P(Not Rainy) are easy to compute.
Once you have all that stuff you can use the Bayes error rate formula, which yields your ErrorProbability for a continuous feature.
Cheers