Your best bet is the Kullback-Leibler (KL) divergence. It allows you to set the value you wish your neurons' average activations to be close to. In Python:
```python
import numpy as np

def _binary_KL_divergence(p, p_hat):
    """
    Computes the KL divergence between two Bernoulli distributions with
    means p and p_hat, respectively.
    """
    return (p * np.log(p / p_hat)) + ((1 - p) * np.log((1 - p) / (1 - p_hat)))
```
where `p` is the constrained (target) value and `p_hat` is the average activation value of your neurons over the samples. Adding the penalty is as simple as appending the term to the objective function: if the algorithm minimizes the squared error ||H(X) - y||^2, the new objective becomes ||H(X) - y||^2 + KL_divergence_term.
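Here is a minimal sketch of how the penalty might be wired into a squared-error objective; the `sparse_loss` name, the `beta` weight, and the array shapes are my assumptions for illustration, not a fixed API:

```python
import numpy as np

def sparse_loss(H_X, y, hidden_activations, p=0.2, beta=3.0):
    """Squared error plus the KL sparsity penalty (illustrative sketch).

    H_X:                network outputs for a batch
    y:                  targets
    hidden_activations: hidden-layer activations, shape (n_samples, n_hidden)
    p:                  target average activation (the constrained value)
    beta:               assumed hyperparameter weighting the sparsity term
    """
    p_hat = hidden_activations.mean(axis=0)     # average activation per neuron
    kl = _binary_KL_divergence(p, p_hat).sum()  # sum the penalty over neurons
    return np.sum((H_X - y) ** 2) + beta * kl
```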
As part of the cost function, it penalizes average activations that diverge from `p`, whether higher or lower (Figure 1). The weight updates then depend on the partial derivative of the new objective function.
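Concretely, the partial derivative of the KL term with respect to `p_hat` works out to `-p / p_hat + (1 - p) / (1 - p_hat)`, which is the extra piece that enters backpropagation. A sketch (the function name is mine):

```python
def _binary_KL_divergence_grad(p, p_hat):
    """Derivative of the KL penalty with respect to p_hat.

    During backpropagation this term is added (scaled by the sparsity
    weight) to the delta of each hidden unit whose average activation
    is p_hat.
    """
    return -p / p_hat + (1 - p) / (1 - p_hat)
```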
(Figure 1: KL-divergence cost when `p = 0.2`)
In fact, I borrowed this idea from sparse autoencoders; more details can be found in the Lecture Notes on Sparse Autoencoders.
Good luck!