Question

I am using a feedforward, multilayer neural network trained with backpropagation, with a sigmoid activation function whose range is -1 to 1. But the minimum error does not go below 5.8, and I want it much lower; you can see the output after 100,000 iterations.

[graph of error against iterations in NN]

I think this is because my output range goes above 1, while the sigmoid's range is only -1 to 1. Can anybody suggest how I can overcome this problem, given that my desired output range is 0 to 2.5? Which activation function would be best for this range?


Solution

If you are seeking to reduce output error, there are a few things to look at before tweaking a node's activation function.

First, do you have a bias node? Bias nodes have several implications, but - most relevant to this discussion - they allow the network output to be translated to the desired output range. As this reference states:

The use of biases in a neural network increases the capacity of the network to solve problems by allowing the hyperplanes that separate individual classes to be offset for superior positioning.

These posts provide very good discussions: "Role of Bias in Neural Networks" and "Why the BIAS is necessary in ANN? Should we have separate BIAS for each layer?"
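As a rough sketch (the names here are illustrative, not taken from your code), a bias behaves like a weight on a constant +1 input and translates the neuron's response along the input axis:

import math

def neuron(x, weight, bias):
    # bias acts as a weight on a constant +1 input; changing it shifts
    # where the sigmoid transition occurs
    return 1 / (1 + math.e ** -(weight * x + bias))

print(neuron(0.0, 1.0, 0.0))  # 0.5: without bias, x = 0 always sits mid-range
print(neuron(0.0, 1.0, 2.0))  # ~0.88: the bias shifts the output at the same x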

Second, it often helps to normalize your inputs and outputs. As you note, your sigmoid offers a range of +/- 1. This small range can be problematic when trying to learn functions whose values span 0 to 1000 (for example). To aid learning, it's common to scale and translate the data to accommodate the node activation functions. In this example, one might divide the values by 500, yielding a 0 to 2 range, and then subtract 1 from that result. In this manner, the data have been normalized to a range of -1 to 1, which better fits the activation function. Note that the network output should be denormalized: first, add 1 to the output, then multiply by 500.
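A minimal sketch of that mapping (the function names are illustrative):

def normalize(value):
    # map [0, 1000] onto [-1, 1]: divide by 500, then subtract 1
    return value / 500.0 - 1.0

def denormalize(output):
    # invert the mapping: add 1, then multiply by 500
    return (output + 1.0) * 500.0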

In your case, you might consider scaling your target values by 0.8, then subtracting 1 from the result. You would then add 1 to the network output and multiply by 1.25 to recover the desired range. Note that this method may be the easiest to implement, since it does not directly change your network topology the way adding a bias would.
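Applying the same pattern to your 0 to 2.5 range (again, the names are illustrative):

def normalize_target(value):
    # map [0, 2.5] onto [-1, 1]: scale by 0.8, then subtract 1
    return value * 0.8 - 1.0

def denormalize_output(output):
    # invert: add 1, then multiply by 1.25 to recover [0, 2.5]
    return (output + 1.0) * 1.25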

Finally, have you experimented with changing the number of hidden nodes? Although I believe the first two options are better candidates for improving performance, you might give this one a try. (Just as a point of reference, I can't recall an instance in which modifying the activation function's shape improved network response more than options 1 and 2.)

Here are some good discussions of hidden layer/node configuration: "multi-layer perceptron (MLP) architecture: criteria for choosing number of hidden layers and size of the hidden layer?" and "How to choose number of hidden layers and nodes in neural network?"

24 inputs make your problem a high-dimensional one. Ensure that your training dataset adequately covers the input state space, and that your test data and training data are drawn from similarly representative populations. (Take a look at the "cross-validation" discussions around training neural networks.)
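As a starting point, a minimal hold-out split looks something like the sketch below; k-fold cross-validation takes this further by rotating which portion is held out (the function name here is hypothetical):

import random

def split_dataset(samples, train_fraction=0.8):
    # shuffle first so the training and test sets are drawn from the
    # same underlying distribution, then cut at the chosen fraction
    shuffled = samples[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]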

OTHER TIPS

The vanilla sigmoid function is:

import math

def sigmoid(x):
    # standard logistic sigmoid; its output lies in (0, 1)
    return 1 / (1 + math.e ** -x)

You could transform that to:

def mySigmoid(x):
    # sigmoid scaled so its output lies in (0, 2.5)
    return 2.5 / (1 + math.e ** -x)

in order to make the transformation that you want.
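One caveat, assuming your backpropagation uses the usual shortcut derivative written in terms of the node's output: that expression must be rescaled to match. For y = 2.5 * sigmoid(x), the derivative is y * (1 - y / 2.5) rather than y * (1 - y):

def mySigmoidDerivative(y):
    # derivative of y = 2.5 * sigmoid(x), expressed in terms of the
    # output y, as backpropagation code commonly does
    return y * (1 - y / 2.5)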

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow