I highly doubt that any strict rules exist for your problem. First of all, the bounds of the weights depend strongly on your input data representation, activation functions, number of neurons and output function. The best you can rely on here are rules of thumb.
First, let's consider the initial weight values in classical algorithms. A basic rule of thumb is to draw the weights from the range [-1,1] for small layers, and for large ones to divide that by the square root of the number of units in the larger layer. More sophisticated methods are described by Bishop (1995). From such a rule of thumb we could deduce that a reasonable range (simply an order of magnitude bigger than the initial guess) would be something of the form [-10,10]/sqrt(neurons_count_in_the_lower_layer).
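As a quick illustration, this rule of thumb could be sketched like so (a minimal example; the function name and sizes are my own, not from any particular library):

```python
import numpy as np

def init_weights(n_in, n_out, rng=np.random.default_rng(0)):
    # Draw uniformly from [-1, 1], then scale down by sqrt of the
    # number of units feeding into the layer (the "lower" layer).
    return rng.uniform(-1.0, 1.0, size=(n_in, n_out)) / np.sqrt(n_in)

W = init_weights(100, 10)  # layer with 100 inputs, 10 outputs
```

With 100 input units every weight then lies in [-0.1, 0.1], and the rule-of-thumb bound for the search would be ten times that, i.e. [-1, 1].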
Unfortunately, to the best of my knowledge, the temperature choice is much more complex, as it is a data-dependent factor, not just a topology-based one. Some papers suggest values for specific time series prediction problems, but nothing general. For simulated annealing "in general" (not just applied to NN training), many heuristic choices have been proposed, e.g.
If we know the maximum distance (cost function difference) between one neighbour and another then we can use this information to calculate a starting temperature. Another method, suggested in (13. Rayward-Smith, V.J., Osman, I.H., Reeves, C.R., Smith, G.D. 1996. Modern Heuristic Search Methods. John Wiley & Sons.), is to start with a very high temperature and cool it rapidly until about 60% of worst solutions are being accepted. This forms the real starting temperature and it can now be cooled more slowly. A similar idea, suggested in (5. Dowsland, K.A. 1995. Simulated Annealing. In Modern Heuristic Techniques for Combinatorial Problems (ed. Reeves, C.R.), McGraw-Hill, 1995), is to rapidly heat the system until a certain proportion of worse solutions are accepted and then slow cooling can start. This can be seen to be similar to how physical annealing works in that the material is heated until it is liquid and then cooling begins (i.e. once the material is a liquid it is pointless carrying on heating it). [from notes from University of Nottingham]
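The "heat rapidly until about 60% of worse solutions are accepted" heuristic can be sketched as follows (my own minimal implementation of the idea, using the standard Metropolis acceptance probability exp(-delta/T); the function name and the doubling schedule are assumptions):

```python
import math

def initial_temperature(deltas, accept_ratio=0.6):
    # deltas: positive cost increases sampled from random neighbour moves.
    # Start cold, then "heat rapidly" (double T) until the expected
    # Metropolis acceptance rate of these worse moves reaches accept_ratio.
    T = max(deltas) / 100.0
    while sum(math.exp(-d / T) for d in deltas) / len(deltas) < accept_ratio:
        T *= 2.0
    return T

T0 = initial_temperature([1.0, 2.0, 3.0])
```

The returned T0 is then used as the real starting temperature, from which the slow cooling schedule begins.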
But the best choice for your application has to be based on numerous tests, as with most things in machine learning. If you are dealing with a problem where you are really concerned about a well-trained neural network, it seems reasonable to look into Extreme Learning Machines (ELM), where the neural network training is conducted as a global optimization procedure, which guarantees the best possible solution (under the regularized cost function used). Simulated annealing, as an iterative, greedy process (as is back propagation), cannot guarantee anything; there are only heuristics and rules of thumb.
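To make the ELM contrast concrete, here is a minimal sketch of the idea (toy data and all names are my own): the hidden weights are random and fixed, and only the output weights are fitted, by least squares, which is a convex problem with a global optimum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical): learn y = sin(x)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X)

n_hidden = 50
W = rng.normal(size=(1, n_hidden))   # random input weights, never updated
b = rng.normal(size=n_hidden)        # random biases, never updated
H = np.tanh(X @ W + b)               # fixed random hidden-layer features

# Output weights via least squares -- the globally optimal solution
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

pred = H @ beta
```

No temperature, learning rate or weight bounds need to be tuned here; the only free choice is the number of hidden units (and, in regularized variants, the regularization strength).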