The key passage from this blog post on choosing the kernel width:
Pick, say, 1000 pairs (x,x’) at random from your dataset, compute the distance
of all such pairs and take the median, the 0.1 and the 0.9 quantile. Now pick λ
to be the inverse of any of these three numbers. With a little bit of cross
validation you will figure out which one of the three is best. In most cases you
won’t need to search any further.
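The recipe quoted above can be sketched in a few lines of NumPy. This is only a rough illustration, not the blog author's code: the function name `candidate_lambdas`, its parameters, the use of Euclidean distance, and the synthetic data are all my own assumptions.

```python
import numpy as np

def candidate_lambdas(X, n_pairs=1000, quantiles=(0.1, 0.5, 0.9), seed=0):
    """Return 1 / (distance quantile) for randomly sampled pairs of rows of X.

    Hypothetical helper: assumes Euclidean distance and that the chosen
    quantiles of the pairwise distances are non-zero.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Sample index pairs at random; drop pairs that pick the same row twice.
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j
    dists = np.linalg.norm(X[i[keep]] - X[j[keep]], axis=1)
    # 0.1 quantile, median and 0.9 quantile of the sampled distances.
    q = np.quantile(dists, quantiles)
    return 1.0 / q

# Synthetic data purely for demonstration.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
lambdas = candidate_lambdas(X)
print(lambdas)  # three candidate values, largest first
```

You would then cross-validate over just these three values. How λ maps onto your library's parameter depends on how the kernel is written: if, for instance, your kernel is exp(-γ‖x-x’‖²) and you read the quote as exp(-λ²‖x-x’‖²), then γ = λ². That mapping is an assumption you should check against your own kernel definition.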
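This post on Cross Validated analyzes why the heuristic works well. Basically, a width on the scale of typical pairwise distances avoids both extremes: a kernel so wide that each point changes the decision function for all data points, and one so narrow that it changes it for only that single point.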
You can also search for "heuristic method" in the context of SVM parameter selection. For example, in "A Heuristic for Free Parameter Optimization with Support Vector Machines" by M. Boardman et al., the authors apply simulated annealing to make the parameter search more efficient than an exhaustive grid search.