On scaling tags in a tag cloud
19-09-2019
Question
I am implementing a tag cloud on a mobile device. The details of the data model, etc., are not particularly important here. My question is about the scaling of tags:
What is the 'best' expression to map tag frequency to font size?
I have looked at this post discussing linear and logarithmic scaling, and at this answer from Adrian Kuhn, which sketches a polynomial approach, for inspiration. However, I seem to remember a post some place on the interwebs with a lot more exploration of this issue.
I have also found some "best practices" on a blog, though I am unsure of their provenance. They make no comment on frequency scaling.
What alternatives do I have for tag scaling, and which is the preferred/standard method? I am also considering minimum font sizes, maximum number of tags, colors, etc.
Edit: As per the discussion in this question, I am interested in the "standard" tag cloud, with font size variations.
Solution
I worked on a small tag cloud project last year, in which I used something along the lines of
β = (int) (((maxθ - minθ) * ω) + minθ + 0.5)
where ω is a weighting between 0 and 1 previously calculated according to some metric (in your case, tag frequency), minθ and maxθ are the lower and upper bounds, and β is the final value. Adding 0.5 before the integer cast rounds to the nearest integer rather than truncating. This can be applied to any visual characteristic (font size, colour, weight if supported, etc.).
I found that how well linear and logarithmic scaling work tends to depend on the distribution of the data set. In data sets with prominent outliers, I found tanh useful for 'smoothing' the resulting values.
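As a rough illustration, the mapping above might be sketched in Python as follows. The function names `scale` and `linear_weight` are made up for this example, the weight is assumed to lie between 0 and 1, and the clamping step is an addition that is not part of the original formula.

```python
def linear_weight(count: int, min_count: int, max_count: int) -> float:
    """Normalize a tag count to a weight in [0, 1] (plain linear scaling)."""
    if max_count == min_count:
        # All tags are equally frequent; any constant weight would do.
        return 0.0
    return (count - min_count) / (max_count - min_count)


def scale(weight: float, lower: float, upper: float) -> int:
    """Map a weight in [0, 1] onto an integer between lower and upper."""
    # Clamp the weight so out-of-range input cannot escape the bounds
    # (an assumption added for safety, not in the original formula).
    weight = min(1.0, max(0.0, weight))
    # Adding 0.5 before truncating rounds to the nearest integer.
    return int((upper - lower) * weight + lower + 0.5)
```

For instance, `scale(linear_weight(count, min_count, max_count), 12, 36)` would give a font size between 12 and 36; the same `scale` call could be reused for colour components or font weight.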
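The answer does not say exactly how the logarithm or tanh was applied, so the following is only one possible reading. `log_weight` normalizes log-counts, and `tanh_weight` squashes the distance from a centre value; the names and the `centre` and `spread` parameters are assumptions made for this sketch.

```python
import math


def log_weight(count: int, min_count: int, max_count: int) -> float:
    """Weight in [0, 1] from log-scaled counts (counts assumed to be >= 1)."""
    if max_count == min_count:
        return 0.0
    log_min = math.log(min_count)
    return (math.log(count) - log_min) / (math.log(max_count) - log_min)


def tanh_weight(count: float, centre: float, spread: float) -> float:
    """Weight between 0 and 1 that flattens out for counts far from the centre.

    centre and spread are hypothetical tuning values, e.g. the mean and
    standard deviation of the tag counts.
    """
    if spread <= 0:
        return 0.5
    # tanh returns a value in [-1, 1]; shift and halve it to land in [0, 1].
    return 0.5 * (math.tanh((count - centre) / spread) + 1.0)
```

Either weight can then be used as ω in the formula above. With `tanh_weight`, a tag that is ten times more frequent than another outlier ends up with almost the same weight, which is the smoothing effect described.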
OTHER TIPS
There is an excellent discussion in this pdf, which covers scaling, clustering, and truncating the tags to display.
A solution I found works nicely is as follows:
font_size = (max_font_size - min_font_size) * Math.sin(1.5 * X) + min_font_size
where X is the normalized value you wish to map onto font size:
X = (this_value - min_value) / (max_value - min_value)
This increases the size differential across the lower three quartiles of the range, which minimizes the effect of high outliers.
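A minimal Python sketch of this sine mapping is shown here; the function name `sine_font_size` and the handling of the case where all values are equal are assumptions for illustration. Note that sin(1.5) is roughly 0.997, so the largest tag lands just under the maximum font size rather than exactly on it.

```python
import math


def sine_font_size(value: float, min_value: float, max_value: float,
                   min_font_size: float, max_font_size: float) -> float:
    """Map a value onto a font size along the rising part of a sine curve."""
    if max_value == min_value:
        # Degenerate range: every tag gets the minimum size (an assumption).
        return float(min_font_size)
    # Normalize the value to [0, 1].
    x = (value - min_value) / (max_value - min_value)
    # sin(1.5 * x) rises steeply at first and flattens near x = 1,
    # so differences among high values are compressed.
    return (max_font_size - min_font_size) * math.sin(1.5 * x) + min_font_size
```

With sizes from 10 to 30, a value halfway through the range maps to about 23.6 rather than the linear midpoint of 20, which shows how the lower part of the range is stretched.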