On scaling tags in a tag cloud

https://stackoverflow.com/questions/1327594

19-09-2019

Question

I am implementing a tag cloud on a mobile device. The details of the data model, etc., are not particularly important here. My question is about the scaling of tags:

What is the 'best' expression to map tag frequency to font size?

I have looked at this post discussing linear and logarithmic scaling, and at Adrian Kuhn's answer sketching a polynomial approach, for inspiration. However, I seem to remember a post somewhere on the interwebs that explored this issue in a lot more depth.

I have also found some "best practices" on a blog, though I am unsure of their provenance. They make no comment on frequency scaling.

What alternatives do I have for tag scaling, and which is the preferred/standard method? I am also considering minimum font sizes, maximum number of tags, colors, etc.

Edit: As per the discussion in this question, I am interested in the "standard" tag cloud, with font size variations.

Solution

I worked on a small tag cloud project last year, in which I used something along the lines of

β = (int) (((maxθ - minθ) * ω) + minθ + 0.5)

where ω is a weighting previously calculated according to some metric (in your case tag frequency), minθ and maxθ are the lower and upper bounds, and β is the final value. This can be applied to any visual characteristic (font size, colour, weight if supported, etc.).
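As a rough illustration, the expression above could be sketched in Python like this. The function name `scale`, the assumption that the weight lies between 0 and 1, and the clamping step are mine, not part of the original project:

```python
def scale(weight: float, lower: float, upper: float) -> int:
    """Map a weight (assumed to lie in [0, 1]) onto an integer in [lower, upper]."""
    # Clamping is an added safeguard (an assumption, not in the original formula).
    weight = min(1.0, max(0.0, weight))
    # Adding 0.5 before truncating rounds non-negative results to the nearest integer.
    return int((upper - lower) * weight + lower + 0.5)


# Hypothetical font-size bounds of 10 and 30.
print(scale(0.0, 10, 30))   # smallest size
print(scale(0.26, 10, 30))  # somewhere in between
print(scale(1.0, 10, 30))   # largest size
```

The same function could be reused for other characteristics (for example a colour channel) by changing the bounds.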

I found that how well linear and logarithmic scaling worked tended to depend on the distribution of the data set. In data sets with prominent outliers I found tanh was useful for 'smoothing' the resulting values.
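To make the comparison concrete, here is a minimal sketch of three ways the weighting could be computed from raw tag counts. The function names, the `steepness` parameter, and the exact way tanh is applied (squashing the linearly normalized value, then rescaling so the maximum maps to 1) are assumptions for illustration; the answer does not say how tanh was used in the original project.

```python
import math


def linear_weight(count: float, lo: float, hi: float) -> float:
    """Linear normalization of count into [0, 1]."""
    if hi == lo:
        return 0.0  # all tags equally frequent; arbitrary choice
    return (count - lo) / (hi - lo)


def log_weight(count: float, lo: float, hi: float) -> float:
    """Logarithmic normalization; assumes all counts are >= 1."""
    if hi == lo:
        return 0.0
    return (math.log(count) - math.log(lo)) / (math.log(hi) - math.log(lo))


def tanh_weight(count: float, lo: float, hi: float, steepness: float = 3.0) -> float:
    """Tanh-squashed normalization; steepness is a hypothetical tuning knob."""
    x = linear_weight(count, lo, hi)
    # Dividing by tanh(steepness) makes the largest count map to exactly 1.
    return math.tanh(steepness * x) / math.tanh(steepness)


# Hypothetical counts with one prominent outlier (100).
counts = [1, 2, 3, 5, 8, 100]
lo, hi = min(counts), max(counts)
for c in counts:
    print(c, round(linear_weight(c, lo, hi), 3),
          round(log_weight(c, lo, hi), 3),
          round(tanh_weight(c, lo, hi), 3))
```

With an outlier like this, the linear weights of the small counts are bunched near zero, while the logarithmic and tanh variants spread them out more.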

OTHER TIPS

There is an excellent discussion in this PDF, which covers scaling, clustering, and truncating the set of tags to display.

A solution I found that works nicely is as follows:

font_size = (max_font_size - min_font_size) * Math.sin(1.5 * X) + min_font_size

where X is the normalized value you wish to map onto font size:

X = (this_value - min_value) / (max_value - min_value)

This increases the size differential across the lower three quartiles of the range, which minimizes the effect of high outliers.
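A minimal Python sketch of this sine mapping follows. The function name and the handling of the degenerate case where all values are equal are assumptions, and it takes the trailing minimum term in the formula to be the same `min_font_size` used in the difference:

```python
import math


def sine_font_size(value: float, min_value: float, max_value: float,
                   min_font_size: float, max_font_size: float) -> float:
    """Map a tag value onto a font size using the sin(1.5 * X) curve."""
    if max_value == min_value:
        # Degenerate case (assumption): every tag gets the minimum size.
        return float(min_font_size)
    # Normalize the value into [0, 1].
    x = (value - min_value) / (max_value - min_value)
    # 1.5 is just under pi/2, so the curve rises monotonically over [0, 1].
    return (max_font_size - min_font_size) * math.sin(1.5 * x) + min_font_size


# Hypothetical values 0..100 mapped onto font sizes 10..30.
for v in (0, 25, 50, 75, 100):
    print(v, round(sine_font_size(v, 0, 100, 10, 30), 2))
```

Note that sin(1.5) is roughly 0.9975, so the largest tag ends up very slightly below max_font_size. The midpoint of the range lands at about 68% of the way between the two sizes, since sin(0.75) is roughly 0.68.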

Licensed under: CC-BY-SA with attribution
