Question

In a web spider/crawler, how can I get the actual initial rendered font size that a user sees in an HTML document, taking CSS into account?

Solution

Rendered text size? A user can change the text size at will through browser settings, and different browsers render the same content slightly differently, so there is no single "rendered size" to retrieve.
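To illustrate why the result depends on the user's settings, here is a minimal sketch of how a declared CSS font size resolves to pixels. The function name `resolve_font_size` and the 16px and 20px base sizes are assumptions for illustration (16px is only a common browser default), and the sketch covers just a handful of units, not keywords such as `larger` or `medium`.

```python
import re


def resolve_font_size(value: str, parent_px: float, root_px: float) -> float:
    """Resolve a CSS font-size declaration to pixels (simplified sketch).

    parent_px is the parent element's font size and root_px the root
    element's font size; both ultimately depend on the browser default,
    which the user can change.
    """
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)(px|pt|em|rem|%)", value.strip())
    if match is None:
        raise ValueError(f"unsupported font-size value: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    if unit == "px":
        return number
    if unit == "pt":
        return number * 96 / 72  # CSS defines 1in = 96px = 72pt
    if unit == "em":
        return number * parent_px  # relative to the parent's font size
    if unit == "rem":
        return number * root_px  # relative to the root element's font size
    return number / 100 * parent_px  # percentages are relative to the parent


# The same declaration under two different base sizes:
print(resolve_font_size("1.5em", parent_px=16, root_px=16))  # 24.0
print(resolve_font_size("1.5em", parent_px=20, root_px=20))  # 30.0
```

The same `1.5em` rule yields two different pixel sizes, so any number a crawler reports is only valid for the base size it assumed.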

OTHER TIPS

If you are satisfied with an answer for the 'default' view, with no user customization (which seems likely for this purpose), I believe you are looking at a fairly painful scenario:

  • Embed a rendering engine with CSS support in your spider. Prefer an engine that matches what most of your users run, or use all three common engines and store the results for each. How easy the embedding is varies widely with the technology your spider is built on.

  • Load the URI being spidered in the rendering engine(s).

  • Using the engine's API, query its font metrics for an element containing what you consider representative text (choosing that element is an exercise for which I won't even begin to predict a strategy). How you access the metrics will depend entirely on how the engine is embedded.
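The steps above could be sketched with a headless browser driver. The example below assumes the Playwright library for Python (not mentioned in the original answer) with its Chromium, Firefox and WebKit browsers installed. The names `rendered_font_sizes` and `parse_px` and the default selector `"p"` are hypothetical choices for illustration; picking a representative element is still up to you.

```python
import re
from typing import Dict, Optional

# Assumption: these are the three engines Playwright bundles.
ENGINES = ("chromium", "firefox", "webkit")

# Runs inside the page: returns the computed font size (e.g. "16px"),
# or null when no element matches the selector.
_FONT_SIZE_JS = """
(selector) => {
    const el = document.querySelector(selector);
    return el ? window.getComputedStyle(el).fontSize : null;
}
"""


def parse_px(value: str) -> float:
    """Convert a computed font-size string such as '16px' to a float."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)px\s*", value)
    if match is None:
        raise ValueError(f"not a pixel length: {value!r}")
    return float(match.group(1))


def rendered_font_sizes(url: str, selector: str = "p") -> Dict[str, Optional[float]]:
    """Load url in each engine and report the font size of the first match."""
    # Imported lazily so parse_px stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    sizes: Dict[str, Optional[float]] = {}
    with sync_playwright() as p:
        for name in ENGINES:
            browser = getattr(p, name).launch()
            try:
                page = browser.new_page()
                page.goto(url)
                raw = page.evaluate(_FONT_SIZE_JS, selector)
                sizes[name] = parse_px(raw) if raw is not None else None
            finally:
                browser.close()
    return sizes
```

Calling `rendered_font_sizes("https://example.com", "h1")` would return one entry per engine, each reflecting that engine's default settings rather than any particular user's.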

I expect this is the 'hard way', but I'm not sure there is an 'easy' way.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow