Question

I've been trying to find a statistics-esque formula for calculating the rate of change of HTML tags which are either added to or removed from various websites.

So, for example, with the scraper I'm writing, I obtain the initial tag count and then cache that value. Later, on the next round, I compare the current tag count with the cached one and calculate the rate of change as a percentage, based on the difference between the two.

Other factors are included here, such as the number of times the website has been scraped, as well as the dates on which these scrapes occur, etc.
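
To make the comparison described above concrete, here is a minimal sketch of one way it could be computed. It is not a recommended formula, only an illustration: the function names (percent_change, change_per_day) are hypothetical, and dividing the percentage by the elapsed days is an assumption about how the scrape dates might be taken into account.

```python
from datetime import datetime


def percent_change(previous_count, current_count):
    """Relative change in tag count between two scrapes, in percent."""
    if previous_count == 0:
        # A relative change from zero is undefined, so signal that explicitly.
        return None
    return 100.0 * (current_count - previous_count) / previous_count


def change_per_day(previous_count, current_count, previous_time, current_time):
    """Percent change normalised by the time between the two scrapes.

    Assumption: scrapes are irregularly spaced, so the raw percentage is
    divided by the elapsed days to make different intervals comparable.
    """
    pct = percent_change(previous_count, current_count)
    elapsed_days = (current_time - previous_time).total_seconds() / 86400.0
    if pct is None or elapsed_days <= 0:
        return None
    return pct / elapsed_days


if __name__ == "__main__":
    first = datetime(2024, 1, 1)
    second = datetime(2024, 1, 4)
    print(percent_change(200, 230))
    print(change_per_day(200, 230, first, second))
```

A negative result means tags were removed. Averaging change_per_day over all scrapes of a site would be one way to fold in the number of scrapes, but that choice is also an assumption.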

What would be the ideal formula for something of this nature?

Solution

Counting tags is OK. Additionally, you could look at table trees or div trees and their nesting depth.

For example,

<div>
  <div>
    <div> .. </div>
  </div>
</div>
The depth here is 3.
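
As a rough sketch of how both numbers could be collected, the following uses Python's standard html.parser module to count start tags and track the deepest nesting level. The class and function names (TagStats, tag_stats) are made up for illustration, and it measures the depth of all tags rather than only div or table trees. It also assumes reasonably well-formed markup: elements whose closing tag is omitted (for example an unclosed p) would inflate the depth.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not add nesting depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagStats(HTMLParser):
    """Collects the number of start tags and the maximum nesting depth."""

    def __init__(self):
        super().__init__()
        self.tag_count = 0
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        if tag in VOID_TAGS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Ignore stray closing tags so the depth never goes negative.
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def tag_stats(html):
    """Return (tag_count, max_depth) for an HTML string."""
    parser = TagStats()
    parser.feed(html)
    parser.close()
    return parser.tag_count, parser.max_depth


if __name__ == "__main__":
    print(tag_stats("<div><div><div> .. </div></div></div>"))
```

Caching both values per scrape would let a change in structure show up even when the raw tag count stays the same.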
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow