Question

I am very new to BI and BD but would like some direction on the following. When I want to classify "good" or "best" links, I could use like counts from Facebook or retweet counts from Twitter. But some communities have large user bases, so their links get many more likes or retweets. How can I "normalize" the like counts of these huge communities against, for example, the likes on a link to a similar news item from a much smaller community, which has a much lower like count?

Is this called normalizing, by the way? And in what kind of books can I learn about these kinds of algorithms for "quality" (in this case, of an article, for example)? What is the thing I am trying to do called, anyway?

Thanks.

Solution 2

You could try this linear regression:

Quality_of_link = alfa + B1*Number_of_links + B2*User_base + error term.

To determine the parameters (alfa, B1 and B2) for the independent variables (Number_of_links, User_base), you could use historical data (Number_of_links, User_base, Quality_of_link) and estimate the parameter values by running a linear regression. You can do this in a statistical program; good options include R and SPSS.

What matters here is having an objective way to determine Quality_of_link. You could run a test in which a number of links are rated, preferably by the target audience of your site, and then use the average rating on a scale (e.g. 0-100) as the quality of each link.

After you have run the regression in your test phase, you can use the estimated parameters in your final model: Quality_of_link = alfa + B1*Number_of_links + B2*User_base. You could then decide that, say, a Quality_of_link above 70 is a good link and one above 90 is a best link.
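As a rough illustration of the two phases described above, here is a minimal sketch in plain Python. It assumes that Number_of_links stands for the like or retweet count of a link and that User_base is expressed in thousands of users; the historical data and ratings are made up, and the helper names (`fit_ols`, `predict`, `classify`) are hypothetical. A statistical package would normally do the fitting for you and also report how well the model fits.

```python
def fit_ols(rows, targets):
    """Fit y = alfa + B1*x1 + B2*x2 by ordinary least squares.

    rows: list of (x1, x2) tuples; targets: list of y values.
    Returns [alfa, B1, B2].
    """
    # Design matrix with a leading 1 for the intercept (alfa).
    X = [[1.0, float(x1), float(x2)] for x1, x2 in rows]
    k = 3
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot_row][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up test-phase data: (Number_of_links, User_base in thousands)
# and the average audience rating (0-100) given to each link.
history = [(120, 50), (40, 10), (200, 400), (90, 20), (300, 900), (60, 100)]
ratings = [75, 39, 80, 63, 80, 40]

alfa, b1, b2 = fit_ols(history, ratings)


def predict(number_of_links, user_base):
    """Final model: apply the estimated parameters to a new link."""
    return alfa + b1 * number_of_links + b2 * user_base


def classify(quality):
    """Thresholds taken from the answer: above 90 is best, above 70 is good."""
    if quality > 90:
        return "best"
    if quality > 70:
        return "good"
    return "other"


print(classify(predict(150, 80)))
```

With real ratings the fit will not be exact, so it is worth checking the residuals before trusting the thresholds.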

As for good textbooks, it is difficult to point you to a particular book that I haven't read myself. I would first recommend using the knowledge you already have and turning to the internet where something needs to be refreshed.

Hope this helps. Good luck with your project.

OTHER TIPS

Yes, it is called Normalization or Standardization.

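You could calculate the Z-Score† of the number of "likes" of an article relative to the other articles in the same community, so that the comparison is fair. The Z-Score is the number of standard deviations by which a value lies above (or, if negative, below) the mean: z = (value - mean) / standard deviation.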
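A minimal sketch of this idea in Python, assuming the Z-Score is computed separately within each community. The like counts are made up, and the helper name `z_scores` is hypothetical. It uses the population standard deviation, which treats the listed links as the whole community.

```python
from statistics import mean, pstdev


def z_scores(like_counts):
    """Return the z-score of each like count within its own community."""
    mu = mean(like_counts)
    sigma = pstdev(like_counts)  # population standard deviation
    if sigma == 0:
        # All links have the same count, so none stands out.
        return [0.0 for _ in like_counts]
    return [(x - mu) / sigma for x in like_counts]


# Made-up like counts for links in a large and a small community.
big_community = [1000, 1200, 900, 5000, 1100]
small_community = [10, 12, 9, 50, 11]

z_big = z_scores(big_community)
z_small = z_scores(small_community)

# The link with 5000 likes and the link with 50 likes stand out
# equally within their own communities.
print(round(z_big[3], 2), round(z_small[3], 2))
```

Because the small community's counts are exactly one hundredth of the large community's, the two standout links get the same score, even though their raw like counts differ by a factor of 100.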

You can probably get some better advice on https://stats.stackexchange.com/

Good luck!

† If the mean and standard deviation are estimated from a sample rather than known for the whole population, you should use the T-statistic instead.

Licensed under: CC-BY-SA with attribution