Question

I am very new to BI and BD, but I would like some direction on the following. When I want to classify "good" or "best" links, I could use like counts from Facebook or retweet counts from Twitter. But some communities have large user bases, so their links get many more likes or retweets. How can I "normalize" the like counts of a huge community against, for example, the likes on a similar news link from a much smaller community, which has a far lower like count?

Is this called normalizing, by the way? And in what kind of books can I learn about these kinds of algorithms for "quality" (in this case, of an article, for example)? What is the name for what I am trying to do?

Thanks.


Solution 2

You could try this linear regression:

Quality_of_link = alpha + B1*Number_of_links + B2*User_base + error term.

To determine the parameters (B1 and B2) for the independent variables (Number_of_links, User_base), you could use historical data (Number_of_links; User_base; Quality_of_link) and estimate the parameter values by running a linear regression, with Quality_of_link as the dependent variable. You could do this in a statistical program. Good statistical programs include R and SPSS.

What matters here is having an objective way to determine Quality_of_link. I think you could run a test in which a number of links are rated, preferably by the target audience of your site. Then use the average rating given to each link on a scale (e.g. 0-100) as its Quality_of_link.

After you have run the regression in your test phase, you can use it in your final model. This would then be: Quality_of_link = alpha + B1*Number_of_links + B2*User_base. You could then say, for example, that a Quality_of_link above 70 is a good link and one above 90 is a best link.
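As a rough illustration of the two phases described above (fit on rated links, then score new links), here is a minimal sketch in plain Python. The historical data is made up, `fit_linear_regression`, `predict` and `classify` are hypothetical helper names, and expressing the user base in thousands is an assumption made only to keep the numbers small. In practice a statistical package would do the fitting for you.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares; returns [alpha, B1, B2, ...]."""
    # Design matrix with a leading 1.0 for the intercept (alpha).
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Normal equations: (X^T X) coef = X^T y
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    return coef


def predict(coef, row):
    """Quality_of_link = alpha + B1*Number_of_links + B2*User_base."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def classify(quality):
    """Thresholds taken from the answer: above 90 is best, above 70 is good."""
    if quality > 90:
        return "best"
    if quality > 70:
        return "good"
    return "other"


# Made-up test-phase data: (Number_of_links, User_base in thousands).
history = [(120, 50), (30, 5), (400, 200), (80, 10), (15, 2), (250, 150)]
# Made-up average ratings (0-100) given by the target audience.
ratings = [62, 71, 55, 88, 64, 49]

coef = fit_linear_regression(history, ratings)
new_link = (90, 12)  # hypothetical unseen link
label = classify(predict(coef, new_link))
```

With only six rated links the estimates are not meaningful; the point is the shape of the workflow, not the numbers.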

As for good textbooks, it is difficult for me to point you to a particular book that I haven't read myself. I would first recommend using the knowledge you already have, and using the internet if some of it needs to be refreshed.

Hope this helps. Good luck with your project.

Other tips

Yes, it is called Normalization or Standardization.

You could calculate the Z-Score† of the number of "likes" of an article, so that the comparison is fair. The Z-Score is the number of standard deviations that a value is above the mean.
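A minimal sketch of that idea, assuming the Z-Score is computed separately within each community so that each link is compared with its own community's typical like count. The like counts and community names below are made up, and `z_scores` is a hypothetical helper; it uses the population standard deviation.

```python
from statistics import mean, pstdev

# Made-up like counts for five links in each of two communities.
likes_by_community = {
    "big_community": [1200, 1500, 900, 3000, 1400],
    "small_community": [12, 15, 9, 30, 14],
}


def z_scores(values):
    """Z-Score of each value relative to its own group's mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values identical: no link stands out from the others.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


normalized = {
    name: z_scores(counts) for name, counts in likes_by_community.items()
}
```

In this made-up data the small community's counts are exactly one hundredth of the large community's, so matching links receive the same Z-Score even though their raw like counts differ by a factor of 100.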

You can probably get some better advice on https://stats.stackexchange.com/

Good luck!

† If you are sampling, you should use the T-statistic instead.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow