Question

I have a set of news articles for which I have stats, e.g. the number of Twitter posts mentioning each article over a range of days. The natural behavior of these stats is that the number of new posts grows quickly and then decreases as the news ages.

I would like to know how to calculate, for the whole data set and with some confidence level, the number of days after which changes to the stats are no longer significant (e.g. less than 0.1% of total posts).
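One way to make this criterion concrete, under one reading of it (a day counts as insignificant when its new posts are below 0.1% of the article's total over the observed window), is the NumPy sketch below. The function name days_until_insignificant and its threshold parameter are illustrative only. It also assumes the observed window already contains nearly all of the article's posts, because the total is taken from that window.

```python
import numpy as np

def days_until_insignificant(daily_counts, threshold=0.001):
    """Return the number of days after which every later day adds fewer
    than `threshold` (a fraction, 0.001 = 0.1%) of the article's total posts.

    Hypothetical helper: the total is computed from the observed window,
    so this assumes the window covers (almost) the whole life of the story.
    """
    counts = np.asarray(daily_counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0
    # Days whose new posts still reach the significance threshold.
    significant = np.flatnonzero(counts >= threshold * total)
    # The cut-off is one past the last significant day.
    return int(significant[-1]) + 1 if significant.size else 0

# Example: with a 1% threshold, days 5 onward add fewer than 1% of the posts.
print(days_until_insignificant([50, 300, 200, 80, 20, 5, 1, 0], threshold=0.01))
```

Requiring every later day to be below the threshold, rather than stopping at the first quiet day, keeps a brief lull followed by a second spike from being counted as the end of the story.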

Could you give some hints on where to look for information and methods? I'd appreciate a code sample in Python too :)


Solution

This question is really about time-series analysis. Since you are interested in determining the cut-off point, a good place to start is reading up on Control Charts. If you want to delve deeper into the statistics beyond control charts, look into Change Point Analysis and structural changes in time series.
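As a minimal illustration of the change point idea (this is neither a control chart nor what any particular R package does), here is a least-squares estimate of a single shift in the mean of a daily series. The name single_change_point is hypothetical. Because news decay often looks closer to exponential than to a clean step, you may get more sensible splits by applying it to log-transformed counts such as np.log1p(counts).

```python
import numpy as np

def single_change_point(series):
    """Least-squares estimate of one change in mean.

    Returns the index k (1 <= k < n) that minimizes the total squared error
    of fitting one mean to series[:k] and another mean to series[k:].
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")
    best_k, best_cost = 1, np.inf
    for k in range(1, n):
        left, right = x[:k], x[k:]
        # Within-segment sum of squares for this candidate split.
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Example: the mean drops after the third observation.
print(single_change_point([10, 10, 10, 1, 1, 1]))
```

This brute-force search is O(n^2), which is fine for a few weeks of daily counts. Dedicated change point methods add a significance test and handle multiple changes, which this sketch does not.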

Python modules: To perform this analysis in Python, the NumPy and pandas modules are relevant. This post on statalgo will get you on the right track in terms of Python code. (If you are open to using R for your analysis, consider the CRAN packages tseries and strucchange.)
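A rough pandas sketch for the whole set of articles follows. It assumes a long-format DataFrame with hypothetical columns article, day and posts, where day counts from publication with no gaps. For each article it applies a cut-off rule (one past the last day whose new posts reach threshold * total), then takes a high quantile of those per-article cut-offs and attaches a percentile-bootstrap interval to it as one simple way to express a confidence level. The quantile q, the bootstrap, and the function names are illustrative choices, not something the resources above prescribe.

```python
import numpy as np
import pandas as pd

def cutoff_day(posts, threshold=0.001):
    """Per-article cut-off: one past the last day with posts >= threshold * total."""
    counts = posts.to_numpy(dtype=float)
    total = counts.sum()
    if total == 0:
        return 0
    significant = np.flatnonzero(counts >= threshold * total)
    return int(significant[-1]) + 1 if significant.size else 0

def cutoff_summary(df, threshold=0.001, q=0.95, n_boot=2000, seed=0):
    """Return (q-quantile of per-article cut-offs, 95% bootstrap interval).

    df: long format with hypothetical columns 'article', 'day', 'posts'.
    """
    cutoffs = (
        df.sort_values(["article", "day"])      # ensure days are in order
          .groupby("article")["posts"]
          .apply(cutoff_day, threshold=threshold)
          .to_numpy()
    )
    rng = np.random.default_rng(seed)
    # Resample articles with replacement and recompute the quantile each time.
    boot = [
        np.quantile(rng.choice(cutoffs, size=cutoffs.size, replace=True), q)
        for _ in range(n_boot)
    ]
    return np.quantile(cutoffs, q), np.percentile(boot, [2.5, 97.5])

df = pd.DataFrame({
    "article": ["a"] * 8 + ["b"] * 4,
    "day": list(range(8)) + list(range(4)),
    "posts": [50, 300, 200, 80, 20, 5, 1, 0] + [10, 10, 0, 0],
})
print(cutoff_summary(df, threshold=0.01, n_boot=200))
```

With q=0.95, the point estimate reads roughly as "about 95% of articles stop receiving significant new posts by this day", and the interval shows how much that estimate moves when the set of articles is resampled. With only a handful of articles the bootstrap interval will be crude.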

Relevant question on SE (stats): How to detect a change in time series data?

Pertinent real-life example: In the aftermath of Osama Bin Laden's death, a good deal of analysis was done on how that piece of news spread on Twitter. The article even has a section specifically related to your question about when the news stopped spreading.

Finally, you might also consider asking this in the Stats SE site.

Hope that helps.

Licensed under: CC-BY-SA with attribution