Question

I was wondering if there is any research or study that calculates how much storage space all scientific articles take up. It could be in pdf, txt, compressed, or any other format. Is there even a way to measure it?

Can someone point me toward such a study, or toward a way of carrying one out?

Regards and thanks.


Solution

Perhaps you are looking to quantify the amount of filespace used by a specific subset of data that we will label as "academic publications."

Well, to estimate, you could find stats on how many publications are housed at all the leading libraries (JSTOR, EBSCO, AcademicHost, etc.) and then get the mean file size of an article. Multiply that mean by the total number of articles and whamo, you've got yourself an estimate.
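As a rough sketch of that arithmetic, the estimate is just (total article count) x (mean article size). The function name, the library labels, and every number below are hypothetical placeholders, not real statistics. Note also that the same article may be held by more than one library, so summing raw counts could overestimate.

```python
def estimate_total_bytes(article_counts, mean_article_bytes):
    """Estimate total storage as (total number of articles) x (mean size per article)."""
    total_articles = sum(article_counts.values())
    return total_articles * mean_article_bytes


# Placeholder inputs, NOT real statistics: swap in figures you have actually collected.
hypothetical_counts = {"library_a": 1_000_000, "library_b": 2_500_000}
hypothetical_mean_bytes = 1_000_000  # assumption: roughly 1 MB per article

total = estimate_total_bytes(hypothetical_counts, hypothetical_mean_bytes)
print(f"{total / 1e12:.2f} TB")
```

The result is only as good as the two inputs, so it is probably worth computing it separately for each file format you care about.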

Here's the problem, though: a PDF file stores a string of text s in a different number of bytes than, say, a plain-text document storing that same string. Likewise, a compressed image will store an amount of information i in a different number of bytes than an uncompressed one. So we could have two copies of the same article, containing the same information i, that take up different amounts of storage m.
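A small illustration of that point, using only Python's standard library: the same text is stored three ways and the byte counts differ even though the content is identical. The sample sentence is made up, and zlib simply stands in for "some compressed format"; it is not how PDF or JPEG store data.

```python
import zlib

# One piece of text (the "information i"), repeated so compression has something to work with.
text = "The same sentence, stored in more than one way. " * 200

raw_utf8 = text.encode("utf-8")       # plain text, 1 byte per ASCII character
raw_utf16 = text.encode("utf-16")     # same text, 2 bytes per character plus a byte-order mark
compressed = zlib.compress(raw_utf8)  # same text, losslessly compressed

sizes = {"utf-8": len(raw_utf8), "utf-16": len(raw_utf16), "zlib": len(compressed)}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")

# The compressed copy still decodes to exactly the same text.
assert zlib.decompress(compressed).decode("utf-8") == text
```

So any "total size" figure only makes sense once you fix the storage format you are measuring.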

Are you looking to get a wordcount on the amount of scientific literature?

Are you looking to get an approximation of file system space used to store all academically published content in the world?

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange