Using cached web data from the Internet (Google Cache, Wayback Machine, etc.)

https://stackoverflow.com/questions/13662895

04-12-2021

Question

I want to use Google Cache to read pages from other websites without visiting those sites directly.

If I send a request like http://webcache.googleusercontent.com/search?q=cache:<URL without SCHEME>, I get the cached data back.
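A minimal sketch of how such a request URL could be built, assuming the endpoint format quoted above. The function name google_cache_url is hypothetical, and the percent-encoding of query characters is an assumption added for safety, not something stated in the question. The code only builds the string and makes no network request.

```python
from urllib.parse import quote

# Endpoint taken from the question; everything else here is illustrative.
CACHE_ENDPOINT = "http://webcache.googleusercontent.com/search"


def google_cache_url(page_url: str) -> str:
    """Return the cache lookup URL for page_url (hypothetical helper)."""
    # Drop "scheme://" if present, as the question's format requires.
    without_scheme = page_url.split("://", 1)[-1]
    # Assumption: escape characters such as "?" and "&" so they are not
    # read as part of the cache endpoint's own query string.
    return f"{CACHE_ENDPOINT}?q=cache:{quote(without_scheme, safe='/:')}"


print(google_cache_url("https://stackoverflow.com/questions/13662895"))
```

Passing a URL that already lacks a scheme works the same way, since only a leading "scheme://" part is removed.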

I have found or assumed the following (Ques 0. Please correct me if any of these are wrong):

Google may or may not have cached a page, depending on the site's policy.
Google will go to the website anyway if any JavaScript has to be run.
Google stores only the first 101 KB of the text.

Ques 1. I know Google Cache only shows the most recently crawled version of a page, but is there any way to tell how old this data could be?

Ques 2. Is there any issue if I go to Google Cache for every hit I make to that website (assuming the website is cached and I am fine with a slightly old page)?

Ques 3. The Wayback Machine provides the data, but there is a huge delay between crawling a page and showing it. Is there any directory where we can get recently archived data (like the Wayback Machine and Google Cache)?

Solution

I know Google Cache only shows the most recently crawled version of a page, but is there any way to tell how old this data could be?

Use the cache: operator in the URL; the banner at the top of the cached page states when the snapshot was taken.

Is there any issue if I go to Google Cache for every hit I make to that website (assuming the website is cached and I am fine with a slightly old page)?

Site owners may request removal of their content from the cache, so a cached copy is not guaranteed to exist.

Is there any directory where we can get recently archived data?

Use the tbs=qdr: query parameter in the Google search URL to restrict results to a recent time window.
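To illustrate the tbs=qdr: parameter, here is a rough sketch that builds a time-restricted search URL. The window codes (h, d, w, m, y for hour, day, week, month, year) are the commonly observed values of this parameter rather than a documented API, so treat them as an assumption; the function name recent_search_url is hypothetical. The code only builds the string and makes no network request.

```python
from urllib.parse import urlencode

# Assumed window codes: hour, day, week, month, year.
QDR_WINDOWS = {"h", "d", "w", "m", "y"}


def recent_search_url(query: str, window: str = "d") -> str:
    """Build a Google search URL limited to a recent window (hypothetical helper)."""
    if window not in QDR_WINDOWS:
        raise ValueError(f"unknown qdr window: {window!r}")
    params = {"q": query, "tbs": f"qdr:{window}"}
    # Keep ":" unescaped so the URL stays readable; it is valid in a query string.
    return "https://www.google.com/search?" + urlencode(params, safe=":")


print(recent_search_url("site:example.com", "w"))
```

Here "site:example.com" is only a placeholder query; any search terms can be used in its place.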

OTHER TIPS

For Question 3: while it used to be the case that all Wayback Machine web captures were at least 6 months old, that was already becoming untrue in 2012 and is very untrue now, in 2016. We have a ton of fresh content.

Licensed under: CC-BY-SA with attribution
