Question

We have a situation where all of our paging links have been crawled and continue to be crawled. These links contain "~/{someTerm}/{someOtherTerm}/__p/##/##".

The problem is that both the Google and MSN bots are now crawling tens of thousands of pages that don't need to be crawled, which is putting a strain on the system.

So we changed the paging links to JavaScript links and removed all URLs containing "__p", so they now return a 404 - Page Not Found. We only really want page 1 indexed, and maybe a page or two thereafter (but we're not worried about that now).

Is there a way to remove all pages containing "__p" in the URL using Webmaster Tools for Google and MSNBot, and if so, how?

Thanks.


Solution

I think you should use a <meta> tag on the pages you'd like to remove from the search engines:

<meta name="robots" content="noindex, nofollow" />
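
One caveat: a crawler can only obey that tag if it can still fetch the page, so the "__p" URLs would need to keep returning a normal 200 response (rather than the 404 they return now) and must not also be blocked in robots.txt. If editing the page markup is inconvenient, Google also documents an equivalent X-Robots-Tag HTTP response header that can be attached to those responses server-side; how you set it depends on your web server, so treat this as a sketch of the header itself:

X-Robots-Tag: noindex, nofollow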

Also, you can try a robots.txt exclusion:

User-agent: *
Disallow: /*__p
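
The * wildcard in that Disallow line isn't part of the original robots.txt standard, but both Googlebot and msnbot recognize it. If you'd rather address the two crawlers from your question explicitly instead of every bot, a sketch like the following (note the two underscores, matching the "__p" segment in your URLs) does the same thing, since a bot that matches a specific User-agent group ignores the * group:

# Block any URL whose path contains the "__p" paging segment
User-agent: Googlebot
Disallow: /*__p

User-agent: msnbot
Disallow: /*__p

Once the URLs are disallowed (or while they return a 404), a removal request in Google Webmaster Tools should be accepted for them, which covers the removal part of your question.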