Question

I have a page with a form that uses the ajaxForm jQuery plugin. The form submits, and when it completes, a $.get() call loads some new content into the page.

My problem is that Googlebot "appears" to be indexing the URL requested by the $.get() call.

My first question is: is that even possible? I was under the impression that Googlebot doesn't evaluate JavaScript for the most part (I read something about it being able to index content on URLs with #!).

My second question is: if Google is indexing the URL from this call, is there a way to prevent it?

Thanks in advance.


Solution

You could disallow that specific file in robots.txt; Googlebot should honor it.

From robotstxt.org:

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
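
As a rough illustration of how a compliant crawler reads rules like the ones above, here is a simplified JavaScript sketch. It is a toy under stated assumptions, not Googlebot's parser: it only honors groups that include "User-agent: *", treats each Disallow value as a plain path prefix, and ignores Allow lines and wildcards. The function names are hypothetical.

```javascript
// Simplified robots.txt reader (hypothetical helper, not Google's parser).
// Only groups containing "User-agent: *" are honored, and every Disallow
// value is treated as a plain path prefix (no Allow, no wildcards).
function parseDisallowRules(robotsTxt) {
  const rules = [];
  let groupAppliesToAll = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      if (!lastWasAgent) groupAppliesToAll = false; // a new group starts
      if (value === "*") groupAppliesToAll = true;
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (field === "disallow" && groupAppliesToAll && value !== "") {
        rules.push(value);
      }
    }
  }
  return rules;
}

// A path is blocked if it starts with any disallowed prefix.
function isDisallowed(robotsTxt, path) {
  return parseDisallowRules(robotsTxt).some((prefix) => path.startsWith(prefix));
}

const robotsTxt = [
  "User-agent: *",
  "Disallow: /~joe/junk.html",
  "Disallow: /~joe/foo.html",
  "Disallow: /~joe/bar.html",
].join("\n");

console.log(isDisallowed(robotsTxt, "/~joe/junk.html")); // true
console.log(isDisallowed(robotsTxt, "/~joe/index.html")); // false
```

Because the rules are prefix matches, a request such as /~joe/foo.html?x=1 is blocked as well.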

You can also use Google's Webmaster Central to remove the URL from Google's search results.

Other tips

First of all, you need to check that it really is Googlebot, because anyone can pretend to be Googlebot by sending its user-agent string, even a legitimate user.

The recommended technique is to do a reverse DNS lookup on the requesting IP address, verify that the resulting name is in the googlebot.com domain, and then do a forward DNS lookup on that googlebot.com name to confirm that it resolves back to the original IP address.

Source: Official Google Webmaster Central Blog, "How to verify Googlebot".
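
A minimal Node.js sketch of the two-step check described above, assuming Node 10.6 or later (for dns.promises). The function names are hypothetical, and it only accepts the googlebot.com suffix named in this answer. Comparing address strings exactly works for IPv4; IPv6 addresses may need normalization first.

```javascript
const dns = require("dns").promises;

// Does a reverse-DNS name belong to googlebot.com? (hypothetical helper)
function isGooglebotHostname(hostname) {
  const name = hostname.toLowerCase().replace(/\.$/, ""); // drop trailing dot
  return name.endsWith(".googlebot.com");
}

// Reverse lookup, domain check, then forward lookup back to the same IP.
async function verifyGooglebot(ip) {
  let hostnames;
  try {
    hostnames = await dns.reverse(ip); // PTR lookup
  } catch {
    return false; // no reverse record: not verified
  }
  for (const hostname of hostnames) {
    if (!isGooglebotHostname(hostname)) continue;
    try {
      const addresses = await dns.lookup(hostname, { all: true });
      if (addresses.some((entry) => entry.address === ip)) return true;
    } catch {
      // forward lookup failed for this name; try the next one
    }
  }
  return false;
}
```

In a request handler you would await verifyGooglebot(requestIp) before treating the visitor as Googlebot. Since it performs live DNS lookups, caching the result per IP is probably worthwhile.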

Googlebot interprets pretty much every string in inline JavaScript that contains a "/" or a common file extension (".html", ".php") as a URL. The "/" case in particular is very annoying.
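
To see which strings in an inline script might be picked up this way, here is a rough JavaScript audit helper. It only approximates the rule described in this answer (a "/" or an .htm, .html or .php extension); it is not Google's actual algorithm, and it ignores template literals and comments. The function name and the sample endpoint are hypothetical.

```javascript
// Rough audit helper (hypothetical): list quoted string literals that look
// URL-like under the rule described above. Not Google's real heuristic.
function findUrlLikeStrings(scriptSource) {
  // Matches "..." or '...' literals, allowing escaped characters inside.
  const literal = /(["'])((?:\\.|(?!\1)[^\\\n])*)\1/g;
  const found = [];
  let match;
  while ((match = literal.exec(scriptSource)) !== null) {
    const value = match[2];
    // Flag values containing a slash or ending a word in a page extension.
    if (value.includes("/") || /\.(?:html?|php)\b/i.test(value)) {
      found.push(value);
    }
  }
  return found;
}

const inlineScript =
  '$.get("/ajax/new-content.php", render); var label = "Submit";';
console.log(findUrlLikeStrings(inlineScript)); // [ '/ajax/new-content.php' ]
```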

Obfuscate every URL in inline JS that you do not want crawled, e.g. replace "/" with "|" on the server side and write a wrapper function in JS that turns "|" back into "/".
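
A minimal sketch of that approach, with hypothetical function names and a hypothetical endpoint path. It assumes the real URLs never contain a literal "|".

```javascript
// Server side: replace every "/" with "|" before writing the URL into inline JS.
function obfuscateUrl(url) {
  return url.split("/").join("|");
}

// Client side: the wrapper restores the real URL right before it is used.
function restoreUrl(obfuscated) {
  return obfuscated.split("|").join("/");
}

// Example (hypothetical endpoint):
//   server writes:  var contentUrl = "|ajax|new-content.php";
//   browser calls:  $.get(restoreUrl(contentUrl), onLoaded);
const inlineValue = obfuscateUrl("/ajax/new-content.php");
console.log(inlineValue); // |ajax|new-content.php
console.log(restoreUrl(inlineValue)); // /ajax/new-content.php
```

Note that an obfuscated path such as |ajax|new-content.php still ends in .php, which, going by the description above, may itself look like a URL, so the extension may need the same treatment.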

Yes, that's really annoying, and there are better ways, e.g. keeping all your JS in an external file that is not crawlable.

The robots.txt solution is not really a solution, because the URLs still get found and pushed to discovery (the pipeline Google uses to decide what to crawl next), but then the crawl is blocked, which is basically a missed opportunity.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow