I'd like to start using specific landing pages in a marketing campaign. A quick Google search shows how to disallow specific pages and/or directories using a robots.txt file. (link)

If I don't want the search engines to index these landing pages, should I put single-page entries in the robots.txt file, or should I put the pages in specific directories and disallow those directories?

My concern is that anybody can read a robots.txt file, and if the actual page names are visible within it, that defeats the purpose.


Solution

"It defeats the purpose." How so? The purpose of robots.txt is to prevent crawlers from reading particular files or groups of files. Whether you exclude the individual files or put them all in a directory and exclude that directory is irrelevant as far as the crawler's behavior is concerned.

The benefit of putting them all in directories is that your robots.txt file is smaller and easier to manage. You don't have to add a new entry every time you create a new landing page.
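For instance, a single directory rule covers every current and future landing page (the /landing/ path here is an assumed example, not from the question):

```
User-agent: *
Disallow: /landing/
```

Any page you later add under /landing/ is automatically covered without touching robots.txt again.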

You're right that putting a file name in robots.txt lets anybody who reads the file know that the file is there. That shouldn't be a problem. If you have sensitive information that you don't want others to see then it shouldn't be accessible, regardless of whether it's mentioned in robots.txt. Because if the file is publicly accessible, then a bot is going to find it even if you don't mention it in robots.txt.

robots.txt is just a guideline. The existence of a disallow line in robots.txt doesn't prevent an unfriendly crawler from looking at those pages. It just tells the crawler that you don't want them looking at those pages. But crawlers can ignore robots.txt. They shouldn't, and you can block them if they do, but robots.txt itself is more like a stop sign than a road block.
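To see how a *compliant* crawler applies those rules, here is a minimal sketch using Python's standard-library `urllib.robotparser` (the /landing/ path and URLs are hypothetical). A rude crawler would simply skip this check:

```python
from urllib import robotparser

# Hypothetical robots.txt contents disallowing a landing-page directory.
rules = """\
User-agent: *
Disallow: /landing/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved crawler asks before fetching; nothing forces it to.
print(rp.can_fetch("*", "https://example.com/landing/offer1.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))           # True
```

Note that the parser only reports what the rules *ask*; enforcement (e.g., blocking by user agent or IP) has to happen on the server.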

Another tip

You should be able to simply use the NOINDEX META tag in the HEAD of your page.

http://www.robotstxt.org/meta.html
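A minimal sketch of that tag, placed in the page's HEAD (the `nofollow` directive is optional and shown only for illustration):

```html
<head>
  <!-- Ask compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Unlike robots.txt, this keeps the page name out of any publicly readable exclusion list, though it is still only advisory to crawlers.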

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow