Question


I want to use this hosting for live testing, but I need to protect access and prevent search-engine indexing. For example, the server directory structure within public_html is:

_private
_bin
_cnf
_log
... (more default hosting directories)
testpublic
css
images
index.html


I want index.html to be visible to everyone, while all the other directories (except "testpublic") are hidden, access-protected, and not indexed by search engines.

I would like the "testpublic" directory to be public but not indexed by search engines; I am not sure whether this is possible.

From what I understand, I need two .htaccess files:
one general file in "public_html" and another specific to "testpublic".

I think the general .htaccess (in public_html) should be something like:

AuthUserFile /home/folder../.htpasswd
AuthName "test!"
AuthType Basic
Require user admin123

<FilesMatch "index.html">
    Satisfy Any
</FilesMatch>


Can anyone help me create these files with the appropriate settings? Thank you!


Solution

You can use a robots.txt file in your root folder. All standards-abiding robots will obey this file and not index your files and folders.

Here is an example robots.txt that tells all (*) crawlers to move on and index nothing:

User-agent: *
Disallow: /
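
For the layout in the question, a robots.txt could instead disallow only the paths that should stay out of search results while leaving index.html crawlable (a sketch; adjust the paths to your actual structure):

```
User-agent: *
Disallow: /_private/
Disallow: /_bin/
Disallow: /_cnf/
Disallow: /_log/
Disallow: /testpublic/
```

Keep in mind that robots.txt is publicly readable, so listing private paths reveals their names. For the password-protected directories this matters less: crawlers hitting them get a 401 response and cannot index the contents anyway.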

You could also use .htaccess files to fine-tune what your server (assuming Apache) serves and which directory indexes are visible. In that case you would add

IndexIgnore *

to your .htaccess file. Note that IndexIgnore only hides entries from Apache's auto-generated directory listings; it does not affect search-engine indexing.
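
Putting this together for the layout in the question, the two .htaccess files might look like the following. This is a sketch assuming Apache 2.2-style access-control directives; the AuthUserFile path is the placeholder from the question and must point at your real .htpasswd file.

```
# public_html/.htaccess — password-protect everything by default
AuthUserFile /home/folder../.htpasswd
AuthName "test!"
AuthType Basic
Require user admin123

# Let everyone see the landing page without logging in
<FilesMatch "index.html">
    Satisfy Any
    Allow from all
</FilesMatch>
```

```
# testpublic/.htaccess — override the inherited auth: public to everyone
Satisfy Any
Allow from all
```

On Apache 2.4 and later, the `Satisfy` / `Allow` directives are deprecated in favor of `Require all granted`, so check which version your host runs.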

Updated (credit to https://stackoverflow.com/users/1714715/samuel-cook):

If you want to block a specific bot/crawler and know its User-Agent string, you can do so in your .htaccess:

<IfModule mod_rewrite.c>
  RewriteEngine on
  RewriteCond %{HTTP_USER_AGENT} Googlebot
  RewriteRule ^.* - [F,L]
</IfModule> 
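
Another option for the "public but not indexed" testpublic directory is the X-Robots-Tag response header, which standards-abiding crawlers honor even without a robots.txt entry. A sketch for testpublic/.htaccess, assuming mod_headers is enabled on the server:

```
# Ask crawlers not to index or follow links in this directory
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
```

Unlike robots.txt, this does not advertise the directory's existence in a public file.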

Hope this helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow