Question

I am working on a web application that allows users to create their own web apps in turn. For each new web app created by my application, I assign a new subdomain, e.g. subdomain1.xyzdomain.com, subdomain2.xyzdomain.com, etc.

All these web apps are stored in a database and are served by a Python script (say default_script.py) kept in /var/www/. Until now, I have blocked search engine indexing for the directory (/var/www/) using robots.txt. This essentially blocks indexing of all my scripts, including default_script.py, as well as the content served for the multiple web apps through default_script.py.

But now I want some of those subdomains to be indexed.

After searching for a while, I was able to figure out a way to block indexing of my scripts by explicitly specifying them in robots.txt.

But I am still doubtful about the following:

  1. Will blocking default_script.py from indexing also block indexing of all the content served from default_script.py? If yes, then if I let it be indexed, will default_script.py itself start showing up in search results as well?

  2. How can I allow indexing of some of the subdomains selectively?

    Ex: Index subdomain1.xyzdomain.com but NOT subdomain2.xyzdomain.com


Solution

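No. The search engine should not care what script generates the pages; a crawler only sees URLs and the responses returned for them, and robots.txt rules match URL paths, not files on disk. As long as the pages generated by the web apps are indexed, you should be fine.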

Second question:

You should create a separate robots.txt per subdomain. That is, when robots.txt is fetched from a particular subdomain, return a robots.txt file that pertains to that subdomain only. So if you want the subdomain indexed, have that robots.txt allow all. If you don't want it indexed, have it deny all.
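As a rough sketch of that idea in Python, assuming the script handling the request can read the Host header: the function name `robots_txt_for_host` and the `INDEXED_HOSTS` set are hypothetical and only illustrate how the allow-all or deny-all body could be chosen per subdomain.

```python
# Hypothetical set of subdomains that should be indexed (illustrative assumption).
INDEXED_HOSTS = {"subdomain1.xyzdomain.com"}

# An empty Disallow value permits everything; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for_host(host, indexed_hosts=INDEXED_HOSTS):
    """Return the robots.txt body to serve for the given Host header value."""
    # Host names are case-insensitive and the header may carry a port.
    hostname = host.strip().lower().split(":", 1)[0]
    return ALLOW_ALL if hostname in indexed_hosts else DENY_ALL
```

Any host not listed falls back to deny-all, which keeps newly created subdomains out of the index until they are explicitly added.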

OTHER TIPS

So to summarize the discussion,

This is how my .htaccess file looks which is kept in /var/www/ directory:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /

# The rule below serves a different robots.txt for subdomain1.
RewriteCond     %{HTTP_HOST}           ^subdomain1\.xyzdomain\.com$ [NC]
RewriteRule     ^(.*)robots\.txt       subdomain1-robots.txt [L]

# This rule applies to the rest of the subdomains and to xyzdomain.com.
RewriteRule     ^robots\.txt$          robots.txt [L]

# This rule serves content from default_script.py for requests other than robots.txt.
RewriteRule     .                      default_script.py
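
The contents of the two robots files are not shown above. Assuming subdomain1-robots.txt allows everything and the default robots.txt blocks everything, a quick check with Python's standard `urllib.robotparser` could look like the sketch below; the helper name `is_crawlable` and both file bodies are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents; the original post does not show them.
SUBDOMAIN1_ROBOTS = "User-agent: *\nDisallow:\n"   # allow everything
DEFAULT_ROBOTS = "User-agent: *\nDisallow: /\n"    # block everything


def is_crawlable(robots_body, url, user_agent="*"):
    """Return True if robots_body permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable(SUBDOMAIN1_ROBOTS, "http://subdomain1.xyzdomain.com/page"))
    print(is_crawlable(DEFAULT_ROBOTS, "http://subdomain2.xyzdomain.com/page"))
```
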
Licensed under: CC-BY-SA with attribution