Question

I'm using .htaccess to rewrite and redirect www.mysite.com/index.php?id=# to friendly URLs like www.mysite.com/news, so all news articles will be written as www.mysite.com/news/article1, and so on.

Now I'm using robots.txt to block all the directories on my server that crawlers don't need to index. Since I'm using a CMS, these are directories like /core, /managers, /connectors, etc. But the www.mysite.com/news directory doesn't actually exist; it only exists through the rewrite in .htaccess. So if I block directories like /core, will crawlers still be able to index my website?

So basically what I want to know is: does a crawler see my website's URLs as they appear after rewriting? Or does it still need access to the other directories of my CMS, like /core, to be able to index my pages?

Solution

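No. URL rewriting is an internal mapping process only: your web server uses it to decide how to handle the user-friendly URL it receives. robots.txt rules are matched against the URLs a crawler requests (such as /news/article1), not against the files or scripts the server maps them to internally.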
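
To make the "internal mapping" concrete, here is a minimal Python sketch of the server side, assuming a hypothetical alias table (ALIASES) and a hypothetical rewrite() helper. A real CMS usually resolves aliases from its database, and actual mod_rewrite rules look different; the point is only that the internal target (/index.php?id=...) is computed and used inside the server, while the client keeps the friendly URL it asked for.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical alias table: a real CMS would look these up in its database.
ALIASES = {
    "news": 2,
    "news/article1": 7,
}

def rewrite(request_path: str) -> Optional[str]:
    """Map a friendly request path to the internal target the server runs.

    The returned value is only used inside the server; it is never sent
    back to the client, which keeps seeing the friendly URL.
    """
    alias = request_path.strip("/")
    page_id = ALIASES.get(alias)
    if page_id is None:
        return None  # no mapping: the server would answer 404
    return "/index.php?" + urlencode({"id": page_id})

print(rewrite("/news/article1"))  # /index.php?id=7
```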

Just as the URL stays unchanged in the browser's address bar, the process is invisible to the client, whether it is a web browser or a bot.
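
One way to check this for a given robots.txt is Python's standard urllib.robotparser module, which applies plain path-prefix rules. The sketch below assumes a robots.txt that blocks the CMS directories named in the question; the exact paths are illustrative. Major crawlers support extra syntax (such as wildcards), but for simple prefixes like these the outcome is the same: the friendly URL is allowed, and only requests whose path starts with a blocked directory are refused.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt blocking the CMS directories from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /core/
Disallow: /managers/
Disallow: /connectors/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules are matched against the path of the URL the crawler requests,
# not against whatever file the server maps that URL to internally.
for url in (
    "http://www.mysite.com/news/article1",   # friendly URL: allowed
    "http://www.mysite.com/core/index.php",  # CMS directory: disallowed
):
    print(url, parser.can_fetch("*", url))
```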


URL Rewriting is not to be confused with Redirection. In the latter case, a client request receives a "301 Redirect" response containing the URL where the actual resource resides. This results in a second request from the client to the redirected URL. Then by definition the client will be aware of this process.
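
The difference is visible from the client side. This is a toy Python model, not a real HTTP server: server() is a hypothetical function returning (status, headers, body) for the example paths from the question, and crawl() follows redirects the way a browser or bot would while recording every request it makes. The rewritten URL produces a single request, while the old index.php URL produces two, because the 301 response hands the client the new location.

```python
from typing import Dict, List, Tuple

Response = Tuple[int, Dict[str, str], str]

def server(path: str) -> Response:
    """Hypothetical server behaviour for the two cases."""
    if path == "/index.php?id=7":
        # Redirection: the response tells the client where to go next.
        return 301, {"Location": "/news/article1"}, ""
    if path == "/news/article1":
        # Rewriting: served internally by /index.php?id=7, but nothing in
        # the response exposes that internal path to the client.
        return 200, {}, "<h1>Article 1</h1>"
    return 404, {}, ""

def crawl(path: str, max_hops: int = 5) -> List[Tuple[str, int]]:
    """Follow redirects like a client would and record each request made."""
    seen = []
    for _ in range(max_hops):
        status, headers, _body = server(path)
        seen.append((path, status))
        if status in (301, 302, 307, 308):
            path = headers["Location"]
            continue
        break
    return seen

print(crawl("/news/article1"))
# [('/news/article1', 200)]
print(crawl("/index.php?id=7"))
# [('/index.php?id=7', 301), ('/news/article1', 200)]
```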

Licensed under: CC-BY-SA with attribution