Question

I am running a Tomcat 5.5 server with a web application deployed on it. I want to block HTTP requests that access the .txt files in my project, for example URLs like https://MyDomain/inside/mytest.txt

I think this can be done using security-constraints in the web.xml file or by writing a custom valve. However, I am exploring the possibility of using a robots.txt file, as it seems very simple. So I have written a robots.txt file to block access to *.txt files as follows:

# go away
User-agent: *
Disallow: /*.txt

I have put it in the ROOT folder and also in every path inside the webapps folder. However, it doesn't seem to have any effect, and I am still able to access the *.txt files. Are there any other caveats or steps required for the robots.txt file to take effect in Tomcat? Any help here is highly appreciated.


Solution

robots.txt is a convention that is followed by Web Robots (such as search engine web crawlers), and it works because the Web Robot looks at this file and obeys the rules it finds in there. See http://www.robotstxt.org/ and particularly http://www.robotstxt.org/robotstxt.html for more information.

Note that it is not intended as a security measure - it is an advisory protocol only, and any badly-behaved Web Robot is free to ignore it (as the server does not enforce its rules). In fact, robots.txt is not even aimed at human users, who are normally expected to be able to visit these pages.
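
To make the "advisory only" point concrete, here is a minimal sketch of the kind of check a well-behaved crawler might run on its own side before requesting a URL. The class and method names are hypothetical, and the parsing is deliberately simplified: it only looks at "User-agent: *" groups and uses plain prefix matching as described on robotstxt.org, so a wildcard pattern such as /*.txt is treated literally (some crawlers support wildcards as an extension). Nothing comparable runs inside Tomcat, which is why the file has no effect on ordinary requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: the check a polite crawler performs on the client side. */
final class RobotsCheck {

    /** Returns false if a "User-agent: *" group disallows the given path. */
    static boolean isAllowed(String robotsTxt, String path) {
        List<String> disallowed = new ArrayList<String>();
        boolean groupApplies = false;
        for (String rawLine : robotsTxt.split("\r?\n")) {
            // Drop comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Simplification: only the catch-all group is considered.
                groupApplies = value.equals("*");
            } else if (field.equals("disallow") && groupApplies && !value.isEmpty()) {
                disallowed.add(value);
            }
        }
        // Plain prefix matching; wildcards are not interpreted in this sketch.
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

A crawler that never calls something like RobotsCheck.isAllowed, or a person typing the URL into a browser, is served the file as usual.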

If you really want to block access to a file for all users of your web application (not just well-behaved automated web crawlers), then you will need to use something like the security-constraints you mentioned.
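
As a rough sketch of the security-constraint approach, a fragment like the following could go inside the <web-app> element of the application's WEB-INF/web.xml. The resource name is an arbitrary label chosen for illustration. The empty auth-constraint is the part that tells the container no role may access the matching URLs, so direct requests should be refused (typically with a 403 response). Treat it as a starting point and verify the behaviour on your own Tomcat version.

```xml
<!-- Sketch for WEB-INF/web.xml: deny direct requests for any *.txt resource. -->
<security-constraint>
    <web-resource-collection>
        <!-- Arbitrary descriptive label (an assumption, not from the question). -->
        <web-resource-name>Blocked text files</web-resource-name>
        <url-pattern>*.txt</url-pattern>
    </web-resource-collection>
    <!-- An empty auth-constraint names no roles, so no user is authorized. -->
    <auth-constraint/>
</security-constraint>
```

Because this is enforced by the servlet container itself, it applies to browsers and crawlers alike, unlike robots.txt.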

Another couple of alternatives:

  • Put the text files in the WEB-INF/classes/ folder of the web application as the contents of this folder are never served to web clients, but are available to the web application code itself using ClassLoader.getResourceAsStream for example.

  • Put the text files in the WEB-INF/ folder of the web application as the contents of this folder are never served to web clients, but are available to the web application code itself using ServletContext.getResourceAsStream for example.
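
For the WEB-INF/classes/ option, reading the file from application code might look like the sketch below. The class name TextResources and the choice of UTF-8 are assumptions made for illustration; the only API relied on is the standard ClassLoader.getResourceAsStream, which returns null when the resource cannot be found.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for reading a text file that lives on the classpath. */
final class TextResources {

    /** Returns the resource content as a string, or null if it does not exist. */
    static String read(ClassLoader loader, String name) {
        // getResourceAsStream yields null for a missing resource;
        // try-with-resources tolerates a null stream.
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                return null;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            // Assumes the file is UTF-8 encoded.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside the web application, a file stored as WEB-INF/classes/mytest.txt could then be read with TextResources.read(Thread.currentThread().getContextClassLoader(), "mytest.txt"). For a file placed directly under WEB-INF/, the servlet equivalent is ServletContext.getResourceAsStream("/WEB-INF/mytest.txt"), where the path must begin with a slash.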

Licensed under: CC-BY-SA with attribution