Question

Today I stumbled upon a file on my web host called 'error.log'. I thought I'd take a look.

I see multiple 'file does not exist' errors, covering three different files:

  • robots.txt
  • missing.html
  • apple-touch-icon-precomposed.png

I have some guesses about what these files are used for, but would like to know definitively:

  • What are the files in question?
  • Should I add them to my server?
  • What prompts an error log to be written for these? Is it someone explicitly requesting them? If so, who and how?

Solution

A robots.txt file is read by web crawlers (robots) to learn which resources on your server they may or may not crawl. Reading it is not mandatory for a robot, but the well-behaved ones request /robots.txt before crawling, so while the file is missing, each of those requests is logged as a 'file does not exist' error. There are further examples at http://en.wikipedia.org/wiki/Robots.txt. The file resides in the web root directory, and an example might look like this:

User-agent: *   # All robots
Disallow: /     # Do not enter website

or

User-Agent: googlebot   # For this robot
Disallow: /something    # do not enter
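To see how a well-behaved crawler interprets the two example files above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed` and the example.com URLs are hypothetical. A real crawler downloads /robots.txt from your web root over HTTP (`RobotFileParser.set_url()` plus `read()` does that); this sketch parses the text directly so it runs offline.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: may a crawler that honors robots_txt fetch url?"""
    parser = RobotFileParser()
    # parse() takes the file's lines; it strips '#' comments itself.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The two example files from the answer, verbatim.
BLOCK_EVERYONE = """\
User-agent: *   # All robots
Disallow: /     # Do not enter website
"""

BLOCK_GOOGLEBOT_PATH = """\
User-Agent: googlebot   # For this robot
Disallow: /something    # do not enter
"""

if __name__ == "__main__":
    # "*" matches every robot and "/" covers every path.
    print(is_allowed(BLOCK_EVERYONE, "AnyBot", "https://example.com/index.html"))          # False
    # Only googlebot is restricted, and only under /something.
    print(is_allowed(BLOCK_GOOGLEBOT_PATH, "Googlebot", "https://example.com/something/a")) # False
    print(is_allowed(BLOCK_GOOGLEBOT_PATH, "Googlebot", "https://example.com/index.html"))  # True
    # No rule names Bingbot and there is no "*" entry, so access is allowed.
    print(is_allowed(BLOCK_GOOGLEBOT_PATH, "Bingbot", "https://example.com/something/a"))   # True
```

Note that Python's parser treats a Disallow value as a path prefix, so `Disallow: /something` also blocks `/somethingelse`.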

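The apple-touch-icon-precomposed.png file is explained at https://stackoverflow.com/a/12683605/722238. In short, Apple devices (and some other browsers) request it from your site to use as an icon, for example when a user adds the site to their home screen, so a missing file shows up in the log.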

I believe missing.html is used by some sites as a custom 404 page. It's also possible that a robot is configured to request this file, which would explain the requests for it.

You should add a robots.txt file if you want to control which resources robots crawl on your server. As noted above, robots are not obliged to read this file.

You could add the other two files to stop these error messages, but I don't believe it is necessary. Nothing stops joe_random from requesting /somerandomfile.txt from your server, in which case you will get another error message for another file that doesn't exist. Instead, you can serve such requests a customized 404 page.
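On a typical Apache host, a custom 404 page is set up in the server configuration rather than by creating every requested file, for example with an `ErrorDocument 404 /missing.html` line in the config or, where permitted, in an .htaccess file. As a self-contained illustration of the same idea, here is a minimal sketch built on Python's standard `http.server`: any request for a file that doesn't exist is answered with one custom page. The class name `CustomNotFoundHandler` and the `NOT_FOUND_PAGE` file name are hypothetical; check your host's documentation for how it actually configures 404 pages.

```python
import http.server
import os

# Hypothetical name of the custom error page, relative to the served directory.
NOT_FOUND_PAGE = "missing.html"

class CustomNotFoundHandler(http.server.SimpleHTTPRequestHandler):
    """Serves files from self.directory; missing files get NOT_FOUND_PAGE with status 404."""

    def send_error(self, code, message=None, explain=None):
        page = os.path.join(self.directory, NOT_FOUND_PAGE)
        if code == 404 and os.path.isfile(page):
            with open(page, "rb") as f:
                body = f.read()
            # Still log the miss, like the default send_error does.
            self.log_error("code %d, message %s", code, message)
            # Keep the 404 status so clients know the file is missing.
            self.send_response(404)
            self.send_header("Connection", "close")
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            if self.command != "HEAD":
                self.wfile.write(body)
        else:
            # No custom page (or a different error): fall back to the default response.
            super().send_error(code, message, explain)
```

To serve a particular directory, pass `functools.partial(CustomNotFoundHandler, directory="/path/to/webroot")` to `http.server.HTTPServer`. The sketch answers with the custom page and a 404 status instead of an HTTP redirect, so browsers and crawlers can still tell that the requested file does not exist.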

Licensed under: CC-BY-SA with attribution