How to find out the name of the default page displayed by a webserver?

https://stackoverflow.com/questions/5300937

22-10-2019

Question

I'm downloading various files through I/O-streaming in my Java application. Receiving and saving those files works well as long as I have a full URL-path including file name, but how can I find out the name of the index file (as defined in, for example, Apache's DirectoryIndex) of a domain? The HTTP header doesn't provide this information and neither does the URLConnection method.

Thanks a lot!

Be well
S.

Solution

As far as I know, there is no way of retrieving this information. The HTTP specification doesn't provide for it, and I don't think that's a bad thing. Your client requests the URL "/", and it's up to the web server how to handle that; there is no obligation to return a filename too.

It's also worth pointing out (I'm sure you're aware of it, but just in case) that just because a URL looks like /somedir/somefile.html, it doesn't mean that is the actual file being served. It could be served via a proxy to another host, via mod_rewrite, etc. In other words, the name is arbitrary and doesn't necessarily bear any relation to the physical name on disk.

In short, I think your best bet would be to pick a default filename (e.g. index.html) for those cases and stick to it.
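A minimal sketch of that fallback, assuming Java 8 or later. The class name DefaultFileName, the method localNameFor and the constant FALLBACK are made up for illustration; the only library call used is java.net.URI. It takes the last path segment when there is one and otherwise falls back to the chosen default:

```java
import java.net.URI;

final class DefaultFileName {
    // Assumed fallback name. The server's real index file cannot be discovered,
    // so this is just a local convention for saving the response.
    static final String FALLBACK = "index.html";

    /**
     * Returns the last path segment of the URL, or FALLBACK when the path
     * is missing, empty, or ends in "/" (i.e. no file name was requested).
     */
    static String localNameFor(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return FALLBACK;
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(localNameFor("http://example.com/"));
        System.out.println(localNameFor("http://example.com/docs/manual.pdf"));
    }
}
```

Note that the name returned here is only a label for the local copy; as said above, it need not match any file on the server.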

OTHER TIPS

The only way out is to:

1. Inspect the Content-Disposition header and use it to generate the filename. If the server is serving a file, it would typically set this header. E.g. the URL http://server:port/DownLoadServlet might set this header to indicate the name "statement.pdf".
2. If this header is missing, use heuristics to generate a filename. This is what browsers do when they generate filenames like Doc[10].pdf, Doc[12].pdf, etc.
3. Use the Content-Type header (if available) to guess the file extension.
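The steps above could be sketched roughly as follows, assuming Java 9 or later (for Map.of). The class DownloadNames, the method chooseName, the "download-N" naming scheme and the small MIME-type table are all illustrative assumptions, not a standard API. Only the simple filename= form of the header is handled; the encoded filename*= form is ignored:

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DownloadNames {
    // Matches filename="statement.pdf" or filename=statement.pdf (simple forms only).
    private static final Pattern FILENAME = Pattern.compile(
            "filename\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

    // Hand-picked, incomplete mapping from MIME type to extension (an assumption).
    private static final Map<String, String> EXTENSIONS = Map.of(
            "text/html", ".html",
            "text/plain", ".txt",
            "application/pdf", ".pdf",
            "image/png", ".png");

    /**
     * Picks a local file name: first from Content-Disposition, otherwise a
     * generated "download-<counter>" name with an extension guessed from
     * Content-Type. Either header value may be null.
     */
    static String chooseName(String contentDisposition, String contentType, int counter) {
        if (contentDisposition != null) {
            Matcher m = FILENAME.matcher(contentDisposition);
            if (m.find()) {
                String name = m.group(1) != null ? m.group(1) : m.group(2);
                // Keep only the last segment so the header cannot inject a directory path.
                int cut = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
                name = name.substring(cut + 1);
                if (!name.isEmpty()) {
                    return name;
                }
            }
        }
        String extension = "";
        if (contentType != null) {
            // Strip parameters such as "; charset=UTF-8" before the lookup.
            String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            extension = EXTENSIONS.getOrDefault(mime, "");
        }
        return "download-" + counter + extension;
    }
}
```

With a URLConnection, the two header values would come from getHeaderField("Content-Disposition") and getContentType(); the counter is whatever the application uses to keep generated names unique.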

Licensed under: CC-BY-SA with attribution
