Inspect the `src` or `href` attribute to see whether it's absolute, relative, or protocol-relative (`//stackoverflow.com/file`). Parse the page's URL. If the attribute is absolute, use it as is. If it's protocol-relative, take the protocol from the parsed page URL, then append the content of the attribute. If it's relative, strip the query string and fragment ID from the page URL, and "append" the relative portion. Be aware that a relative URL can look like `/foo`, `foo`, `foo/bar`, or `./../../bar/../foo`, so you might want to resolve path traversals before printing.
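As a rough sketch of the steps above, assuming Java (the answer does not name a language), the standard `java.net.URI.resolve` method handles the absolute, protocol-relative, and relative cases in one call rather than by hand, and also collapses `.` and `..` segments. The class name `LinkResolver`, the helper `resolve`, and the `example.com` page URL are made up for illustration.

```java
import java.net.URI;

class LinkResolver {
    // Resolve a src/href attribute value against the URL of the page it was found on.
    // URI.resolve drops the page's query string and fragment, keeps the page's scheme
    // for protocol-relative values, and returns absolute values unchanged.
    static String resolve(String pageUrl, String attribute) {
        URI page = URI.create(pageUrl);
        // normalize() is a safeguard that removes any remaining "." and ".." segments.
        return page.resolve(attribute).normalize().toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, chosen only for this example.
        String page = "https://example.com/a/b/page.html?q=1#top";

        System.out.println(resolve(page, "//stackoverflow.com/file")); // https://stackoverflow.com/file
        System.out.println(resolve(page, "/foo"));                     // https://example.com/foo
        System.out.println(resolve(page, "foo"));                      // https://example.com/a/b/foo
        System.out.println(resolve(page, "foo/bar"));                  // https://example.com/a/b/foo/bar
        System.out.println(resolve(page, "./../../bar/../foo"));       // https://example.com/foo
        System.out.println(resolve(page, "http://other.example/x"));   // http://other.example/x
    }
}
```

`URI.create` throws `IllegalArgumentException` on malformed input, so attribute values scraped from real pages may need to be cleaned or escaped first.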
Edit:
Take a look at URL and the Commons URL Builder. They'll both be helpful.