After playing with this for a very long time I ended up doing the following:
- Rewrite the HTML and JS files on the fly. All other resources are hosted by the original website.
- For HTML files, inject a <base> tag pointing to the website being redirected. This causes the browser to resolve relative links (in the HTML file, CSS files, and even Flash!) against the original website.
- For the JS files, apply a regular expression to patch specific sections of code that point to the wrong URL. I load up the redirected page in a browser, look for broken links, and figure out which section of JS needs to be patched to correct the problem.
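The two rewriting steps above can be sketched roughly as follows. This is only an illustration in Python, not the code I actually run: the function names, the `original.example.com` host, and the `/api/` pattern are all made up for the example, and using a regex to find `<head>` is a shortcut that assumes reasonably well-formed pages.

```python
import html
import re


def inject_base_tag(page: str, origin_url: str) -> str:
    """Insert a <base> tag right after the opening <head> tag."""
    base_tag = f'<base href="{html.escape(origin_url, quote=True)}">'
    # Match only the first opening <head ...> tag, case-insensitively.
    # A function is used as the replacement so the URL is inserted literally.
    patched, count = re.subn(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + base_tag,
        page,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        # No <head> found: fall back to prepending the tag.
        patched = base_tag + page
    return patched


# Hypothetical per-site patch list: (compiled pattern, replacement).
# Each entry is added by hand after spotting a broken link in the browser.
JS_PATCHES = [
    (re.compile(r'(["\'])/api/'), r"\1https://original.example.com/api/"),
]


def patch_js(source: str, patches) -> str:
    """Apply every regex patch, in order, to a JS file's source."""
    for pattern, replacement in patches:
        source = pattern.sub(replacement, source)
    return source
```

In this sketch the HTML step is generic, while the JS patch list is the part that grows per page as broken links turn up.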
This sounds a lot harder than it actually is. On average, patching each page takes less than 5 minutes of work.
The big discovery was the <base> tag! It corrected the vast majority of links on my behalf.