Question

I've noticed that just in the last year or so, many major websites have made the same change to the way their pages are structured. Each has moved its Javascript files from being hosted on the same domain as the page itself (or a subdomain of that) to being hosted on a differently named domain.

It's not simply parallelization

Now, there is a well known technique of spreading the components of your page across multiple domains to parallelize downloading. Yahoo recommends it as do many others. For instance, www.example.com is where your HTML is hosted, then you put images on images.example.com and javascripts on scripts.example.com. This gets around the fact that most browsers limit the number of simultaneous connections per server in order to be good net citizens.

The above is not what I am talking about.

It's not simply redirection to a content delivery network (or maybe it is--see bottom of question)

What I am talking about is hosting Javascripts specifically on an entirely different domain. Let me be specific. Just in the last year or so I've noticed that:

youtube.com has moved its .JS files to ytimg.com

cnn.com has moved its .JS files to cdn.turner.com

weather.com has moved its .JS files to j.imwx.com

Now, I know about content delivery networks like Akamai, to which large websites outsource exactly this kind of hosting. (The name "cdn" in Turner's special domain clues us in to the importance of this concept here).

But note that with these examples, each site has its own specifically registered domain for this purpose, and it's not the domain of a content delivery network or other infrastructure provider. In fact, if you try to load the home page off most of these script domains, they usually redirect back to the main domain of the company. And if you reverse-lookup the IPs involved, they sometimes appear to point to a CDN company's servers, sometimes not.

Why do I care?

Having formerly worked at two different security companies, I have been made paranoid about malicious Javascript.

As a result, I follow the practice of whitelisting the sites on which I will allow Javascript (and other active content such as Java) to run. That means, to make a site like cnn.com work properly, I have to manually add cnn.com to a list. It's a pain in the behind, but I prefer it to the alternative.

When folks used things like scripts.cnn.com to parallelize, that worked fine with appropriate wildcarding. And when folks used subdomains off the CDN company domains, I could just permit the CDN company's main domain with a wildcard in front as well and kill many birds with one stone (such as *.edgesuite.net and *.akamai.com).

Now I have discovered that (as of 2008) this is not enough. Now I have to poke around in the source code of a page I want to whitelist, and figure out what "secret" domain (or domains) that site is using to store their Javascripts on. In some cases I've found I have to permit three different domains to make a site work.

Why did all these major sites start doing this?

EDIT: OK as "onebyone" pointed out, it does appear to be related to CDN delivery of content. So let me modify the question slightly based on his research...

Why is weather.com using j.imwx.com instead of twc.vo.llnwd.net?

Why is youtube.com using s.ytimg.com instead of static.cache.l.google.com?

There has to be a reason behind this.


Solution

Your follow-up question is essentially: Assuming a popular website is using a CDN, why would they use a separately registered domain of their own, like imwx.com, instead of a subdomain (static.weather.com) or the CDN's own domain?

Well, the reason for using a domain they control rather than the CDN's domain is exactly that: control. They could even change CDNs entirely and only have to change a DNS record, instead of having to update links in thousands of pages and applications.

So, why use nonsense domain names? Well, a big thing with helper files like .js and .css is that you want them to be cached downstream by proxies and people's browsers as much as possible. If a person hits gmail.com and all the .js is loaded out of their browser cache, the site appears much snappier to them, and it also saves bandwidth on the server end (everybody wins). The problem is that once you send HTTP headers for really aggressive caching (i.e. cache me for a week or a year or forever), these files aren't ever reliably loaded from the server any more and you can't make changes/fixes to them because things will break in people's browsers.
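For concreteness, here is a minimal sketch of what "really aggressive caching" looks like on the wire, written as a tiny Node/TypeScript handler; the URL, file path, and one-year lifetime are illustrative assumptions, not what any of these sites actually sends:

// Minimal static-asset handler sketch (hypothetical path and values).
// The point is the Cache-Control header: a one-year max-age tells browsers
// and proxies they may reuse the file without ever re-checking the server.
import * as http from "http";
import * as fs from "fs";

http.createServer((req, res) => {
  if (req.url === "/js/app.js") {
    res.setHeader("Content-Type", "application/javascript");
    // "Really aggressive" caching: one year, plus a matching Expires header.
    res.setHeader("Cache-Control", "public, max-age=31536000");
    res.setHeader("Expires", new Date(Date.now() + 31536000 * 1000).toUTCString());
    res.end(fs.readFileSync("./static/app.js")); // hypothetical file location
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(8080);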

So, what companies have to do is stage these changes and actually change the URLs of all of these files to force people's browsers to reload them. Cycling through domains like "a.imwx.com", "b.imwx.com" etc. is how this gets done.
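A minimal sketch of that URL cycling, assuming a hypothetical asset host and version string (neither is anything weather.com actually uses):

// Sketch of cache busting by changing asset URLs (names are made up).
// Bumping ASSET_VERSION, or swapping "a.imwx.example" for "b.imwx.example",
// changes every URL, so browsers that cached the old files "forever" are
// forced to fetch the new ones.
const ASSET_HOST = "a.imwx.example";  // hypothetical; cycled to b., c., ...
const ASSET_VERSION = "2008-09-14";   // bumped whenever the scripts change

function scriptTag(path: string): string {
  return `<script src="http://${ASSET_HOST}/${ASSET_VERSION}${path}"></script>`;
}

// scriptTag("/js/forecast.js")
//   -> '<script src="http://a.imwx.example/2008-09-14/js/forecast.js"></script>'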

By using a nonsense domain name, the Javascript developers and their sysadmin/CDN-liaison counterparts get their own domain name and DNS to push these changes through, one that they're accountable for and autonomous over.

Then, if any sort of cookie-blocking or script-blocking starts happening against that domain, they just switch from one nonsense domain to kyxmlek.com or whatever. They don't have to worry about accidentally doing something evil that has countermeasure side effects on all of *.google.com.

OTHER TIPS

Limit cookie traffic?

After a cookie is set on a specific domain, every request to that domain will have the cookie sent back to the server. Every request!

That can add up quickly.
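As a back-of-the-envelope illustration of how quickly it adds up (every number below is made up), the overhead of sending a cookie with each static-asset request looks like this:

// Rough cookie-overhead estimate; all figures are hypothetical.
// Once a cookie is set on www.example.com, every request to that host carries
// it. Requests to a separate, cookieless asset domain carry nothing.
const cookieBytes = 800;        // hypothetical: a site storing more than a session id
const assetsPerPage = 40;       // hypothetical: scripts, stylesheets, images
const pageViewsPerDay = 1_000_000;

const wastedBytesPerDay = cookieBytes * assetsPerPage * pageViewsPerDay;
console.log(`~${(wastedBytesPerDay / 1e9).toFixed(1)} GB/day of cookie headers`);
// With these made-up numbers: ~32.0 GB/day of upstream request headers that a
// cookieless asset domain would simply never send.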

Lots of reasons:

CDN - a different DNS name makes it easier to shift static assets to a content distribution network

Parallelism - images, stylesheets, and static Javascript come in over extra connections of their own, so they don't block other requests such as Ajax callbacks or dynamic images

Cookie traffic - exactly correct - especially with sites that have a habit of storing far more than a simple session id in cookies

Load shaping - even without a CDN, there are still good reasons to host the static assets on a smaller number of web servers optimized to respond extremely quickly to a huge number of file URL requests, while the rest of the site is hosted on a larger number of servers handling more processor-intensive dynamic requests


update - two reasons not to use the CDN's DNS name. First, the client's DNS name acts as a key to the proper "hive" of assets the CDN is caching. Second, since your CDN is a commodity service, you can change providers just by altering the DNS record, avoiding any page changes, reconfiguration, or redeployment on your site.

I think there's something in the CDN theory:

For example:

$ host j.imwx.com
j.imwx.com              CNAME   twc.vo.llnwd.net
twc.vo.llnwd.net        A       87.248.211.218
twc.vo.llnwd.net        A       87.248.211.219
$ whois llnwd.net
<snip ...>
Registrant:
  Limelight Networks Inc.
  2220 W. 14th Street
  Tempe, Arizona 85281-6945
  United States

Limelight is a CDN.

Meanwhile:

$ host s.ytimg.com
s.ytimg.com             CNAME   static.cache.l.google.com
static.cache.l.google.com       A       74.125.100.97

I'm guessing that this is a CDN for static content run internally by Google.

$ host cdn.turner.com
cdn.turner.com A record currently not present

Ah well, can't win 'em all.
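The same lookups can be scripted. Here is a minimal sketch using Node's dns module; the hostnames are just the ones discussed above, and what they resolve to depends entirely on current DNS data:

// Sketch: resolve the CNAME chain for each script domain to see which CDN
// (if any) is actually behind it. Results depend on whatever DNS returns today.
import { promises as dns } from "dns";

const scriptHosts = ["j.imwx.com", "s.ytimg.com", "cdn.turner.com"];

async function main() {
  for (const host of scriptHosts) {
    try {
      const cnames = await dns.resolveCname(host);
      console.log(`${host} -> ${cnames.join(", ")}`);
    } catch {
      console.log(`${host}: no CNAME record found`);
    }
  }
}

main();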

By the way, if you use Firefox with the NoScript add-on, it will automate the process of hunting through the source and GUI-fy the process of whitelisting. Basically, click on the NoScript icon in the status bar and you're given a list of domains, with options to temporarily or permanently whitelist each, including "all on this page".

I implemented this solution about two to three years ago at a previous employer, when the website started getting overloaded due to a legacy web server implementation. By moving the CSS and layout images off to an Apache server, we reduced the load on the main server and increased the speed no end.

However, I've always been under the impression that Javascript functions could only be accessed from within the same domain as the page itself. Newer websites don't seem to have this limitation: as you mention, many have Javascript files on separate sub-domains or even on entirely separate domains.

Can anyone give me a pointer on why this is now possible, when it wasn't a couple of years ago?

It's not just Javascript that you can move to different domains; moving as many assets as possible will yield performance improvements.

Most browsers limit the number of simultaneous connections you can make to a single domain (I think it's around 4), so when you have a lot of images, js, css, etc., there's often a hold-up in downloading each file.

You can use something like YSlow with Firebug to see when each file is downloaded from the server.

By having assets on separate domains you lessen the load on your primary domain, open more simultaneous connections, and download more files at any given time.
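For illustration, here is a minimal sketch of spreading assets across a couple of hostnames; the host names, and the idea of hashing the path to pick one, are my own assumptions rather than anything a particular site does:

// Sketch of "domain sharding": map each asset to one of a few hostnames so
// the browser opens extra parallel connections. Hostnames are hypothetical.
const SHARDS = ["img1.example.com", "img2.example.com", "img3.example.com"];

function shardedUrl(path: string): string {
  // Hash the path so a given file always lands on the same shard; otherwise
  // the same image could end up cached once per hostname.
  let hash = 0;
  for (const ch of path) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return `http://${SHARDS[hash % SHARDS.length]}${path}`;
}

// shardedUrl("/houses/1234/front.jpg") always returns the same shard for that path.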

We recently launched a real estate website which has a lot of images (of the houses, duh :P) and which uses this principle for the images, so listing the data is a lot faster.

We've also used this on many other websites which have a high asset volume.

I think you answered your own question.

I believe your real issue is the security aspect, rather than the WHY.

Perhaps a new META tag is in order that would describe the valid CDNs for the page in question; then all we need is a browser add-on to read it and behave accordingly.

Would it be because of blocking done by spam and content filters? If they use weird domains then it's harder to figure out what to block, and/or you'll end up blocking something you want.

Dunno, just a thought.

If I were a big-name, multi-brand company, I think this approach would make sense because you want to make the Javascript code available as a library. I would want as many pages as possible to be consistent in handling things like addresses, state names, and zip codes. AJAX probably makes this concern prominent.

In the current internet business model, domains are brands, not network names. If you get bought or spin off brands, you end up with a lot of domain changes. This is a problem for even the most prominent sites.

There are still links that point to useful documents in *.netscape.com and *.mcom.com that are long gone.

Wikipedia for Netscape says:

"On October 12, 2004, the popular developer website Netscape DevEdge was shut down by AOL. DevEdge was an important resource for Internet-related technologies, maintaining definitive documentation on the Netscape browser, documentation on associated technologies like HTML and JavaScript, and popular articles written by industry and technology leaders such as Danny Goodman. Some content from DevEdge has been republished at the Mozilla website."

So, that would be, in less than a 10 year period:

  • Mosaic Communications Corporation
  • Netscape Communications Corporation
  • AOL
  • AOL Time Warner
  • Time Warner

If you put the code in a domain that is NOT a brand name, you retain a lot of flexibility and you don't have to refactor all the entry points, access control, and code references when the web sites are re-named.

I have worked with a company that does this. They're in a datacenter with fairly good peering, so the CDN reasoning isn't as big a factor for them (maybe it would help, but they don't do it for that reason). Their reason is that they run several web servers in parallel which collectively handle their dynamic pages (PHP scripts), while images and static Javascript are served off a separate domain by a fast, lightweight web server such as lighttpd or thttpd.

PHP pages require PHP. Static Javascript and images do not. A lot can be stripped out of a full-featured web server when all you need to do is the absolute minimum.

Sure, they could probably use a proxy that forwards requests for a specific subdirectory to a different server, but it's easier to just handle all the static content with a separate server.
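As a rough sketch of that "absolute minimum" idea, here is a stripped-down static file server in Node/TypeScript; it is illustrative only, with made-up paths and port, not the lighttpd/thttpd setup that company actually ran:

// Bare-bones static file server sketch: no sessions, no scripting runtime,
// just files off disk with long cache lifetimes.
import * as http from "http";
import * as fs from "fs";
import * as path from "path";

const ROOT = path.resolve("./static");   // hypothetical asset directory

const TYPES: Record<string, string> = {
  ".js": "application/javascript",
  ".css": "text/css",
  ".png": "image/png",
  ".jpg": "image/jpeg",
};

http.createServer((req, res) => {
  // Drop any query string, then resolve inside ROOT and refuse anything that
  // would escape the asset directory via "../" tricks.
  const urlPath = (req.url ?? "/").split("?")[0];
  const filePath = path.join(ROOT, path.normalize(urlPath));
  if (!filePath.startsWith(ROOT)) {
    res.statusCode = 403;
    return res.end();
  }
  fs.readFile(filePath, (err, data) => {
    if (err) {
      res.statusCode = 404;
      return res.end();
    }
    res.setHeader("Content-Type", TYPES[path.extname(filePath)] ?? "application/octet-stream");
    res.setHeader("Cache-Control", "public, max-age=86400");
    res.end(data);
  });
}).listen(8081);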

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow