Development Of Scriptable Webpage Served As XML And Parsed As HTML

https://softwareengineering.stackexchange.com/questions/352369

15-01-2021

Question

In short, this question is about the cross-browser compatibility of a way to serve an HTML page as XML. I am interested in serving a web page as XML, to be interpreted as HTML, for increased CSS performance by removing the user agent stylesheet. The reason for this concern will not be explained, as it has already caused this question to go sideways. I found this webpage containing what I was looking for: a way to disable the user agent stylesheet. However, that web page gave no information about browser compatibility. So here are my questions about this.

That XML webpage (same as the one above) appears to successfully disable the user agent stylesheet by making the browser load an XML file as if it were a web page. What is the common term for what this web page is doing (a word or phrase that could be used in a Google search to learn more about it)?
What is the browser compatibility (which web browsers will load the page correctly) for loading webpages in this way without the user agent stylesheet? Also, what about JavaScript? Will the behavior of JavaScript in pages like this stay consistent across browsers? So far, using JavaScript executed in the dev console in Chrome, I have found the following differences between JavaScript in normal pages and JavaScript in this page.
- By default, all elements will have Element as their constructor. This causes (among other things):
  - onmouseover, onmouseout, onclick, and other mouse event properties are undefined. Instead, addEventListener must be used.
  - document.body to be null.
  - offsetParent to be undefined.
  - clientTop and clientLeft to be undefined.
- document.doctype to be null.
- The document's constructor to be XMLDocument.
Are there any other differences in JavaScript? How does the behavior of JavaScript differ from browser to browser?

I do not want to rant, but at the same time I do not want my question to be falsely marked as off-topic. So, I will share one bit of my attempted research. After hours of searching, I found what is called XSLT. However, it does not appear to be what I am looking for, because in the XSLT demo pages I found, the user agent stylesheet was still being applied.

Solution

How To Make A XHTML Page

To answer the question, everything about the linked page is unconventional. There is nothing cross-browser about it (though it does, coincidentally, work fine in most browsers). Expect change and decreasing browser support. A much better way to go is XHTML. XHTML is XML parsed as HTML, which is exactly what you are looking for. XHTML has none of the unique HTML features that inhibit performance and produce strange, error-prone parsing behaviors. Follow the steps below to get an XHTML template page that you can work from.

1. Begin the XHTML page with <?xml version="1.0" encoding="UTF-8"?> for UTF-8. If you are certain that only ASCII characters will be on your website, then you can try <?xml version="1.0" encoding="ISO-8859-1"?> to increase page parse speed by a few microseconds.
2. Next, follow with <!DOCTYPE html> (this is case-sensitive in XHTML).
3. Now, add in the <html> element and be sure to wrap it around the entire page. Do not forget to assign xmlns="http://www.w3.org/1999/xhtml" to the html tag for W3 conformance. As for Internet Explorer conformance, I would recommend adding an xml:lang="en" attribute to prevent older versions of IE from going into quirks mode.
4. In the <head>, you must then add <meta http-equiv="Content-Type" content="application/xhtml+xml;charset=UTF-8" /> to reaffirm that this is XHTML (silly, right?).
5. Because the XHTML spec demands it, you should also put a <title>Title Of The Webpage</title> tag somewhere in the <head>. If you do not want a title for your webpage, then put <title></title> in your <head> instead.
6. Make sure Content-Type: application/xhtml+xml is in the headers served with this webpage. If you do not know what this means, then read the following.
  - First, try putting a .xhtml extension on the webpage.
  - Then, to see whether the page is actually XHTML, open the page in your browser (the browser must not be IE because it can get really wonky) and open the Developer Tools console (usually CTRL+SHIFT+I, or right click anywhere on the page, click Inspect, and switch from the Elements tab to the Console tab). Put your cursor in the console's text box by clicking to the right of the blue arrow, type or paste document.contentType, and press Enter. Performing these steps on example.com outputs "text/html", which means that example.com is not XHTML. If example.com were XHTML, then the output would be "application/xhtml+xml".
  - If changing the file extension to .xhtml fails to make the page XHTML, then check whether your server supports PHP. If it does, then an incredibly crude, awful, and abusive (but still 100% working) way to get XHTML is to put <?php header("Content-Type: application/xhtml+xml"); ?> at the very, very absolute top of your file before everything else (even before the <?xml version="1.0" encoding="UTF-8"?>).
  - If all else fails, then try changing the file extension to .xml. I know that (in Chrome at least) XML files will get interpreted as XHTML files if steps 1-4 are done exactly correctly without a single flaw.

Your resulting XHTML web page code should look something like the following. (Special note to avoid a particularly splitting headache: Microsoft browsers only scan ~4kb into the file for <meta http-equiv="X-UA-Compatible" content="IE=Edge" />. Do not place this tag too far down the page or you will be sorry.)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <meta http-equiv="Content-Type" content="application/xhtml+xml;charset=UTF-8" />
        <title>Title Of The Webpage</title>
        <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
    </head>
    <body>
        The body content goes here
    </body>
</html>
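
As a rough sketch of the content type check described in the steps above, the value of document.contentType can be classified in script instead of being read by eye. The helper name classifyContentType is made up for this illustration; only document.contentType is a real browser property, and the choice of which MIME types to distinguish is an assumption.

```javascript
// Classify a MIME type string such as the value of document.contentType.
// classifyContentType is a hypothetical helper name, not a browser API.
function classifyContentType(contentType) {
  // Drop parameters such as ";charset=UTF-8" and normalise the case.
  const mime = String(contentType).split(";")[0].trim().toLowerCase();
  if (mime === "application/xhtml+xml") return "xhtml";
  if (mime === "text/html") return "html";
  // Generic XML types: the page was served as XML but not as XHTML.
  if (mime === "application/xml" || mime === "text/xml" || mime.endsWith("+xml")) {
    return "xml";
  }
  return "other";
}
```

In a browser console, classifyContentType(document.contentType) should then report "xhtml" only when the server really sent application/xhtml+xml.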

Some quirks about XHTML

There are some very notable differences between XHTML and loose HTML. These differences contribute to why XHTML is so much faster to parse, and why it is (for me at least) so much better for development in some cases, but not others.

Tags are written differently: <script async src="..."> is invalid. Instead, you must do <script async="" src="..."></script>.
1. All tags must be closed.
2. All attributes must have values. To create a blank attribute, set its value to "" like so: <iframe sandbox=""></iframe>.
3. All tags in XHTML are allowed to be self-closing (e.g. <tag /> is a 100% interchangeable shorthand for <tag></tag>). However, to maintain compatibility with horrible web browsers (*cough*: IE) that don't understand XHTML and render the page as HTML, you must only self-close tags that HTML recognizes as void elements. Among others, the following can be self-closed: <meta />, <img />, <br />, <link />, <input />, <hr />, and <embed />. However, the following oddballs (that ideally would be able to be self-closed) must never be self-closing: <td></td>, <th></th>, <script src="..."></script>, <title></title>, <canvas></canvas>, <iframe></iframe>, and <textarea></textarea>.
4. (Most) HTML minifiers will completely break a XHTML page. Their output will be valid HTML, but not valid XHTML. For example, most minifiers would turn <br /><script defer="" src="main.js"></script> into <br><script defer src=main.js></script> which is completely invalid in XHTML. So watch out and be careful.
5. In XHTML, there can never be duplicate attributes. For example, HTML accepts the tag <div style="color:red" style="font-weight:bold">AnonyCo</div> without complaint: it silently keeps the first style attribute and discards the second, so AnonyCo is red but not bold. In XHTML, this tag will produce a syntax error, and the tag <div style="color:red;font-weight:bold">AnonyCo</div> must be used instead. This can be quite useful for development because the browser basically points out to you "Hey! This is bad practice" for free.
6. HTML completely ignores custom namespaces, but XHTML does not. Be aware that 'cute' code like <email:example>admin<i>@example.com</i></email:example> will produce different behavior in XHTML than in HTML. In HTML, @example.com will be italicised, whereas in XHTML @example.com will not be italicised. However, if we were to do <email:example xmlns="http://www.w3.org/1999/xhtml">admin<i>@example.com</i></email:example> or <div xmlns:email="http://www.w3.org/1999/xhtml"><email:example>admin<i>@example.com</i></email:example></div> or <email:example>admin<i xmlns="http://www.w3.org/1999/xhtml">@example.com</i></email:example> in XHTML, then @example.com should become italicised according to the specs. However, here lies a hidden quirk: I have never thus far actually seen custom namespaces used in production code (aside from the links at the top of the page), so expect both browser support and browser behavior to be browser-dependent and very likely subject to deprecation. Moreover, I have not yet been able to come up with a single good realistic use for custom namespaces that could not be equated with the Shadow DOM API. Furthermore, browsers that do support custom namespaces likely perform poorly when fed them, because custom namespaces were likely an afterthought for spec conformance, not something mainstream that has been heavily optimized.
7. Tag names are case-sensitive in XHTML. In HTML, the code <BIG>Hello World</big> would render Hello World in large text. In XHTML, it would produce a syntax error because BIG and big do not match exactly. <BIG>Hello World</BIG> would also work fine in HTML. It would be valid XHTML too, but XHTML would render the text at the normal font size because BIG is a different element from big, and only big elements get the special default styling. Therefore, <BIG>Hello World</BIG> would look exactly the same as <span>Hello World</span> in XHTML.
8. Attribute names are also case-sensitive in XHTML. In XHTML, the tag <script SRC="/path/to/the/script.js"></script> would not load any script because the SRC attribute is distinct from the src attribute. In order to load the script, you would have to write <script src="/path/to/the/script.js"></script>. One may use this attribute sensitivity to do clever tricks such as <div style="color:green" STYLE="display:none">Hooray! Your browser supports XHTML</div>. In XHTML, the STYLE attribute would be unrecognized, thus rendering the text in the color green. In HTML, the STYLE attribute would be recognized as style, thus causing the text to not be displayed. Some may say that tricks like these are bad practice. I say that these tricks are 100% spec-compliant, so those naysayers are flat out wrong.
Javascript behavior differs a little (in some good and bad ways)
1. Let us assume we did the following:
  <button id="myWTFPR_Button">Click Here To Get Your Mind Blown</button>
  <script>myWTFPR_Button.onclick = function(){alert("And bam!")}</script>
  That should fail, right? WRONG! To preserve compatibility with websites from the 90's, at every tag which has an id-attribute, the browser wickedly creates a new window property at window[id] and sets the value of this window property to the element. WHAT THE PERFORMANCE RAMIFICATIONS?!?!?!? Thankfully, this does not happen in XHTML. In XHTML, the code snippet above would ever so wonderfully fail because ID's of parsed tags do not become window properties.
2. Yay! At least one good refusal to downgrade in XHTML. (Surprise, surprise, surprise!) XHTML documents actually allow you to do document.createElement("div") and it will work as you would hope (but not expect) it to work. There is no need to do the crazy B.S. one might think XHTML would demand like document.createElementNS("http://www.w3.org/1999/xhtml", "div").
3. In loose HTML, creating HTML content directly in the document via javascript (e.g. with .innerHTML, .outerHTML, and .insertAdjacentHTML(...)) will never throw an error no matter what. In XHTML, creating HTML from a text string will throw an error if the content to be created is not valid XML/XHTML. If you are uncertain about the HTML being inserted, then make sure to wrap your code dealing with the HTML creation in a try/catch block. Also be extra careful when scripting XHTML with a framework like ReactJS where the error may become obscured/hidden and break the entire page suddenly because ReactJS is likely not accustomed to dealing with exceptions from HTML insertion.
  - Note that clipboard HTML data (e.g. from event.dataTransfer or navigator.clipboard) is not guaranteed to be well-formed, and may contain invalid XHTML such as <br>. One solution might be to keep a blank HTML iframe whose document you use to parse the poorly formatted HTML; you can then use adoptNode to import the parsed contents into your XHTML document.
4. Chrome (at least) ignores the defer="" attribute when it is put onto scripts, instead loading the script synchronously. However, the async="" attribute works just fine.
5. Probably the biggest gotcha of XHTML is the fact that tagName is also case-sensitive in XHTML. Let's say that you have the tag <aVeryLongName>Some Text</aVeryLongName>. In HTML, the element.tagName property would be AVERYLONGNAME. But, in XHTML, the .tagName property would be aVeryLongName. In another example, let's say that you have a div tag: <div></div>. The HTML .tagName would be DIV, whereas the XHTML .tagName would be div. Thus, when checking tagName in JavaScript, you must do, for example, element.tagName.toLowerCase() === 'div' so that the check works in both HTML and XHTML documents.
  - (Aside, what the spec should have done (but did not do) is make the default HTML case of .tagName be lower case instead of upper case so that there is no need to check both upper case and lower case)
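
Two of the points above (case-sensitive tagName and markup insertion that can throw) could be wrapped in small helpers along these lines. This is only a sketch: isTag and trySetInnerHTML are hypothetical names invented here, and the element passed in is assumed to be a regular DOM element.

```javascript
// Compare tag names case-insensitively so the check gives the same answer
// in HTML (tagName "DIV") and in XHTML (tagName "div").
// isTag is a hypothetical helper name.
function isTag(element, name) {
  return element.tagName.toLowerCase() === name.toLowerCase();
}

// Assign markup to element.innerHTML and report whether it was accepted.
// In an XHTML document the assignment throws when the markup is not
// well-formed XML; here that is turned into a false return value.
// trySetInnerHTML is a hypothetical helper name.
function trySetInnerHTML(element, markup) {
  try {
    element.innerHTML = markup;
    return true;
  } catch (error) {
    return false;
  }
}
```

A caller might write if (!trySetInnerHTML(container, pastedMarkup)) { ... } and fall back to inserting the text with textContent when the markup is rejected.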
Page rendering is different in XHTML.
1. If there is an error in the XHTML, then the browser will display a <parsererror> element at the top of the body followed by a rendering of the page up to where the error occurred.
2. XHTML does not recognize <noscript>...</noscript> tags. Instead, XHTML treats <noscript>...</noscript> exactly the same as it would treat a <span>...</span>. Thus, if Javascript is enabled and a page is XHTML, then the contents of <noscript>...</noscript> tags will be displayed to the user despite the fact that javascript is enabled. Although, do not worry too much. From what I have seen, yes: <canvas>Oh poo! Canvas is not supported again</canvas> and <iframe>Sorry, but your browser is a smelly hunk of sadness</iframe> do work properly in XHTML pages, hiding your secret cursing at broken old browsers from the eyes of common people.

XHTML v.s. HTML conclusion

PRO: Honestly, there are very few reasons not to switch to XHTML and very many reasons favoring it. Browsers parse XHTML much faster than HTML because there is no need to back up, revisit previous parts of the HTML, or try to figure out where a tag ends or what to do with stray greater-than and less-than signs. Also, there is no wildly varying "special" behavior for specific tags, which reduces complexity. And tag IDs do not get assigned as window properties, speeding up JavaScript start-up and execution immensely.
CON: XHTML may be much saner than HTML, but (to maintain compatibility with non-XHTML-conformant browsers) one must design an XHTML web page to be backwards compatible with bad old loose HTML. In addition, the ignored defer="" attribute on scripts and the ignored <noscript>...</noscript> tags combine to make adaptive, responsive programming much harder in XHTML.

OTHER TIPS

You're doing premature optimization.

Although, the question you linked claims that:

I know that I can override it by CSS, but that creates lots of overriden specifications, and that seems to highly affect the CPU usage when browsing the page.

[...]

It is a single static page with several hundred hidden <li> items that alternatively become displayed with a click of a button powered by Javascript. I comfirmed that the Javascript part is not using up the CPU much.

[...]

I am using Google Chrome CPU profiler and timeline for the particular page.

and is slightly more elaborate than the usual “I believe that here's the bottleneck, although I haven't profiled anything yet,” it may happen that:

- The person asking the original question had a flaw in the measurements,
- The person claimed to have profiled the page, but had not actually done so,
- There was a bug in the version of Chrome from 2012,
- There was something very specific on the page, aside from several hundred hidden list elements.

If it was a real performance analysis, I would expect the question to contain:

- Detailed information about the experiment being done. What was measured? How? In which context? On which machines? Which operating system was used? Were the machines doing anything else?
- The source code, allowing the test to be performed independently.
- The actual measurements.
- A comparison between the page where the default style was reset and the same page where the default style was kept intact.
- A conclusion, such as: “Based on the collected data (see above), it appears that default style override caused additional 82 ms. on tested machines, which represents an average of 19% of the load time.”
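
The comparison and conclusion in the list above boil down to simple arithmetic over two sets of timings. A minimal sketch, assuming the load times have already been collected in milliseconds for both page variants (the function names and the sample numbers in the usage note are made up for illustration, not measurements):

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Compare load times (in ms) of the page with default styles kept intact
// (baselineMs) against the page with default styles reset (variantMs).
// compareTimings is a hypothetical helper name.
function compareTimings(baselineMs, variantMs) {
  const baseline = mean(baselineMs);
  const variant = mean(variantMs);
  const overheadMs = variant - baseline;
  // Share of the variant's load time that the extra cost represents.
  const overheadPercent = (overheadMs * 100) / variant;
  return { baseline, variant, overheadMs, overheadPercent };
}
```

For example, compareTimings([100, 100], [125, 125]) reports an overhead of 25 ms, which is 20% of the slower page's load time. A real analysis would also report the spread of the measurements, not only the means.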

Since the original question cannot be used as a relevant benchmark, you're only supposing that:

on this website normally microscopic increases in CSS performance will be bloated out to noticeable macroscopic jumps in performance.

and that CSS reset stylesheet would be the bottleneck. Let's see.

Macroscopic jumps in performance

I am creating a project-oriented website where users can create projects with millions of items (DOM nodes)

The fact that today's browsers are very capable of rendering a lot of stuff doesn't mean that you should cram a lot of stuff on a page. From design's perspective, this simply doesn't make sense.

Imagine looking at a million elements at once. A million numbers. A million images. A million words. Whatever. What does it look like? By comparison, a 1920×1080 monitor contains 2073600 pixels. Those are pixels, not data elements. Not words or numbers, or even chart points.

The fact is that if you're showing a few thousand data elements, then in order to be usable, those elements should be presented in specific forms, such as images or charts. This means that each of those elements won't have its own DOM node, because you'll be using a canvas. For anything else, the number of elements visible to a user would decrease to a few hundred.

Of course, there are sometimes more DOM nodes than there are data elements displayed on a page. But having millions of DOM nodes? I don't believe it.

What about the scroll?

Indeed, pages do get scrolled up and down, which means that they sometimes contain more elements than the user has to see at once.

Note that while very long pages (long in terms of vertical scrolling) are usually less responsive, the relation between performance and number of DOM nodes on a page is not exactly linear. Browsers are smart enough to focus on the content which has to be displayed to the user, and do slightly less work for content which is not visible.

Nevertheless, you can get yourself in a situation where there is a huge amount of information to display on a single page. Luckily, there are two examples where this problem was successfully solved: Google Maps and Google's PDF viewer.

With Google Maps, the browser displays the content which is visible, and discards the content which is not. This has a benefit and a drawback. The benefit is that you can run Google Maps on a machine which has less than a few terabytes of memory. The drawback is that when you move from one location to another, the application must occasionally reload data it discarded a few minutes ago.

With Google's PDF viewer, the approach is very similar. The entire PDF is rendered as a bunch of images and text, all using position: absolute. Every time you scroll the page, the app recomputes what should or should not be shown, and manually adjusts the top and left position of the elements. This makes it possible to effectively handle millions of elements while ensuring outstanding responsiveness.
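
The recomputation on scroll can be sketched as follows. This is not Google's implementation; it is a minimal illustration that assumes every item has the same fixed height, and the names visibleRange, topOffset, and overscan are invented for this example.

```javascript
// Given the scroll position, compute which items need DOM nodes right now.
// Assumes fixed-height items; overscan keeps a few extra items rendered
// above and below the viewport so fast scrolling does not show gaps.
function visibleRange(scrollTop, viewportHeight, itemHeight, itemCount, overscan = 2) {
  const firstVisible = Math.floor(scrollTop / itemHeight);
  const lastVisible = Math.floor((scrollTop + viewportHeight - 1) / itemHeight);
  return {
    first: Math.max(0, firstVisible - overscan),
    last: Math.min(itemCount - 1, lastVisible + overscan),
  };
}

// Value for the CSS "top" property (in px) of an absolutely positioned item.
function topOffset(index, itemHeight) {
  return index * itemHeight;
}
```

With a million 20px items in a 600px viewport scrolled to 1000px, visibleRange(1000, 600, 20, 1000000) returns { first: 48, last: 81 }, so only 34 nodes exist at any time regardless of how long the list is.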

CSS impact

Still, for any large website with rich content, it would be wise to take a look at CSS impact. Once you've used a CDN, properly minified your static resources, and implemented proper client-side caching mechanisms, there may be a few CSS optimizations left to make.

In general, those optimizations focus on:

The number of styles,
The number of elements targeted by a selector,
The complexity of selectors.

And, indeed, you may notice that CSS reset stylesheets are among the ones you may want to optimize, since they contain a bunch of * { ... }.

Once you find that those selectors are an actual bottleneck of your entire application, you work on the selectors to either remove them when possible, or make them more specific. In both cases, it generally comes down to a trade-off between performance on one side and consistency and backwards compatibility on the other. In other words, you may decide that it's not that important to have consistent spacing between two elements across Chrome and Safari, or that you don't really care that one particular area of the web app is displayed incorrectly in Internet Explorer 9.

This is basically how it works.

You are not approaching the problem correctly. You need to focus on mastering the tools and technologies available to you before attempting to reinvent or circumvent them. If anything, your XML approach will make things worse, not better; web browsers are made to display HTML.

Most HTML browser compatibility problems are due to developers trying to work against HTML instead of with it. Master the uses of each element type, and do things by the book, and most of your compatibility problems will not be an issue.

Similar things can be said of JavaScript. Also, use a framework such as jQuery; it is specifically made to get around browser compatibility issues.

For the different user agent styles, you should be using a CSS reset framework. They are specifically designed to address styling differences between browsers. Additionally, limit yourself to a CSS framework like Bootstrap and use its classes.

Additionally, a previous revision of your question suggests to me that you have not made a clean separation of display concerns and logic concerns within your mind. I fear that you may be trying to make your page pull double duty, being both the expression and the display of your client "nodes". This is a very bad idea, and in the long run it will cost you significantly more time than it saves you.

If you want to store the nodes as XML, this is fine (though I am certain there are better options), but do not try to display them this way as well. Your application's job is to translate storage into a logical entity, and that entity into a way of displaying it (HTML). These are three separate concerns and must not be conflated.

Licensed under: CC-BY-SA with attribution
