The purpose of MHTML is to combine all the elements of a web page into one file. It was proposed as a standard in 1999 (RFC 2557). This might seem like a good idea, but few browsers have adopted it, for reasons I can only speculate about, though I have a strong hunch.
Basically, it comes down to how HTML works: linking to external components versus having them all in one file. It might seem more convenient to have everything in one file, but imagine that file is 1MB in size. The content would not render until the whole MHTML file had been downloaded, which is very bad. Who wants to wait for content to load? In contrast, HTML (and XHTML) works better because it conveys the basic structure of a page as soon as it is loaded. It loads fast since it is just text, and the additional elements can then be fetched after the main HTML has arrived.
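To make the size trade-off concrete, here is a minimal sketch that bundles a page and one image into a single MHTML-style file using Python's standard `email` package (MHTML is a MIME `multipart/related` document). The function name `build_mhtml`, the `example.com` URLs, and the 200 KB placeholder image are all assumptions made up for illustration, not anything a particular browser produces.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html: str, image_bytes: bytes, image_url: str) -> bytes:
    """Bundle an HTML page and one image into a multipart/related document."""
    bundle = MIMEMultipart("related", type="text/html")

    # The first part is the page itself.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = "http://example.com/article.html"  # hypothetical URL
    bundle.attach(page)

    # Each linked resource becomes another part, base64-encoded by default.
    image = MIMEImage(image_bytes, "png")
    image["Content-Location"] = image_url
    bundle.attach(image)

    return bundle.as_bytes()


# Hypothetical page: a short article that references one image.
html = (
    "<html><body><h1>News</h1><p>Article text.</p>"
    '<img src="http://example.com/photo.png"></body></html>'
)
image_bytes = b"\x00" * 200_000  # placeholder standing in for a real 200 KB PNG

mhtml = build_mhtml(html, image_bytes, "http://example.com/photo.png")

print("HTML alone:  ", len(html.encode("utf-8")), "bytes")
print("MHTML bundle:", len(mhtml), "bytes")
```

The HTML on its own is a few hundred bytes, while the bundle has to carry the image as well, and base64 encoding makes that image roughly a third larger than the original binary.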
So let’s say you want to read a news article in your browser. You request the web page. The bulk of the content arrives in the HTML file itself, so you can begin reading while the other elements load. You therefore get a better user experience, even on a slow connection.
Why does Internet Explorer implement it? Who knows. Microsoft always wants to do things differently to dominate a market, so perhaps Internet Explorer uses MHTML in some obscure way I am not aware of. But Microsoft generally does not care about web standards; it would rather force its will on the world, because it basically “owns” the business/enterprise market.