Question

If we look at a vintage program like Netscape Navigator or an early version of Microsoft Word, those programs were less than 50 MB in size. Now when I install Google Chrome it is 200 MB, and the desktop version of Slack is 300 MB. I read about some rule that programs will take up all available memory no matter how much there is, but why?

Why are the current sizes of programs so large compared to 10 or 15 years ago? The programs do not offer significantly more functions and do not look very different. What is the resource hog now?

Solution

"Looking very different" is a matter of perception. Today's graphics have to look good at totally different screen resolutions than they used to, with the result that a 100x100 image that used to be more than good enough for a logo would now look horribly tacky. It has had to be replaced with a 1000x1000 image of the same thing, which is a factor of 100 right there. (I know you can use vector graphics instead, but that just emphasizes the point - vector graphics rendering code has had to be added to systems that didn't need it before, so this is just a trade-off from one kind of size increase to another.)

"Working differently" is likewise a matter of perception. Today's browser does massively more things than one from 1995. (Try surfing the internet with a historic laptop one rainy day - you'll find it's almost unusable.) Not many of them are used very much, and uses may be completely unaware of 90% of them, but they're there.

On top of that, of course, is the general tendency to spend less time on optimizing things for space and more on introducing new features. This is a natural side-effect of larger, faster, cheaper computers for everyone. Yes, it would be possible to write programs that are as resource-efficient as they were in 1990, and the result would be stunningly fast and slick. But it wouldn't be cost-effective anymore; your browser would take ten years to complete, by which time the requirements would have completely changed. People used to program with extreme attention to efficiency because yesteryear's slow, small machines forced them to, and everyone else was doing it as well. As soon as this changed, the bottleneck for program success shifted from being able to run at all to running more and more shiny things, and that's where we are now.

Other tips

If you compare Netscape Navigator to a modern browser, there is a massive difference in functionality. Just compare the HTML 3.2 Spec (51 pages when I do a print preview) with the current HTML Spec (PDF version is 1155 pages). That's a 20x increase in size.

Netscape Navigator did not have a DOM and did not have CSS! There were no dynamic changes of the document, no JavaScript modifying the DOM or style-sheets. No tabs. No audio or video. A modern browser is a vastly more complex program.

One reason is that the data packaged within applications are larger because they are of higher resolution and quality. An icon back in the days of Netscape was at most 32x32 pixels, with at most 8-bit depth (possibly only 4), while now it is probably something like 64x64 and in true color with transparency, meaning 32-bit depth. That's 16 times larger. And space is so cheap that people often do not even bother checking the "compressed" option when generating a PNG.
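
As a rough back-of-the-envelope check of that factor of 16 (assuming uncompressed bitmaps, 1 byte per pixel for 8-bit indexed colour and 4 bytes per pixel for 32-bit RGBA):

    #include <stdio.h>

    /* Uncompressed icon sizes: 32x32 at 8-bit indexed colour (1 byte/pixel)
     * versus 64x64 at 32-bit RGBA (4 bytes/pixel). */
    int main(void)
    {
        long old_icon = 32L * 32 * 1;   /*  1,024 bytes */
        long new_icon = 64L * 64 * 4;   /* 16,384 bytes */

        printf("old icon: %ld bytes\n", old_icon);
        printf("new icon: %ld bytes\n", new_icon);
        printf("ratio   : %ldx\n", new_icon / old_icon);   /* prints 16x */
        return 0;
    }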

Another reason is that applications nowadays carry a mind-boggling amount of data with them, which older applications did not. There exist applications today that get shipped together with a "getting started" presentation in video.

Another reason is that programming languages today tend to go together with rich run-time environments, which are fairly large, to the tune of 100MB each. Even if you do not use all of the features of your run-time environment, you still have to package the whole thing with your app.

But the main reason is that today there exist tons and tons of libraries out there that we can use in our applications, and we have developed a culture of using libraries so as to avoid constantly re-inventing the wheel. Of course, once you start using libraries, several questions pop up, and we have developed the habit of giving the most liberal answers to them:

  • Is it worth including yet another library if it is going to be used by only one of my functions? --yes.

  • Is it worth including yet another library if I only need a tiny subset of the entire wealth of functionality offered by that library? --yes.

  • Is it worth including yet another library if its inclusion will only save me two days of work? --yes.

  • Is it worth including multiple libraries that serve more or less the same purpose just because different programmers on my payroll happen to already be familiar with different libraries? --yes.

    (Please note that I am just observing these tendencies; I am making no statement whatsoever as to whether I agree or disagree with them.)

Another reason worth mentioning is that when trying to decide which application to use among several choices, some users think that the one which occupies more space will be more feature-packed, will have fancier graphics, etc. (Which is complete nonsense, of course.)

So, to conclude, does software behave like gas? Does it tend to occupy all of the space available to it? In a certain sense yes, but not to any alarming extent. If we look at what takes up most space on our drives, for most of us the answer is that it is not applications, but media such as movies and music by far. Software has not been bloating at the same rate that storage capacity has been expanding, and it is unlikely that it ever will, so in the future applications are likely to represent a negligible fraction of the storage space available to users.

In addition to the other answers, 10 years ago there typically would have been separate releases for localised / internationalised versions. Now it's generally the case that programs bundle full localisation support into every released version, which pads out the program size.

One reason is dependencies. A program with rich functionality and good looks needs a lot of things done - encryption, spell checking, working with XML and JSON, text editing and lots of other things. Where would they come from? Maybe you roll your own and keep them as small as possible. Most likely you use third party components (MIT licensed open source perhaps) which have a lot of functionality you never actually need but once you need a single function from a third party component you often have to carry the whole component around. So you add more and more dependencies and as they themselves evolve and grow your program that depends on them grows too.

While the graphics/usability are indeed contributing factors, there's an awful lot of it that's library/excess compiled code.

Example of how small code CAN still be: MenuetOS, a full 64-bit OS with powerful apps that fits on a single floppy disk.

Example of how big code can be for no obvious reason: I did a simple text output "Hello, World!" in Ada recently. The compiled executable was over 1 MiB! The same program in assembly is just a KiB or two (and the bulk of that is executable overhead; the actual running code is tens of bytes).
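
For comparison, here is a minimal sketch of the same experiment in C rather than Ada or assembly; on a typical Linux system with gcc, the commands in the comment show how much of the executable is runtime overhead rather than the handful of instructions you actually wrote:

    /* hello.c - a minimal program for comparing executable sizes.
     * Typical Linux/gcc commands to build and inspect it:
     *   gcc -O2 hello.c -o hello-dynamic          (links the C library dynamically)
     *   gcc -O2 -static hello.c -o hello-static   (copies the C runtime into the binary)
     *   ls -l hello-dynamic hello-static          (compare file sizes)
     * The statically linked binary is typically hundreds of KiB larger,
     * even though the code below compiles to only a few instructions. */
    #include <stdio.h>

    int main(void)
    {
        puts("Hello, World!");
        return 0;
    }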

It's trivially true that software has to be built to fit two things: the users and the available hardware. A program is fit for its purpose if it does what the user wants in a timely manner with the hardware at the user's disposal. Well duh. But as hardware improves in basically all measurable dimensions, the number of discrete programs which move from unfit to fit increases - the design space gets bigger:

  • Higher level languages make it possible to express ideas in less code & time than before. This lowered cost of expression, in turn, makes it possible to express increasingly complex ideas.
  • Bundling more data with the application can make it instantly more usable. For example, it probably won't be long before spell checking applications come bundled with every language known to humanity - it's only a few gigabytes, after all.
  • Hardware trade-offs allow developers and users more choice in which resource they care about. See for example FLAC/OGG vs WAV, SVG vs PNG, and database indexes (a rough size calculation is sketched after this list).
  • Humane interfaces often trade off what would previously amount to huge amounts of hardware for usability. Anti-aliasing, high resolutions, fast refreshing, and swiping between what amounts to discrete panels all make for a more realistic, and therefore intuitive and relatable, experience.
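
To make the FLAC/OGG-vs-WAV example concrete, here is a rough calculation of what one minute of uncompressed CD-quality audio costs; formats like FLAC and OGG trade CPU time at playback for a fraction of this size:

    #include <stdio.h>

    /* One minute of uncompressed CD-quality audio:
     * 44,100 samples/s, 2 bytes (16 bits) per sample, 2 channels. */
    int main(void)
    {
        long bytes_per_second = 44100L * 2 * 2;          /* 176,400 bytes/s */
        long bytes_per_minute = bytes_per_second * 60;   /* ~10.6 MB/minute */

        printf("%ld bytes/s, about %.1f MB per minute uncompressed\n",
               bytes_per_second, bytes_per_minute / 1e6);
        return 0;
    }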

This is definitely true concerning Android applications. Four years ago, a simple app took about 2-5 megabyte space. Nowadays a simple app takes about 10-20 megabyte space.

The more space available, the bigger the app size.

I think that there are two main reasons in case of Android:

  • Google expanded the Android framework, added a lot of new functionality.

  • Developers do not care anymore. Images are included in a far higher resolution (of course the smartphone screen resolutions increased), third-party libraries are thoughtlessly included.

A lot of it boils down to developer time and the cost of that time. Back in the days when Visual Basic first arrived on the scene, it was competing with C/C++ and the big knock against it was that you could write 'Hello World' in ANSI C for Windows in maybe 15K. The problem with VB was you always had the albatross of the 300K runtime library.

Now, you could 10x the size of your VB program and it would still be just a few K more, but 10x the size of your C/C++ program and you're looking at a few MONTHS more development.

In the end the bloat of your applications is a small price to pay for the huge leaps in development productivity, the reduction in price, and the sheer vastness of capabilities that would never have been possible in the old hand-crafted days of development, when programs were small and fast but also weak, incompatible with each other, under-featured and costly to develop.

Over time, users' needs evolve and become more and more demanding, so vendors and authors of different software are forced to satisfy those needs in the name of competition.

But satisfying a new need often means adding new code. New code means new vulnerabilities to fix. Fixing new vulnerabilities may add code or open the door to new vulnerabilities.

Each feature added to satisfy a user's need may require more processor power for speed (we all complain about the speed of this or that browser), new graphical resources for better visual effects, and so on.

All this means adding new layers of applications (code), security and sometimes hardware.

A lot of the size comes from bundled libraries. Many applications these days are built using Electron, which bundles a huge amount with the application. If you install applications on Linux they are usually much smaller, because much of the application is already installed through shared libraries that other programs are also using.

When constructing software, if you need function A, you will import a module A*. A* can solve A, but it can also solve many problems beyond A, and so it may be large. All of these large modules add up to large software.

Maybe not the same case, but something like this: if you just need to print "hello world" on the console using Java, you need the JRE (>60 MB) installed.

If the example of Java is not good, try this one: if the software needs to log to a file, it may use a logging module which can also log to a database, log over the network, and more, but those functions are never used in the project.
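
As a sketch of that situation (the component and function names here are invented for illustration, not taken from any real library): the application below only ever calls the file back-end, but when the logging component ships as one pre-built library, the code and dependencies behind the other back-ends come along anyway.

    #include <stdio.h>

    /* A hypothetical logging component with three back-ends. */
    int log_to_file(const char *path, const char *message)
    {
        FILE *f = fopen(path, "a");
        if (f == NULL)
            return -1;
        fprintf(f, "%s\n", message);
        fclose(f);
        return 0;
    }

    /* Imagine these pulling in a database driver and a network stack. */
    int log_to_database(const char *connection_string, const char *message)
    {
        (void)connection_string; (void)message;   /* never used by this app */
        return -1;
    }

    int log_over_network(const char *host, int port, const char *message)
    {
        (void)host; (void)port; (void)message;    /* never used by this app */
        return -1;
    }

    int main(void)
    {
        return log_to_file("app.log", "application started");
    }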

I read about some rule that programs will take up all available memory no matter how much there is, but why?

That's not quite true. Systems will not release memory they have consumed until the operating system comes under memory pressure. This is a performance optimization. Suppose you are browsing a page with images on it and you navigate away. You might navigate back, therefore needing the images again. If the operating system has the RAM, there is no point in clearing the memory until you are sure you won't need it again.

Clearing the memory immediately would take CPU cycles and memory bandwidth away from the user at the moment they most likely want highly responsive web pages displayed on screen.
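
A minimal sketch of that policy (an illustration of the idea, not of how any particular browser or operating system implements it): keep what you have already paid to produce, and throw it away only when told memory is tight.

    #include <stdlib.h>

    #define CACHE_SLOTS 64

    /* A tiny cache of decoded resources (e.g. images keyed by URL).
     * Keys are assumed to outlive their entries (string literals here). */
    static struct {
        const char *key;
        void       *data;
        size_t      size;
    } cache[CACHE_SLOTS];

    /* Keep decoded data around: navigating back can then reuse it for free. */
    int cache_put(const char *key, void *data, size_t size)
    {
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (cache[i].data == NULL) {
                cache[i].key  = key;
                cache[i].data = data;
                cache[i].size = size;
                return 0;
            }
        }
        return -1;   /* full: a real cache would evict something here */
    }

    /* Called only when the OS signals memory pressure: drop everything. */
    void cache_drop_all(void)
    {
        for (int i = 0; i < CACHE_SLOTS; i++) {
            free(cache[i].data);
            cache[i].key  = NULL;
            cache[i].data = NULL;
            cache[i].size = 0;
        }
    }

    int main(void)
    {
        /* Pretend we decoded an image; keep it in case the user navigates back. */
        cache_put("https://example.com/logo.png", malloc(1024), 1024);

        /* ... much later, and only when memory is actually tight ... */
        cache_drop_all();
        return 0;
    }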

The operating system will consume all available non-application memory, the majority of which goes to the file system cache.

Memory management is a hard problem but there are very clever people working on it all the time. Nothing is being wasted on purpose and the key goal is to provide you with a very responsive computer.

It may be true that programs tend to expand to fill the available space, similar to the suburban phenomenon where you add new lanes to a gridlocked superhighway and within a few years traffic is backed up again.

But if you look into it you may find that the programs actually do more. Browsers, for example, run fancier graphics, have slick developer tools that did not exist a few years ago, and so on. They also link to many libraries, sometimes using just a small portion of the code. So while programs may increase in size to fill available memory, some of that may be for legitimate reasons.

Libraries built on objects that are not optimized require more memory to load and install, and more computing cycles to operate. Object code is for the most part bloat.

Just step through standard C++ code in a debugger to see all the assert()ed object calls making sure they are valid objects. When you design layer upon layer of objects encapsulating objects, the underlying layers are bloated and opaque. Programmers get lazy and take on more objects because it's faster than redesigning down to only the needed functionality. It's really that simple.
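
A small C sketch of the pattern being described (invented names, not code from any real library): each layer re-validates the same object before delegating, so a debug build spends code and cycles on checks at every level.

    #include <assert.h>
    #include <stddef.h>

    struct widget { int valid; int value; };

    static int widget_value(const struct widget *w)
    {
        assert(w != NULL && w->valid);   /* layer 1 checks the object */
        return w->value;
    }

    static int panel_widget_value(const struct widget *w)
    {
        assert(w != NULL && w->valid);   /* layer 2 re-checks the same thing */
        return widget_value(w);
    }

    static int window_widget_value(const struct widget *w)
    {
        assert(w != NULL && w->valid);   /* layer 3 re-checks it yet again */
        return panel_widget_value(w);
    }

    int main(void)
    {
        struct widget w = { 1, 42 };
        return window_widget_value(&w) == 42 ? 0 : 1;
    }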

Consider the size of the Linux kernel, written in C - just the kernel - versus the size of bespoke applications. The kernel can run the entire machine. But it wasn't built as quickly as those applications; it takes time to slowly build up the best functionality.

Licensed under: CC-BY-SA with attribution