Question

Explaining the difference between strictness of languages and paradigms to a colleague of mine, I ended up asserting that:

  • Tolerant languages, such as dynamic and interpreted languages, are best used for prototypes, small projects, or medium-size web applications. When choosing elegant dynamic languages such as Python or JavaScript with Node.js, the benefits are:

    1. Fast development,

    2. Reduced boilerplate code,

    3. Ability to attract young, creative programmers who flee “corporate languages” like Java.

  • Statically typed/compiled languages are best for applications which require higher strictness, such as business-critical apps or medium to large-size apps. The benefits are:

    1. Well-known paradigms and patterns developed for decades,

    2. Ease of static checking,

    3. Ability to find many professional developers with decades of experience.

  • Strict languages such as Haskell, Ada or techniques such as Code contracts in C# are better for systems which favor safety over flexibility (even if Haskell can be extremely flexible), such as life critical systems and systems which are expected to be extremely stable. The benefits are:

    1. Ability to catch as many bugs as possible at compile time,

    2. Ease of static checking,

    3. Ease of formal proofs.

However, looking at the languages and technologies used for large-scale projects by large corporations, it seems that my assertion is wrong. For example, Python is successfully used for large systems such as YouTube and other Google applications which require a significant amount of strictness.

Is there still a correlation between the scale of the project and the strictness of the language/paradigm which should be used?

Is there a third factor that I've forgotten to take into account?

Where am I wrong?


Solution

An interesting case study on the matter of scaling projects that use dynamic, interpreted languages can be found in Beginning Scala by David Pollak:

I started searching for a way to express the code in my brain in a simpler, more direct way. I found Ruby and Rails. I felt liberated. Ruby allowed me to express concepts in far fewer lines of code. Rails was so much easier to use than Spring MVC, Hibernate, and the other “streamlined” Java web frameworks. With Ruby and Rails, I got to express a lot more of what was in my head in a shorter period of time. It was similar to the liberation I felt when I moved from C++ to Java...

As my Ruby and Rails projects grew beyond a few thousand lines of code and as I added team members to my projects, the challenges of dynamic languages became apparent.

We were spending more than half our coding time writing tests, and much of the productivity gains we saw were lost in test writing. Most of the tests would have been unnecessary in Java because most of them were geared toward making sure that we’d updated the callers when we refactored code by changing method names or parameter counts. Also, I found that working on teams where there were mind melds between two to four team members, things went well in Ruby, but as we tried to bring new members onto the team, the mental connections were hard to transmit to new team members.

I went looking for a new language and development environment. I was looking for a language that was as expressive as Ruby but as safe and high-performance as Java...

As you can see, the major challenges the author faced in scaling the project turned out to be test development and knowledge transfer.
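The refactoring hazard Pollak describes can be sketched in Python. The `Account` class and method names here are hypothetical; the point is that a renamed method breaks its callers only at runtime, which is exactly what those extra tests were guarding against:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def debit(self, amount):
        # Renamed from `withdraw` during a refactor (hypothetical)
        self.balance -= amount


def pay_rent(account):
    # A caller that was missed in the rename: Python accepts this
    # happily at import time and fails only when this line executes.
    account.withdraw(500)


try:
    pay_rent(Account(1000))
except AttributeError:
    print("rename only detected at runtime")
```

A statically checked language rejects the stale call at compile time; in Python you need a test that actually executes `pay_rent` to notice.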

In particular, the author goes into more detail on the differences in test writing between dynamically and statically typed languages in Chapter 7. In the section "Poignantly Killing Bunnies: Dwemthy’s Stairs", he discusses a Scala port of a particular Ruby example:

Why the Lucky Stiff... introduces some of Ruby’s metaprogramming concepts in Dwemthy’s Array in which a rabbit battles an array of creatures. N8han14 updated the example to work in Scala...

Compared to the Ruby code, the library parts of the Scala code were more complex. We had to do a lot of work to make sure our types were correct. We had to manually rewrite Creature’s properties in the DupMonster and the CreatureCons classes. This is more work than method_missing. We also had to do a fair amount of work to support immutability in our Creatures and Weapons.

On the other hand, the result was much more powerful than the Ruby version. If we had to write tests for our Ruby code to test what the Scala compiler assures us of, we’d need a lot more lines of code. For example, we can be sure that our Rabbit could not wield an Axe. To get this assurance in Ruby, we’d have to write a test that makes sure that invoking |^ on a Rabbit fails. Our Scala version ensures that only the Weapons defined for a given Creature can be used by that Creature, something that would require a lot of runtime reflection in Ruby...
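The guarantee described above can only be approximated in a dynamic language with a runtime check plus a test that exercises it. A minimal Python sketch (class and method names are illustrative, not taken from the book's code):

```python
class Weapon: pass
class Axe(Weapon): pass
class Sling(Weapon): pass


class Creature:
    allowed_weapons = ()  # subclasses declare their permitted weapons

    def wield(self, weapon):
        # Runtime check standing in for what the Scala types guarantee
        if not isinstance(weapon, self.allowed_weapons):
            raise TypeError(f"{type(self).__name__} cannot wield "
                            f"{type(weapon).__name__}")
        self.weapon = weapon


class Rabbit(Creature):
    allowed_weapons = (Sling,)


rabbit = Rabbit()
rabbit.wield(Sling())  # allowed
# rabbit.wield(Axe()) would raise TypeError -- and the test suite must
# exercise exactly this case, since nothing checks it before runtime.
```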


Reading the above can make one think that as projects grow even larger, test writing might become prohibitively cumbersome. That reasoning would be wrong, as evidenced by the examples of successful very large projects mentioned in this very question ("Python is successfully used for... YouTube").

The thing is, project scaling isn't really that straightforward. Very large, long-living projects can "afford" a different test development process, with production-quality test suites, professional test dev teams and other heavyweight stuff.

YouTube's test suites or the Java Compatibility Kit surely live a different life than tests in a small tutorial project like Dwemthy’s Array.

OTHER TIPS

Your assertion is not wrong. You just need to dig a little deeper.

Simply put, big systems use multiple languages, not just one. There might be parts that are built using "strict" languages, and there may be parts that are built using dynamic languages.

As for your Google and YouTube example, I heard that they use Python primarily as "glue" between various systems. Only Google knows what those systems are built with, but I bet that many of Google's critical systems are built using strict and "corporate" languages like C++ or Java, or maybe something they themselves created like Go.

It is not that you can't use tolerant languages for large-scale systems. Many people say Facebook uses PHP, but they forget to mention that Facebook had to create extremely strict programming guidelines to use it efficiently on this scale.

So yes, some level of strictness is required for large-scale projects. This can come either from the strictness of the language or framework, or from programming guidelines and code conventions. You can't just grab a few college graduates, give them Python/Ruby/JavaScript and expect them to write software that scales across millions of users.

My experience with large systems is that they stand or fall not by language choice, but by issues of design/architecture or test coverage. I'd rather have a talented Python team on my big enterprise project, than a mediocre Java one.

Having said that, any language that lets you write significantly less code has to be worth looking at (e.g. Python vs. Java). Perhaps the future is in clever, statically typed languages with advanced type inference (e.g. in the Scala mold). Or in hybrids, such as what C# is attempting with its dynamic keyword...?

And let's not forget the "other" static typing benefit: proper IDE code completion/IntelliSense, which in my view is an essential feature, not a nice-to-have.

There are two kinds of errors to check for: type errors (concatenating an integer with a list of floats) and business-logic errors (when transferring money from a bank account, checking that the source account has enough money).
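To make the distinction concrete, here is a hypothetical `transfer` function in Python: the balance check is business logic that no type checker can verify, while passing a value of the wrong type is a separate class of error entirely:

```python
def transfer(source_balance, amount):
    # Business-logic rule: no type system verifies this condition,
    # only a test (or production) will catch a mistake here.
    if amount > source_balance:
        raise ValueError("insufficient funds")
    return source_balance - amount


print(transfer(100.0, 30.0))
# transfer(10.0, 20.0) raises ValueError -- a logic error, caught by tests
# transfer([1.0], 5) raises TypeError -- a type error, caught at runtime
```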

The "dynamic" part of a dynamic programming language is just the place where type checking takes place. In a "dynamically typed" language, type checking is done while executing each statement, while in a "statically typed" language it is done at compile time. And you can write an interpreter for a static programming language (as Emscripten does), and you can also write a static compiler for a dynamic programming language (as gcc-python or Shed Skin do).

In a dynamic programming language like Python or JavaScript you need to write unit tests not only for the program's business logic but also to check that your program does not have any syntax or type errors. For example, if you add ("+") an integer to a list of floats (which does not make sense and will produce an error), in a dynamic language the error will be raised at runtime, when the statement is executed. In a static programming language like C++, Haskell or Java, this kind of type error will be caught by the compiler.
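The exact example from the text behaves like this in Python: nothing complains until the statement actually runs:

```python
try:
    result = 1 + [2.0, 3.0]  # int + list of floats
except TypeError:
    # Raised only at runtime; a static checker such as mypy would
    # flag the same expression before the program ever runs.
    print("type error surfaced at runtime")
```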

In a small codebase in a dynamically checked programming language, it is easier to look for type errors, because it is easier to achieve 100% coverage of the source code. That is, you execute the code by hand a few times with different values and you're done. Having 100% coverage of the source code gives you a fair hint that your program may not have type errors.

With a large codebase in a dynamically checked programming language, it is harder to test every statement with every possible type combination, especially if you are careless and write a function that may return a string, a list or a custom object depending on its arguments.
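The kind of careless API the text warns about looks like this (a hypothetical example): every caller now needs tests for every flag combination just to know what type comes back:

```python
def lookup(key, many=False, raw=False):
    # Return type depends on the flags: int/None, list, or str --
    # three different shapes from one function.
    data = {"a": 1, "b": 2}
    if raw:
        return str(data.get(key))
    if many:
        return [data.get(key)]
    return data.get(key)


print(lookup("a"))             # an int
print(lookup("b", many=True))  # a list
print(lookup("a", raw=True))   # a string
```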

In a statically checked programming language the compiler will catch most type errors at compile time. I say most because errors such as division by zero or an array index out of bounds are also type errors.

More often than not, the real discussion is not about programming languages but about the people using them. This is true because, for example, assembly language is as powerful as any other programming language, yet we write code in JavaScript. Why? Because we're human. First, we all make mistakes, and it's easier and less error-prone to use a dedicated tool specialized for a specific task. Second, there are resource constraints: our time is limited, and writing web pages in assembly would take ages.

Another consideration is who is behind large-scale applications. I have worked at plenty of places that wanted to use Ruby or Python on big enterprise-style projects, but were consistently "shot down" by IT managers and corporate security teams, precisely because of the open-source nature of those projects.

I have been told, "We can't use Ruby on Rails because it is open source and someone could put hacks in there that steal critical or protected information." I'm sorry, but once someone has the mindset that open source == evil, it is nearly impossible to change it. That line of thinking is a corporate disease.

C# and Java are trusted languages with trusted platforms. Ruby and Python are not trusted languages.

Licensed under: CC-BY-SA with attribution