Question

I am a recent graduate about to start my Master's in Computer Science. I have come across multiple open source projects that really intrigue me and encourage me to contribute to them (CloudStack, OpenStack, moby, and Kubernetes, to name a few). One thing I've found that the majority of them have in common is the use of multiple programming languages (such as Java + Python + Go, or Python + C++ + Ruby). I have already looked at this other question, which deals with how multiple programming languages are made to communicate with each other: How to have two different programmings with two different languages interact?

I want to understand the requirement that prompts enterprises to use multiple programming languages. What requirement or type of requirement makes the software architect or project lead say, "I'm proposing we use language X for task 1 and language Y for task 2"? I can't seem to understand the reason why multiple programming languages are used in the same product or software.


Solution

This answer has superb coverage and links on why different languages can provide distinct benefits to a project. However, there is quite a bit more than just language suitability involved in why projects end up using multiple languages.

Projects end up using multiple languages for six main reasons:

  1. Cost benefits of reusing code written in other languages;
  2. The need to include and accommodate legacy code;
  3. Availability of coders for specific languages;
  4. The need for special languages for specialty needs;
  5. Legacy language biases; and
  6. Poor project management (unplanned multi-language use).

Reasons 1-4 are positive reasons in the sense that addressing them directly can help a project conclude faster, more efficiently, with a higher-quality product, and with easier long-term support. Reasons 5 and 6 are negative, symptoms of resistance to needed change, poor planning, ineffective management, or some combination of all of these factors. These negative factors unfortunately are common causes of "accidental" multi-language use.

Reason 1, the cost benefits of reuse, has become an increasingly powerful reason to allow the use of multiple languages in a project due both to the greater role of open source software and improved capabilities to find the right code components on the web. The "let's code it all internally" philosophy of past decades continues to fade in the face of economic realities, and is essentially never the most cost-effective approach for any new projects. This in turn makes opportunities for strict enforcement of the use of a single language within a project less common.

Especially in the case of a project reusing well-managed open source components, the use of multiple languages can provide huge overall cost benefits because the reused components are both hidden behind well-designed interfaces, and are independently maintained by zero-cost external groups. In best-case scenarios, mixing languages via this kind of reuse is no more costly to the project than using operating system components. I know of no better example of the value of this approach than Microsoft's large-scale adoption of open source software in their browsers.

Reason 2, the need to accommodate legacy code, is ignored at the peril of any large project. However much trouble legacy code may cause, naively assuming that it can be replaced easily with new code in a new language can be incredibly risky. Legacy code, even bad legacy code, often includes what amounts to an implicit "contract" of features expected by the community that uses the legacy product. That community quite often is a major source of revenue for a company, or the main target of support for government software. Simply discarding that implied contract can chase away customers in droves, and can bankrupt a company overnight if other options are readily available.

At the same time, not replacing old code in an old language can be just as dangerous as replacing it wholesale. A worst-case example is the U.S. Veterans Administration, which has a large number of vital systems coded in a language called MUMPS (no kidding) that was designed by medical doctors, not computer scientists. No one wants to learn MUMPS, and those who do know it are literally dying off. Programmers must therefore accommodate MUMPS even as they try to move forward to using other more common, more powerful, and better-maintained languages.

This type of multi-language use requires careful planning. That planning must navigate the knife edge between losing decades of customer knowledge on one hand, and losing the ability to support the software on the other. Techniques that isolate the old code behind well-defined interfaces, and which enable new more powerful code to replace the old code after its behaviors have been well documented, can help. But this legacy scenario is never easy, and has been (and will continue to be) the cause of the demise of many companies and organizations across a broad spectrum of sizes.

Reason 3, availability of coders for various languages, is a pragmatic factor that projects ignore at their peril. However much the project organizers may feel (correctly or incorrectly) that a particular language is best for their goals, if that language conflicts with the language expertise pool available to them, both the schedule and the quality of the product will suffer from the learning curve of programmers trying to pick up a new language.

A more rational approach is to analyze the language needs of the project based on functional areas. For example, looking carefully at the project may show that there is only a small "apex" of high-value code, e.g. for implementing some proprietary algorithm, that requires coding expertise in some less commonly used language. Other parts of any large project are often easily accommodated by more common languages, or (even better) by well-managed open source products. Analyzing a project by language needs thus can provide a much more realistic and cost-effective approach to hiring or renting special expertise in special languages, and can also help sharpen the interfaces between languages within a single project.

Reason 4, using different languages for different needs, follows immediately and smoothly from performing that kind of analysis of project needs. Care should be used in this also, since selecting too many languages for support within a single project can cause a combinatorial explosion of complexity both in support and interfaces between components. The safest route cost-wise is always to find the maximum opportunities for reuse first, especially if there exist good packages that can meet project needs through little more than customization. Next, some kind of decision should be made on some small number of languages that can address the majority of identified needs. In reuse-intensive development, this will often be a type of glue code.

It is generally not a good idea to choose multiple languages with very similar capabilities just because some members of the project like one and some the other. However, if there are well-identified, well-defined capability subsets that would benefit from special language skills, that can be a good reason for using multiple languages for new code development.

Reason 5, resistance to needed changes in the languages used, can be a cause of severe project disruption and internal strife. As user Daveo pointed out in a comment on this answer, change can be very difficult for some project personnel. At the same time, resistance to change is never a simple issue, which is precisely why it can cause much strife. Use of legacy language skills can be a powerful boost to the productivity of a project if the legacy language is sufficiently powerful, and can lead to a product with excellent quality in a team that operates smoothly and respects quality. However, legacy language skills must be balanced with the fact that many older languages can no longer compete with more recent languages in terms of advanced features, component availability, open source options, and intelligent tool kit support.

Both then and now, the single most common (and ironically, most often correct) argument for continuing to use a weaker, less readable, or less productive legacy language has been that the older language enables production of more efficient code. This is an old argument, one that goes all the way back to the 1950s when users of assembly language resented, often bitterly, the emergence of programming in FORTRAN and LISP. An example where even now the code efficiency argument can have validity can be seen in processing-intensive code such as operating system kernels, where C remains the language of choice over C++ (though for reasons that go beyond simple efficiency).

However, in the globally networked and powerfully machine-supported project environments of the new millennium, code efficiency as the main argument for choosing a project language has grown even weaker. The same explosion of computing and networking hardware that has enabled mass marketing of artificial intelligence applications also means that the costs of human programming can easily dwarf those of relatively inefficient code execution on spectacularly cheap hardware and cloudware. When that is combined with the greater availability in more recent languages of component libraries, open source options, and advanced intelligent tool kits, the set of cases where keeping a language for efficiency reasons alone makes sense becomes very narrow. Even in cases where it does apply, the focus should be on using languages such as C that continue to have broad community support.

A more compelling reason for a project to stay with legacy languages occurs when for whatever reasons a project has few or no options for changing its staff. This can happen for example when a major legacy product line is coded entirely in a language with which only the existing staff is fluent. In such cases the project must either continue down the path of trying to program in the old language, or attempt to train existing staff in how to use a new language.

Training legacy language staff in a new language can be a danger all by itself. I still recall a case where a member of a project that had just been trained and transitioned from C to C++ complained to me in all sincerity that he just did not understand the advantages of object-oriented methods. When I looked at his code, he had converted his earlier 103 C functions into 103 methods for a single C++ object class... and rightfully did not see how that helped anything.

The deeper message is that when people have programmed in a single language and language style for years or decades, the difficulty in getting them to "think" in new ways can become almost insurmountable, even with good training programs. In some cases there may be no other option but to bring in younger designers and programmers who are more in tune with current trends and methods.

Reason 6, poor project management, speaks for itself. Language selection and use in a project should always be considered and assessed explicitly, and not allowed to happen just by accident. At the very least, language selection can make a huge difference in the long-term fate and support costs of a project, and so should always be taken into account and planned out. Don't become a MUMPS!

OTHER TIPS

I can't seem to understand the reason as to why multiple programming languages are used in the same product or software?

It is quite simple: there is no single programming language suitable for all needs and goals.

Read Michael L. Scott's book Programming Language Pragmatics.

Some programming languages favor expressiveness and declarativity (a lot of scripting languages, but also high-level programming languages like Agda, Prolog, Lisp, Haskell, OCaml, ...). When the cost of development matters most (developer time and salaries), they are a good fit, even if runtime performance is not optimal.

Other programming languages favor run-time performance (many low-level languages, with usually compiled implementations, like C++, Rust, Go, C, assembler, also specialized languages like OpenCL ...); often their specification allows some undefined behavior. When the performance of the code matters, it is preferable to use these languages.

Some external libraries are written in a particular language, with a particular ABI and calling conventions in mind. You may need to use that other language and follow foreign function interface conventions, perhaps by writing some glue code.
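As a toy illustration of such glue code, here is a minimal sketch in Python using the standard library's `ctypes` to call the C math library's `sqrt` through a foreign function interface. The library lookup is platform-dependent (the example assumes a Unix-like system where `find_library("m")` resolves the C math library):

```python
import ctypes
import ctypes.util

# Locate the C math library; the name differs per platform
# (e.g. "libm.so.6" on Linux). This assumes a Unix-like system.
_libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature of double sqrt(double), so ctypes
# converts arguments correctly instead of guessing.
_libm.sqrt.argtypes = [ctypes.c_double]
_libm.sqrt.restype = ctypes.c_double

def c_sqrt(x: float) -> float:
    """Thin Python glue over the C implementation."""
    return _libm.sqrt(x)
```

The "glue" here is just two declarations and a wrapper function; real bindings (as generated by tools like SWIG) do the same thing at scale.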

In practice, a single programming language is rarely both highly expressive (improving the productivity of the developers, assuming a skilled enough team) and very performant at runtime; there is a trade-off between expressivity and performance.

Note: there has been some slow progress in programming languages: Rust is more expressive than C, or perhaps even C++, yet its implementation is almost as performant, and will probably improve to generate equally fast executables. So you need to learn new programming languages during your professional life; however, there is No Silver Bullet.

Notice that the cost of development is more and more significant today (that was not the case in the 1970s, when computers were very costly, nor is it in some embedded applications with large product volumes). The (very approximate) rule of thumb is that a skilled developer is able to write about 25 thousand lines of debugged and documented source code each year, and that figure does not depend much on the programming language used.

A common approach is to embed some scripting language (or some domain-specific language) in a large application. This design idea has been used for decades (a good example is the Emacs source code editor, which has used Elisp for scripting since the 1980s). You would use an easily embeddable interpreter (like Guile, Lua, or Python) inside the larger application. The decision to embed an interpreter inside a large application must be made very early, and has strong architectural implications. You then end up with two languages: some low-level language like C or C++ for the parts that have to run quickly, and the DSL or scripting language for the high-level scripts.
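As a toy sketch of this embedding idea (using Python to play both the host and the script role, where a real application might embed Lua or Guile in a C/C++ host): the host exposes a small API surface, and a user-supplied script customizes behaviour without the host being recompiled. The names `insert_line` and `line_count` are invented for this example:

```python
def run_user_script(script: str, document: list) -> list:
    """Run an extension script against a small host API.

    Toy stand-in for embedding an interpreter: the script sees
    only the host-provided functions, not the host internals.
    (Note: exec() is NOT a real sandbox; real embeddings rely on
    the embedded interpreter's own isolation features.)
    """
    api = {
        "insert_line": lambda i, text: document.insert(i, text),
        "line_count": lambda: len(document),
    }
    # The script's global namespace is emptied; it can only call
    # the functions the host deliberately exposes.
    exec(script, {"__builtins__": {}}, api)
    return document
```

A host shipping this would let users drop in scripts like `insert_line(line_count(), "footer")` without touching the core application.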

Notice also that a given software can run, within most current operating systems (including Linux, Windows, Android, MacOSX, Hurd, ...), in several cooperating processes using some kind of inter-process communication techniques. It can even run on several computers (or many of them), using distributed computing techniques (e.g. cloud computing, HPC, client server, web applications, etc...). In both cases, it is easy to use several programming languages (e.g. code each program running on one process or computer in its own programming language). Read Operating Systems: Three Easy Pieces for more. Also, foreign function interfaces (e.g. JNI), ABIs, calling conventions, etc... facilitate mixing several languages in the same program (or executable) - and you'll find code generators like SWIG to help.
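A minimal sketch of that inter-process style, assuming a POSIX system where `sh` and `wc` (both C programs) are available: a Python program delegates a task to tools written in another language, communicating over pipes:

```python
import subprocess

def count_words(text: str) -> int:
    """Hand work to a program written in another language over a
    pipe: here the POSIX shell runs wc -w, and Python reads the
    result back from stdout. Assumes a POSIX environment."""
    result = subprocess.run(
        ["sh", "-c", "wc -w"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

Each side only needs to agree on the byte stream flowing through the pipe; neither cares what language the other is written in, which is exactly why multi-language systems built from cooperating processes are so common.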

In some cases, you have to mix several programming languages: web applications need JavaScript or WebAssembly (the only languages running inside most web browsers) for the part running in the browser (there are frameworks generating these, e.g. ocsigen). Kernel code needs some parts (e.g. the scheduler, or the low-level handling of interrupts) to be written in assembler, because C or C++ cannot express what is needed there. RDBMS queries should use SQL, GPGPUs need compute kernels coded in OpenCL or CUDA managed by C or C++ host code, etc. Some languages are designed to facilitate such mixtures (e.g. asm statements in C, code chunks in my late GCC MELT, etc.).

In some cases, you use metaprogramming techniques: some parts of your large software project could have code (e.g. in C or C++) generated by other tools (perhaps project-specific tools) from some ad-hoc formalization: parser generators (improperly called compiler-compilers) like bison or ANTLR come to mind, but also SWIG or RPCGEN. And notice that GCC has more than a dozen specialized C++ code generators inside it (one for every internal DSL inside GCC). See also this example. Notice that metabugs are hard to find. Read also about bootstrapping compilers, and about homoiconicity and reflection (it is worthwhile to learn Lisp, play with SBCL, and read SICP; look also into JIT-compiling libraries like GCCJIT; in some large programs you might generate some code at runtime using them; be aware of Greenspun's tenth rule). Look also at the Circuit Less Traveled talk at FOSDEM 2018.
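A scaled-down sketch of that generate-then-compile move (in Python, where `exec` stands in for the compiler): a small declarative specification goes in, source code comes out, and the generated code is then loaded like any hand-written code. Parser generators like bison do the same thing at vastly larger scale:

```python
def make_record_class(class_name: str, fields: list) -> type:
    """Generate a tiny class from a declarative field list: an
    ad-hoc formalization in, source code out, then 'compiled'
    via exec. A toy analogue of bison/SWIG-style generation."""
    lines = [f"class {class_name}:"]
    params = ", ".join(fields)
    lines.append(f"    def __init__(self, {params}):")
    for field in fields:
        lines.append(f"        self.{field} = {field}")
    source = "\n".join(lines)

    namespace = {}
    exec(source, namespace)  # load the generated source
    return namespace[class_name]
```

The generated source is ordinary code; the point is that nobody had to write (or maintain) it by hand for each record type.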

Sometimes, you want to provide formal annotations of your code (e.g. to help provers, static analyzers, compilers), using some specialized annotation language (which might be viewed as some DSL). Look into ACSL with Frama-C to annotate C programs (safety-critical ones), or OpenMP pragmas for HPC. Caveat: writing such annotations can require a lot of skills and development time.

BTW, this suggests that some skills about compilers and interpreters are useful for every developer (even without working inside compilers). So read the Dragon Book even if you don't work on compilers. If you code your own interpreter (or if you design your DSL), read also Lisp In Small Pieces.

See also this & that & that & that answers of mine related to your question.

Study also the source code of several large free software projects (on github or from your Linux distribution) for inspiration and enlightenment.

Also, some programming languages evolved by adding annotations (as pragmas or comments) to existing languages. For examples, think of ACSL (a comment-extension to annotate C programs to enable their proofs by Frama-C) or of OpenCL (a C dialect to program GPGPUs) or OpenMP or OpenACC #pragmas or Common Lisp type annotations.

PS: there are also social or organizational or historical reasons to mix programming languages; I'm ignoring them here, but I know that in practice such reasons are dominant. Read also The Mythical Man Month.

Many projects are not built with multiple programming languages. However, it is common to use scripts in other languages to assist with the software.

  • Administration tools that are separate programs are sometimes written in a different language.
  • Libraries and APIs frequently offer bindings for multiple languages, so that developers can use whatever language they prefer.
  • Build scripts and related development scripts often use specialized languages.
  • End-to-end tests of an application do not need to use the same language as the application itself.

A few projects do use multiple languages within the application, e.g. a core in a lower-level language that can integrate plugins in a scripting language. In some ecosystems (e.g. JVM or .NET) the exact language used is not too important since multiple languages on the same language runtime have good interoperability. For example, I could write a project in Scala that uses existing Java libraries, and integrates scripting functionality with Groovy.

If a project consists of multiple tools, those can also be developed in different languages. While consistency would be good, especially for open source projects the available development effort can be a bottleneck. If someone is willing to develop a useful tool but isn't familiar with the main language, maybe that tool is more valuable than consistency.

This has two forms, with a lot of organisations falling somewhere between the two:

The bad form - the organisation is a mess, and there's nobody making sure that there's a single technological vision for the effort. Devs most likely use whatever language they're most comfortable in, or recently experimented with a new framework or language and decided to simply begin using that due to naive optimism.

The good form - the organisation has really good, clean architecture which lends itself well to polyglot programming: applications are decoupled into independent components, each with a well-defined bounded context, which allows them to select the programming language that most simply allows them to write that particular component.

Reality - it's normally more the former than the latter. I've seen a few companies choose one language for their business domain, another for their web server, and often a third for their database administration, which is technically fine; but pretty soon their lack of technical understanding (or refusal to listen to their staff) means that they end up with all three blurring together in a big mess, often introducing yet more languages to solve particular parts of the mess.

I can contribute an example, of a programming project which has been running for 32 years, and appears to still have plenty of life left in it. It's commercial rather than open-source.

The core is written in a domain-specific language, created specifically for the project. This has proved extremely useful, notably because it integrates rollback into the basic architecture, but it compiles into C code, which we then compile with the platform's compiler. It has supported about twenty platforms over that time, not counting 32- vs 64-bit variations, and currently ships on six of those.

It has an add-on, written in C++, which was started when a past head of the project became convinced that C++/MFC/Windows/x86 was going to displace all other architectures and platforms, so it was necessary to offer C++ work to be able to hire staff. Things did not turn out as he expected.

In addition to the domain language and C++, developers work in LISP, which is used to write test cases, using an interpreter embedded in the test harness. We considered replacing LISP with Java at one point, but it turned out to be fortunate that we did not.

It also has a wrapper for its main API, written in C#. This was done when customers demanded it, so that they could re-write their applications in C#. It is created by a Perl script, which reads the C header files for the API, plus a significant configuration file, and writes the C# code for the wrapper. Doing all that text-processing in Perl was just easier than doing it in C++.
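A heavily simplified sketch of that header-scraping approach, in Python rather than Perl, with an invented regex that handles only trivial prototypes (real headers would need a real C parser, and real C-to-C# type mapping; the `"core"` DLL name is made up for the example):

```python
import re

# Matches only simple prototypes like: int add(int a, int b);
PROTO = re.compile(r"(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;")

def wrap_header(header: str) -> str:
    """Read C prototypes from header text and emit P/Invoke-style
    C# extern declarations. Toy version of the generator script
    described above; it ignores type mapping entirely."""
    out = ["using System.Runtime.InteropServices;", ""]
    for ret, name, args in PROTO.findall(header):
        out.append('[DllImport("core")]')
        out.append(f"public static extern {ret} {name}({args});")
    return "\n".join(out)
```

The appeal of doing this in a text-oriented language is visible even in the toy: the whole "parser" is one regular expression, and regenerating the wrapper after an API change is a single script run.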

It has build systems of its own, and needs them, because the domain language is not amenable to make-based build systems. The one for UNIX-like platforms is written in shell scripts, Perl and some small programs in the domain language. The one for Windows platforms is written in batch language, Perl, and the same small programs in the domain language. The old VMS build system was written in DCL, but that has not been used for over a decade.

There's some YACC/Bison programming in the compiler for the domain language. There's some testing code for Apple platforms written in Objective-C++. Some of the team's internal websites (used for project management, not part of the deliverables) are written in ASP, and others as CGI-scripts in Perl.

Basically, this started as a project to address a hard problem, so it was worth creating specialised tools, which still seem more suitable for this job than anything else available. The team considers programming to be a skill that's somewhat independent of the language used, so they're willing to use a new language if it will make a task easier. However, fashion does not come high on their list of priorities, so they won't fragment a task by introducing a new language gratuitously.

The function of this code is mathematical modelling, used on workstations and servers (I can speak a bit more freely if I don't identify the product). It's currently about 25 million LoC, with a total team size of about fifty.

In some cases, there is a tool you need to use (such as an OS's UI toolkit) which is most easily accessible from a given language. For example on iOS and macOS, if you want to write GUI applications using UIKit and AppKit, writing in Swift or Objective-C is the fastest, easiest way to do it. (There may be bindings for other languages, but they may lag behind the latest OS release you're building against, and so may not offer all features.)

So often what happens is when an application is cross-platform, the core logic of the app is written in some language that is accessible to both, and the UI/OS-specific pieces are written in whatever language they work in natively.

So if you're writing a Windows and macOS application, you could write the core logic in C++ and use C# for the UI on Windows and Swift for the UI on macOS. This saves time because you don't need to write the core logic twice and deal with different sets of bugs in both apps, etc. But it also allows a true native UI that doesn't cater to the lowest common denominator between platforms like using a cross-platform UI library would.

In addition to the fact that some programming languages can be better suited for some specific tasks, there is the practical reality.

In the practical reality, there are 2 especially important points to be considered:

  1. People have different experience and levels of interest in different programming languages. - Allowing people to work in languages they like and are proficient in can under some circumstances lead to a better end result than forcing them to a common language.
  2. Large codebases are built over long periods of time by different people. - There is no way to get the funding or the amount of volunteers needed to rewrite an entire project once a language that's better suited for it comes out.

And of course there are often specific parts of an application that have entirely different needs, such as:

  • Performance sensitive areas developed in a compiled language. Example language: C++
  • Areas that need to be cheap, easy to change, and potentially customizable, developed in a scripting language. Example language: Lua.
  • GUI Layout. Example language: HTML
  • Installer. Example language/tool: WiX.
  • Build. Example language/tool: Too many to list, usually several of them at once.

And on top of that there are quite a few tools used by a sophisticated codebase, many of which allow necessary customization and plugins with yet another language.

In addition to the correct other points already made:
From experience, many language or environment decisions are made by 'if you have a hammer, everything looks like a nail', meaning, people tend to use the tools they are familiar with.

In addition, introducing a new environment or even a new language is a major investment in licenses, training, and possibly hardware, and comes with large losses of productive time: procurement, installation, configuration, and training each take weeks in a larger company, and you basically end up with a bunch of beginner developers.
There is basically never the time to 'invest' in a more modern or better-fitting environment or language, so teams stick with what they have until it just can't be made to work anymore.

Specifically to your question, if multiple people/teams are participating in the development of a solution, each group tends to stick with what they know best, so the overall solution potentially has multiple languages in it, and is developed in multiple environments.

This question (and some of the answers) seem to assume applications are monolithic blocks of code - this is not necessarily the case.

Your typical web site like Stack Exchange is actually a bunch of different services all running independently of each other, with some sort of messaging between them. These services can be (and typically are) implemented in different languages, each with its own strengths.

I work on a tiny sliver of an online banking platform, targeted to smaller community banks and credit unions. This platform has multiple components - a Web front end, a database layer, a third-party communications layer, etc. These are all independent applications running on different servers with different operating systems. You have Javascript running on the client side in the browser, you have Perl building pages on the server side, you have multiple services written in C++ acting as an abstraction layer between the Perl code and the bank's core processor, another set of C++ applications that route messages between the various layers, a smattering of C++ applications and Perl scripts to monitor the health of various processes and report their status to an internal monitor, etc., etc., etc. The third party systems my layer interacts with are typically COBOL applications running on mainframes somewhere.

Monolithic applications do still exist, but even they can take advantage of different languages for different reasons. You can write 90% of an application in Java, but use JNI to leverage C or C++ for more performance-critical sections.

I'd like to point out a very specific instance of the 'different languages have different strengths' motif: FORTRAN.

Fortran was originally developed to make it easier for engineers to do numerical analysis work, and a lot of effort has since gone into making Fortran compilers emit very efficient numerical code. On the other hand, since those early days the use of computers has exploded in a thousand directions, none of which involve numerical analysis, and the development of programming languages has largely followed suit in ignoring the 'real' world [pardon the pun].

SO: It is today, and you find yourself working for a company with a fairly sprawling product, most of it written in (say) Java (I speak from personal experience here). But you find that one of the core features of the product is going to require some form of numerical analysis, and all the best codes for that particular analysis are already available on the net - in Fortran. So what do you do? You download one of those Fortran codes, figure out its interface [i.e. the arguments of the topmost subroutine], whip up a JNI wrapper for it in C, and package it as a Java class. BAM! You've just had to develop in three languages at once. [Especially if you find that your Fortran code uses COMMON blocks - i.e. static storage - and has to be modified for thread-safety.]

Because programming is not one task. Even creating a product is not one task. There are multiple types of tasks which are best expressed with different languages.

To make it more concrete, let's assume something simple like a stand-alone app (there are more tasks to be performed for distributed apps).

A product needs to be

  • written
  • put together (this involves both compilation and gathering of resources such as images, fonts, etc. for deployment)
  • deployed
  • configured
  • monitored

A language that may be good for writing a product's run-time is very unlikely to be just as good for putting a product together. And so on.

However even the process of writing a product may not be optimally done in 1 language.

Let's say there is a lot of structured data that is handled in the product. Is the structure of the data known at the time of the writing? If it is, you'll want to configure some database at the time of deployment. This is optimally done in a language which can generate the language that will compile into your run time.

Now what if the structure of the data can change from time to time? Then you need a structured way of turning new data constructs into code and database configuration. This is best done in yet another language.
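As a toy sketch of that idea: keep one declarative description of the data (here a Python dict, purely for illustration) and generate the SQL that configures the database from it, so schema changes flow from a single source instead of being hand-edited in two languages:

```python
def to_create_table(table: str, columns: dict) -> str:
    """Generate a CREATE TABLE statement from a declarative
    column description. One language (Python) producing another
    (SQL); the dict is the single source of truth for the schema."""
    cols = ",\n  ".join(
        f"{name} {sqltype}" for name, sqltype in columns.items()
    )
    return f"CREATE TABLE {table} (\n  {cols}\n);"
```

When the data structure changes, the dict changes, and both the generated DDL and any code generated from the same dict stay in sync automatically.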

Can you do it all in the same language? Sure. But your effectiveness is determined by how quickly you can finish a project and how resilient to changes it is. If a lot of your work can be automated with already-existing tools, then what you take 3 months to do can be done by someone else in 2 days. But that someone would be using other languages to automate what you would do through repetition.

Software development has progressed to the point where you can use different programming languages in the same project, and you can make it work.

Then there's the question why you would use more than one language.

One reason is that languages get outdated and slowly replaced by newer ones, and if you are lucky that can be done bit by bit. So your project will have a mix of old and new language.

Another reason is the situation where language X is very much what is used on platform A, language Y is very much what is used on platform B, but language Z is supported on both platforms. So common code is written in language Z, which is then combined with X or Y, depending on the platform.

And people like to use code written by others (in other words, code that they didn't have to spend time on to write it themselves). They feel free to use code written by others in any language, and then add code in the language they prefer.

Licensed under: CC-BY-SA with attribution