Question

Greg Wilson's talk "bits of evidence" ( http://www.slideshare.net/gvwilson/bits-of-evidence-2338367 ) discusses the lack of evidence behind the following claims that Martin Fowler has advanced as benefits of using a DSL:

"[using a domain-sepcific language] lead to two primary benefits. The first, and simplest is improved programmer productivity. The second ...is... communication with domain experts." -- Martin Fowler in IEEE Software July/August 2009

Question: Are there any empirical studies providing evidence of either improved programmer productivity or improved communication with domain experts from using a DSL?

Lots of people building DSLs are unable to provide a reasoned answer to "why are you building a DSL?" and "why would a DSL help you more than a well-factored object model?"

I hear a lot of "I'm doing it because it's cool and everybody else is doing it" - which is not a rational answer.

I believe that DSLs are helpful at least some of the time but that they're not likely to be a "silver bullet" that should be used indiscriminately. I would like to see some scientific work that describes when DSLs should and should not be used - based on empirical research.


OTHER TIPS

It depends on what you consider a DSL to be.

For example, is CSS a DSL? I would think so, and it obviously makes it easier to style a page: back in HTML 3 we used tables for layout and didn't have the flexibility we have now.

If you have a DSL so that students can design molecules using just the atomic symbols (H2O, say), it would be simpler than writing the code yourself, since you can quickly see the molecular configuration just by giving the symbols and the types of bonding.
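
As a minimal sketch of what such a notation-level DSL could look like (the formula syntax and parser here are hypothetical, not taken from any real chemistry package), a few lines of Python are enough to turn "H2O" into a structured description:

```python
import re

# Hypothetical mini-DSL: a molecular formula such as "H2O" or "C6H12O6"
# is parsed into a mapping of element symbol -> atom count.
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def parse_formula(formula):
    """Parse a simple molecular formula (no brackets or charges) into atom counts."""
    atoms = {}
    for symbol, count in FORMULA_TOKEN.findall(formula):
        atoms[symbol] = atoms.get(symbol, 0) + (int(count) if count else 1)
    return atoms

print(parse_formula("H2O"))      # {'H': 2, 'O': 1}
print(parse_formula("C6H12O6"))  # {'C': 6, 'H': 12, 'O': 6}
```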

I don't know of a paper showing it one way or the other, but if your target audience is not programmers, then a DSL makes sense: we can have accountants writing their applications in their own terminology rather than handing requirements to developers.

DSLs have been around for a long time but are only now becoming more popular, so as more examples of good and bad uses accumulate, time will tell when a DSL is the best choice and when it is actually detrimental. I wouldn't write medical monitoring software with any DSL, for example.

The whole premise of "scientific" in this case is dubious. There is simply no way to guarantee the reproducibility and control groups that an empirical study requires.

By and large, in business programming there are no serious empirical studies of something's benefits before it comes into use, whether that something is SQL, object-oriented languages, functional languages, garbage collection, etc.

These things tend to be decided by the market over time.

Why this is the case is probably a combination of two reasons. One is that a good empirical study is very expensive, and it is economically much cheaper just to try the thing. The other is that every situation is different, so an empirical study would have to start by narrowing the problem under study enough to allow a proper comparison between using a DSL and not using one, and the end result would not be very useful beyond the specific type of problem chosen.

I think we can safely say from experience that nothing is a silver bullet, and that insisting on a good reason for an approach will make any solution better: even if a DSL would help a situation, if you don't know why you are doing it, you won't know whether you are doing it right and may end up missing the whole benefit.

This is a sensible question, and I think there are definitional problems, such as: what is a DSL? When a buzzword becomes "hot", it becomes a marketing opportunity and gets divorced from the underlying science, if there is any.

Some years ago, I wrote a book (Building Better Applications, ISBN 0-442-01740-5, long out of print) where I tried to look into performance, not only of programs, but of programmers. I tried to look at it using information theory.

I came up with a crude measure of maintainability, where a problem exists as a knowledge structure in somebody's head (no problem for an AI guy to say so) and its solution exists as a textual structure processed by a machine. What I looked at is the relationship between these two structures. For example, if a change occurs in the mental problem description, how many source-code changes are required to transfer it correctly to the program text? A simple way to measure that is to diff the code before and after the change. Now average that measure over the space of likely changes: the lower the average, the more maintainable the source code.
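
To make that measure concrete, here is a rough sketch (my own illustration, not tooling from the book) that diffs each before/after pair of sources for a set of representative changes and averages the number of changed lines:

```python
import difflib

def change_size(before, after):
    """Count source lines added or removed between two versions."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(1 for line in diff
               if line.startswith(("+", "-"))
               and not line.startswith(("+++", "---")))

def average_change_size(change_pairs):
    """Average edit size over a set of representative (before, after) changes.

    By this crude measure, the lower the average, the more closely the code
    tracks the mental model of the domain, i.e. the more maintainable it is.
    """
    return sum(change_size(b, a) for b, a in change_pairs) / len(change_pairs)

# Hypothetical example: the same requirement change (a rate going from 5% to 7%)
# applied to two codebases, one of which duplicates the rate.
pairs_a = [("rate = 0.05", "rate = 0.07")]
pairs_b = [("if x:\n    r = 0.05\nelse:\n    r = 0.05",
            "if x:\n    r = 0.07\nelse:\n    r = 0.07")]
print(average_change_size(pairs_a))  # 2.0 (one line removed, one added)
print(average_change_size(pairs_b))  # 4.0
```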

My thesis was that the more maintainable code is by that measure, the more it comes to resemble the mental model of the domain, so it is reasonable to call it more "problem-oriented" or more "domain-specific". One characteristic I noticed of such code is that it tends to be more a statement of the problem than a solution of the problem. The solution lies not in the language but in the implementation of the language, the sub-structure. This echoes, though does not exactly match, the distinction between "declarative" and "imperative" languages.
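
As a toy illustration of that "statement of the problem" versus "solution of the problem" distinction (the shipping-rate rules below are invented purely for the example), compare an imperative encoding of a domain rule with a declarative rule table whose generic lookup plays the role of the language's implementation:

```python
# Imperative "solution of the problem": the domain rule is buried in control flow.
def shipping_cost_imperative(weight_kg):
    if weight_kg <= 1:
        return 5.0
    elif weight_kg <= 5:
        return 9.0
    else:
        return 20.0

# Closer to a "statement of the problem": the rules are data that mirror the
# mental model; the generic lookup below is the "implementation of the language".
SHIPPING_RATES = [
    (1, 5.0),              # up to 1 kg
    (5, 9.0),              # up to 5 kg
    (float("inf"), 20.0),  # everything heavier
]

def shipping_cost_declarative(weight_kg):
    return next(cost for limit, cost in SHIPPING_RATES if weight_kg <= limit)

assert shipping_cost_imperative(3) == shipping_cost_declarative(3) == 9.0
```

A change such as adding a new weight band touches one line of the table rather than the branching logic, which is exactly the kind of low diff-per-change the measure above rewards.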

So in trying to answer your question, I would say let's get away from what people might want "DSL" to mean and instead look at a definition that's at least moderately unambiguous.

As part of developing that idea, I had stumbled on a number of techniques, one of which is Differential Execution, which seems to give good maintainability for coding UIs, and also reduces source code size by roughly an order of magnitude. My theory is that that's a successful example of what a DSL might be.

I do not claim that maintainability can be achieved without the maintainer having to climb a learning curve. I think real maintainability comes at the price of programmers having to learn things that might not be easy to grasp, but that have the desired value once grasped.

From the linguists Sapir and Whorf we can learn that the grammatical features of a language influence our thinking: if you create a DSL, you will think in more domain-specific and probably less general-purpose terms. It is all about abstraction; just as general-purpose programming languages abstract away from the machine so that we can focus on algorithms, structures, and design rather than on instruction sets, addressing modes, register sizes, etc.

I'm not sure anyone has done studies to the extent you need. My experience, though, is that a DSL can be costly to create in the first place (possibly 2x or more the effort of a simpler object model that does the same thing). Once created, however, developers get an immediate benefit, being able to do things more quickly with the DSL than with the model.

The problem with the question is that it treats all DSLs as equal. Some are easier to implement than others; whether one is building fluent interfaces/internal DSLs or external DSLs leads to different implementation times and costs.
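
For instance, an internal DSL in the fluent-interface style can often be layered on top of an ordinary object model with modest effort; the query builder below is a made-up sketch, not a real library:

```python
class Query:
    """Toy internal DSL: a fluent query builder over a plain object model."""

    def __init__(self, table):
        self.table = table
        self.conditions = []
        self.order = None

    def where(self, condition):
        self.conditions.append(condition)
        return self  # returning self is what makes the interface "fluent"

    def order_by(self, column):
        self.order = column
        return self

    def to_sql(self):
        sql = f"SELECT * FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        if self.order:
            sql += f" ORDER BY {self.order}"
        return sql

# Chained calls read close to the domain vocabulary:
print(Query("invoices").where("status = 'open'").where("total > 100")
      .order_by("due_date").to_sql())
# SELECT * FROM invoices WHERE status = 'open' AND total > 100 ORDER BY due_date
```

An external DSL, by contrast, needs its own grammar, parser, and error reporting, which is where much of the extra time and cost tends to go.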

The one main benefit such studies might not cover is the ease with which a DSL lets you express and implement code. It can also help others understand the intent of the code more easily, and since the maintenance phase is such a big component of the software development lifecycle, this could yield far bigger benefits in the long term than the effort initially spent creating the DSL.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow