Question

Why hasn't the dream of declarative programming been realized? What are some concrete obstacles getting in the way? For a simple example why can't I say

sort(A) is defined by sort(A) in perm(A) && asc(sort(A))

and automatically get a sorting algorithm out of it. perm means permutations and asc means ascending.

Solution

There are some very good answers. I will try to contribute to the discussion.

On the topic of declarative, logic programming in Prolog, there is the great book "The Craft of Prolog" by Richard O'Keefe. It is about writing efficient programs using a programming language that lets you write very inefficient programs. In this book, while discussing the efficient implementations of several algorithms (in the chapter "Methods of Programming"), the author takes the following approach:

  • define the problem in English
  • write a working solution that is as declarative as possible; usually, that means pretty much exactly what you have in your question, just correct Prolog
  • from there, take steps to refine the implementation to make it faster

The most enlightening (for me) observation I was able to make while working my way through these:

Yes, the final version of the implementation is much more efficient than the "declarative" specification the author started with. It is still very declarative, succinct, and easy to understand. What has happened in between is that the final solution captures properties of the problem to which the initial solution was oblivious.

In other words, while implementing a solution, we have used as much of our knowledge about the problem as we can. Compare:

Find a permutation of a list such that all elements are in ascending order

to:

Merging two sorted lists will result in a sorted list. Since there might be sublists that are already sorted, use these as a starting point, instead of sublists of length 1.
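The second description maps almost directly onto code: a natural merge sort that first finds the runs that are already sorted, then merges them pairwise. An illustrative Python sketch (not O'Keefe's Prolog, just the same idea):

```python
def runs(xs):
    """Split xs into maximal already-sorted (ascending) runs."""
    out = []
    for x in xs:
        if out and out[-1][-1] <= x:
            out[-1].append(x)
        else:
            out.append([x])
    return out

def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i]); i += 1
        else:
            result.append(b[j]); j += 1
    return result + a[i:] + b[j:]

def natural_merge_sort(xs):
    rs = runs(xs) or [[]]
    while len(rs) > 1:
        # Merge runs pairwise until a single sorted run remains.
        rs = [merge(rs[k], rs[k + 1]) if k + 1 < len(rs) else rs[k]
              for k in range(0, len(rs), 2)]
    return rs[0]
```

The extra knowledge (merging preserves sortedness; existing runs are a head start) is exactly what the "find a permutation" specification never mentions.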

A small aside: a definition like the one you have given is attractive because it is very general. However, I cannot escape the feeling that it purposefully ignores the fact that permutations are, well, a combinatorial problem. This is something we already know! This is not a criticism, just an observation.

As to the real question: how to move forward? Well, one way is to provide the computer with as much knowledge about the problem we are declaring as we can.

The best attempt I know of to really solve the problem is presented in the books co-authored by Alexander Stepanov, "Elements of Programming" and "From Mathematics to Generic Programming". I am sadly not up to the task of summarizing (or even fully understanding) everything in these books. However, the approach there is to define efficient (or even optimal) library algorithms and data structures, under the provision that all relevant properties of the input are known in advance. The final result is:

  • Each well-defined transformation is a refinement of the constraints that are already in place (the properties that are known);
  • We let the computer decide which transformation is optimal based on the existing constraints.

As to why it hasn't quite happened yet: computer science is a really young field, and we are still coming to grips with the novelty of most of it.

PS

To give you a taste of what I mean by "refining the implementation": take for example the easy problem of getting the last element of a list, in Prolog. The canonical declarative solution is to say:

last(List, Last) :-
    append(_, [Last], List).

Here, the declarative meaning of append/3 is:

List1AndList2 is the concatenation of List1 and List2

Since in the second argument to append/3 we have a list with only one element, and the first argument is ignored (the underscore), we get a split of the original list which discards the front of the list (List1 in the context of append/3) and demands that the back (List2 in the context of append/3) is indeed a list with only one element: so, it is the last element.

The actual implementation provided by SWI-Prolog, however, says:

last([X|Xs], Last) :-
    last_(Xs, X, Last).

last_([], Last, Last).
last_([X|Xs], _, Last) :-
    last_(Xs, X, Last).

This is still nicely declarative. Read top to bottom, it says:

The last element of a list only makes sense for a list of at least one element. The last element for a pair of the tail and the head of a list, then, is: the head, when the tail is empty, or the last of the non-empty tail.

The reason why this implementation is provided is to work around the practical issues surrounding the execution model of Prolog. Ideally, it shouldn't make a difference which implementation is used. Similarly, we could have said:

last(List, Last) :-
    reverse(List, [Last|_]).

The last element of a list is the first element of the reversed list.

If you want to get your fill of inconclusive discussions about what is good, declarative Prolog, just go through some of the questions and answers in the Prolog tag on Stack Overflow.

Other tips

Logical languages already do this. You can define sort much like you do in your question.

The main problem is performance. Computers might be great at computing lots of stuff, but they are inherently dumb. Every "clever" decision a computer makes was programmed into it by a programmer. And this decision is usually described not by what the end result looks like, but by how to achieve that end result, step by step.

Imagine the story of a Golem. If you try to give him an abstract command, then at best he will do it inefficiently, and at worst he will hurt himself, you, or someone else. But if you describe what you want in the greatest detail possible, you are guaranteed that the task will be completed effectively and efficiently.

It is the programmer's job to decide what level of abstraction to use. For the application you are making, are you going to go high-level, describe it in an abstract way and take the performance hit, or go low and dirty, spend 10x more time on it, but get an algorithm that is 1000x faster?

In addition to Euphoric's excellent point, I'd like to add that we already are using declarative languages in many places where they work well, i.e. describing state that isn't likely to change or requesting something for which the computer actually can generate efficient code on its own:

  • HTML declares what the content of a web page is.

  • CSS declares what various types of elements in a web page should look like.

  • Every relational database has a data definition language that declares what the structure of the database is.

  • SQL is much closer to declarative than imperative, since you tell it what you want to see and the database's query planner figures out how to actually make it happen.

  • One could argue that most configuration files (.vimrc, .profile, .bashrc, .gitconfig) are using a domain-specific language that's largely declarative.
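To make the SQL bullet concrete: you declare the result you want, and the database's planner decides how to produce it. A minimal sketch using Python's built-in sqlite3 (the table and data are made up):

```python
import sqlite3

# In-memory database; note that the schema definition is declarative too.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, age INTEGER)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [("ann", 34), ("bob", 17), ("eve", 25)])

# We declare WHAT we want; the planner decides HOW (scan, index, ...).
rows = con.execute(
    "SELECT name FROM users WHERE age >= 18 ORDER BY age").fetchall()
print(rows)  # [('eve',), ('ann',)]
```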

Abstractions are leaky

You can implement a declarative system where you declare what you want, and the compiler or interpreter figures out an order of execution. The theoretical benefit is that it frees you from having to think about the 'how'; you don't have to spell out the implementation. However, in practice, for general-purpose computing you still have to think about the 'how' and write all kinds of tricks while keeping in mind how the declaration will be executed, since otherwise the compiler can (and often will) choose an implementation that is very, very, very slow (e.g. n! operations where n would suffice).

In your particular example, you will get a sorting algorithm - just not necessarily a good, or even usable, one. Your given definition, if implemented literally (as a compiler likely would), results in bogosort (http://en.wikipedia.org/wiki/Bogosort), which is unusable for larger datasets - it is technically correct, but needs an eternity to sort a thousand numbers.
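Taken literally, the question's definition is generate-and-test: enumerate perm(A) and return the first permutation satisfying asc. A Python sketch of that literal reading (the names perm and asc come from the question; itertools.permutations stands in for perm):

```python
from itertools import permutations

def asc(xs):
    """asc: every adjacent pair is in ascending order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def sort(a):
    """sort(A): some P in perm(A) with asc(P).

    Executable and correct, but O(n!) in the worst case - bogosort.
    (Deliberately shadows the builtin name, to match the question.)
    """
    return next(list(p) for p in permutations(a) if asc(p))

print(sort([3, 1, 2]))  # [1, 2, 3]
```

At n = 13 this is already exploring up to 6,227,020,800 permutations, which is why a compiler needs more knowledge than the specification provides.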

For some limited domains you can write systems that almost always do well in figuring out a good implementation, for example, SQL. For general purpose computing that doesn't work particularly well - you can write systems in, say, Prolog but you have to visualize how exactly your declarations will be converted to an imperative execution order in the end, and that loses much of the expected declarative programming benefits.

Computational decidability is the most important reason declarative programming has not proven to be as easy as it seems to be.

Many problems that are relatively easy to state have proven to be undecidable or NP-complete. This often occurs when we take into account negative classes and classification, countability, and recursion.

I'd like to exemplify this with some domains that are well known.

Deciding which CSS class to use requires knowledge and consideration of all CSS rules. Adding a new rule might invalidate all earlier decisions. Negative CSS classes are intentionally left out of the language because of NP-complete problems, but their absence complicates CSS design decisions.

Within an (SQL) query optimizer, there is the nettlesome problem of deciding in which order to join, which indices to use and which memory to allocate to which temporary results. This is a known NP-complete problem and complicates database-design and query formulation. To formulate it differently: when designing a database or a query, the designer needs to know the actions and order of actions the query optimizer is likely to take. An experienced engineer needs knowledge of the heuristics used by major database vendors.
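A back-of-the-envelope illustration of why join ordering hurts: even restricting attention to left-deep plans, the number of possible join orders of n tables grows factorially, one per permutation of the tables:

```python
from math import factorial

# One left-deep join order per permutation of the n tables.
for n in (3, 5, 10):
    print(n, "tables:", factorial(n), "left-deep join orders")
```

The optimizer cannot enumerate all of these for large queries, hence the heuristics the answer mentions.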

Configuration files are declarative, but certain configurations are hard to declare. For example, to properly configure features one needs to take into account versioning, deployment (and deployment history), possible manual overrides and possible conflicts with other settings. To properly validate a configuration might become an NP-complete problem.

The result is that these complications take beginners by surprise, they break the 'beauty' of declarative programming and they cause some engineers to search for other solutions. The migration of inexperienced engineers from SQL to NoSQL might have been triggered by the underlying complexities of relational databases.

Differences in the declarativeness of programming languages are put to good use in the verification of digital logic.

Normally, digital logic is described at the register transfer level (RTL), where the logic level of the signals between registers is defined. To check that, we increasingly add properties defined in a more abstract and declarative manner.

One of the more declarative languages/language subsets is called PSL, for Property Specification Language. When testing an RTL model of a multiplier (in which, for example, all the shift and add operations over multiple clock cycles need to be specified), you can write a property in the manner of: assert that when enable is high, this output will equal the multiplication of these two inputs after no more than 8 clock cycles. The PSL description can then be checked together with the RTL in a simulation, or the PSL may be formally proved to hold for the RTL description.

The more declarative PSL model forces one to describe the same behaviour as the RTL description, but in a sufficiently different way that the two can be automatically checked against each other to see if they agree.
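The flavour of such a property can be imitated in ordinary code: run a cycle-level model and assert the declarative property alongside it. A toy Python sketch (not real PSL or RTL; the shift-and-add multiplier model below is hypothetical):

```python
def shift_add_multiply(a, b, width=8):
    """Cycle-level model: one shift-and-add step per clock cycle.

    Yields the accumulator after each cycle, for width-bit operands.
    """
    acc = 0
    for cycle in range(width):
        if (b >> cycle) & 1:
            acc += a << cycle
        yield acc

def check_property(a, b):
    """PSL-like assertion: within 8 cycles the output equals a * b."""
    for cycle, out in enumerate(shift_add_multiply(a, b), start=1):
        if out == a * b:
            assert cycle <= 8
            return True
    raise AssertionError("property violated: product never reached")

check_property(13, 11)
```

The property only says what must eventually hold; the model says how each cycle gets there, and checking one against the other is where the redundancy pays off.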

Mostly the problem is how you model the data, and declarative programming isn't helping there. In imperative languages you already have tons of libraries that do lots of stuff for you, so you only need to know what to call. In a way, one might consider this declarative programming (probably the best example of this is the Stream API in Java 8). Having this, the abstraction is already resolved, and declarative programming isn't necessary.
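The same contrast the Stream API offers can be shown in Python: the comprehension states what the result is, while the loop spells out how to build it (an illustrative sketch; the data is made up):

```python
orders = [("ann", 120), ("bob", 40), ("eve", 75), ("ann", 60)]

# Imperative: spell out HOW, step by step.
total = 0
for name, amount in orders:
    if amount >= 50:
        total += amount

# Declarative-ish: state WHAT - the sum of all amounts of at least 50.
total_decl = sum(amount for _, amount in orders if amount >= 50)

assert total == total_decl == 255
```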

Also, as it has been said, logic programming languages already do exactly what you want. One might say the problem is performance, but with today's hardware and research in this area, things can be improved to be ready for production use; actually Prolog is used for AI stuff, but I believe only by academia.

Note that this applies to general-purpose programming languages. For domain-specific languages, declarative languages work far better; SQL is probably the best example.

It would look something like this: {(whatever => read a file & call a URL) | (call a URL & write a file)}. However, these are actions to execute, and the state of the system changes as a result, but that is not obvious from the source.

Declaratives can describe a finite state machine and its transitions. The FSM is like the opposite of declaratives without actions, even if the only action is to change state to the next state.

The advantage of using this method is that transitions and actions can be specified by predicates that apply to multiple transitions, rather than just one.
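A tiny sketch of that idea (assuming nothing about the author's actual 2008 generator): transitions are (predicate, next-state) pairs, so a single predicate such as is_digit covers many transitions at once:

```python
# Each state maps to a list of (predicate, next_state) rules.
is_digit = str.isdigit
is_space = str.isspace

RULES = {
    "start":  [(is_space, "start"), (is_digit, "number")],
    "number": [(is_digit, "number"), (is_space, "start")],
}

def run(fsm, text, state="start"):
    """Drive the FSM over text; an unmatched input exposes incompleteness."""
    for ch in text:
        for pred, nxt in fsm[state]:
            if pred(ch):
                state = nxt
                break
        else:
            raise ValueError(f"no rule for {ch!r} in state {state!r}")
    return state

print(run(RULES, "  42"))  # number
```

Consistency and completeness checks would then analyze RULES itself: no two predicates true at once may lead to different next states, and every input must match some rule.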

I know this sounds a little strange, but in 2008 I wrote a program generator that uses this method, and the generated C++ is from 2 to 15 times as large as the source. I now have over 75,000 lines of C++ from 20,000 lines of input. Two things go with this: consistency and completeness.

Consistency: No two predicates that can be true at the same time may imply inconsistent actions (such as both x = 8 and x = 9), nor different next states.

Completeness: The logic for every state transition is specified. These may be difficult to check for systems with N substates, and thus > 2**N states, but there are interesting combinatorial methods that can verify everything. In 1962 I wrote phase 1 of a system sort for the 7070 machines, using this kind of conditional code generation and combinatorial debugging. Of the 8,000 lines in the sort, the number of bugs found from the day of the first release onward was zero!

Phase two of the sort, 12,000 lines, had over 60 errors in the first two months. There is much more to say about this, but it works. If car manufacturers used this method to check the firmware, we would not see the failures that we see now.

Not everything can be represented in a declarative way.

Often, you explicitly want to control the flow of execution.

For example, take the following pseudo-code:

    if whatever
        read a file
        call an URL
    else
        call an URL
        write a file

How would you represent it declaratively?

Sure, there are probably some exotic ways to do it, like monads. But these usually are more cumbersome, complicated and far less intuitive than their procedural counterparts.
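One way such encodings look in practice: represent the actions as data and interpret them later, which makes the program a value but does not remove the ordering. A sketch (the action names and stub handlers are hypothetical):

```python
# Actions as data: each step is a tag plus an argument; no I/O happens
# while "declaring" the program, only when it is interpreted.
program = {
    True:  [("read_file", "in.txt"), ("call_url", "http://example.com")],
    False: [("call_url", "http://example.com"), ("write_file", "out.txt")],
}

def interpret(prog, whatever, handlers):
    """Execute the chosen branch; ordering is still explicit in the data."""
    return [handlers[action](arg) for action, arg in prog[whatever]]

# Stub handlers so the sketch is runnable without real I/O.
handlers = {name: (lambda n: lambda arg: f"{n}({arg})")(name)
            for name in ("read_file", "call_url", "write_file")}

print(interpret(program, False, handlers))
```

Notice that the lists are still sequences of steps: the "when and in what order" has moved into the data rather than disappearing.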

It all boils down to the fact that "interacting" with your environment/system is not declarative. Everything related to I/O is procedural by essence. You have to tell exactly when and what should happen, and in what order.

Declarative is great for everything that is purely related to computation. Like a giant function, you put X in and you get Y. That's great. An example for this is HTML, the input is text, the output is what you see in your browser.

Licensed under: CC-BY-SA with attribution