Question

It seems to me that side effects are a natural phenomenon. Yet they are treated as something of a taboo in functional languages. What are the reasons?

My question is specific to the functional programming style, not to all programming languages/paradigms.

Solution

Writing your functions/methods without side effects - so they are pure functions - makes it easier to reason about the correctness of your program.

It also makes it easy to compose those functions to create new behaviour.

It also makes certain optimizations possible, where the compiler can, for instance, memoise the results of functions, or use common subexpression elimination.
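
As an illustration, here is a minimal Haskell sketch (the function names are invented for the example): two pure functions compose into a new one, and because each call is referentially transparent, a compiler is free to memoise results or eliminate repeated subexpressions without changing the program's meaning.

double :: Int -> Int
double x = 2 * x

increment :: Int -> Int
increment x = x + 1

-- Composition creates new behaviour with no hidden interactions.
doubleThenIncrement :: Int -> Int
doubleThenIncrement = increment . double

main :: IO ()
main = print (doubleThenIncrement 20)  -- prints 41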

Edit: at Benjol's request: Because a lot of your state is stored on the stack (data flow, not control flow, as Jonas has called it here), you can parallelize or otherwise reorder the execution of those parts of the computation that are independent of each other. You can even find those independent parts easily, because one part doesn't provide inputs to another.
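
For example, here is a minimal sketch using parMap from Haskell's parallel package (the choice of package and the use of fib are assumptions of this example; any deterministic parallelism construct would do): because fib is pure, the list elements can be evaluated in any order, or simultaneously, and the result cannot change.

import Control.Parallel.Strategies (parMap, rdeepseq)

-- Naive Fibonacci, used here only as a stand-in for expensive pure work.
fib :: Int -> Integer
fib n = if n < 2 then fromIntegral n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = print (sum (parMap rdeepseq fib [25 .. 32]))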

In environments with debuggers that let you roll back the stack and resume a computation (like Smalltalk), having pure functions means you can easily see how a value changes, because the previous states are available for inspection. In a mutation-heavy computation, unless you explicitly add do/undo actions to your structure or algorithm, you cannot see the history of the computation. (This ties back to the first paragraph: writing pure functions makes it easier to check the correctness of your program.)

Other tips

From an article on functional programming:

In practice, applications need to have some side effects. Simon Peyton-Jones, a major contributor to the functional programming language Haskell, said the following: "In the end, any program must manipulate state. A program that has no side effects whatsoever is a kind of black box. All you can tell is that the box gets hotter." (http://oscon.blip.tv/file/324976) The key is to limit side effects, clearly identify them, and avoid scattering them throughout the code.

You misunderstood: functional programming promotes limiting side effects to make programs easier to understand and optimize. Even Haskell lets you write to files.

Essentially, what I'm saying is that functional programmers don't think side effects are evil; they merely think that limiting the use of side effects is good. I know it may seem like such a simple distinction, but it makes all the difference.
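
To make that distinction concrete, here is a minimal Haskell sketch (the file name is invented for the example): the side effect is allowed, but the IO type clearly identifies where it lives, so it cannot scatter silently through the pure code.

-- Pure by type: no effects can happen in here.
greeting :: String -> String
greeting name = "Hello, " ++ name

main :: IO ()
main = writeFile "greeting.txt" (greeting "world")  -- the effect is visible in the type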

A few notes:

  • Functions without side effects can trivially be executed in parallel, while functions with side effects generally require some kind of synchronization.

  • Functions without side effects allow much more aggressive optimization (for example, by transparently using a result cache), because as long as you get the right result, it doesn't even matter whether the function was actually executed (see the sketch after this list).
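
To illustrate the result-cache point, a minimal sketch in Haskell (the expensive function is a made-up stand-in): since expensive is pure, a remembered result is indistinguishable from recomputing it, so the cache is transparent to callers.

import qualified Data.Map.Strict as Map

expensive :: Int -> Int
expensive x = sum [1 .. x]  -- stand-in for real work

-- Return a cached result when available; otherwise compute and remember it.
lookupOrCompute :: Map.Map Int Int -> Int -> (Int, Map.Map Int Int)
lookupOrCompute cache x =
    case Map.lookup x cache of
        Just y  -> (y, cache)
        Nothing -> let y = expensive x
                   in (y, Map.insert x y cache)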

I mostly work in functional code now, and from that vantage point it seems absolutely obvious. Side effects create a huge mental burden on programmers trying to read and understand code. You don't notice that burden until you've been free of it for a while, and then suddenly have to read code with side effects again.

Consider this simple example:

val foo = 42
// Several lines of code you don't really care about, but that contain a
// lot of function calls that use foo and may or may not change its value
// by side effect.

// Code you are troubleshooting
// What's the expected value of foo here?

In a functional language, I know that foo is still 42. I don't even need to look at the code in between, let alone understand it, or look at the implementations of the functions it calls.

All that stuff about concurrency and parallelization and optimization is nice, but that's what computer scientists put on the brochure. Not having to wonder what is mutating the variable, and when, is what I really love in day-to-day practice.

Few to no languages make it impossible to cause side effects. Languages that were completely free of side effects would be prohibitively difficult (nearly impossible) to use, except in a very limited capacity.

Why are side effects considered evil?

Because they make it much harder to reason about exactly what a program does, and to prove that it does what you expect it to do.
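
A small illustration of what "reasoning" means here, sketched in Haskell with invented function names: with pure functions, reasoning is just substituting equals for equals, and every step is justified by a definition alone.

square :: Int -> Int
square x = x * x

addOne :: Int -> Int
addOne x = x + 1

-- addOne (square 3)
--   = addOne (3 * 3)   -- unfold square
--   = addOne 9
--   = 9 + 1            -- unfold addOne
--   = 10

No such step-by-step substitution is valid once square may also write to a file or mutate a global.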

At a very high level, imagine testing an entire 3-tier website with only black-box testing. Sure, it's feasible, depending on the scale. But there's certainly a lot of duplication going on. And if there is a bug (one related to a side effect), then you could potentially break the whole system for further testing until the bug is diagnosed and fixed, and the fix is deployed to the test environment.

Benefits

Now, scale that down. If you were pretty good at writing side-effect-free code, how much faster would you be at reasoning about what some existing code did? How much faster could you write unit tests? How confident would you feel that the side-effect-free code was guaranteed to be bug-free, and that users could limit their exposure to whatever bugs it did have?

If code has no side effects, the compiler may also have additional optimizations it can perform, and those optimizations may be much easier to implement. It may even be much easier to conceptualize an optimization for side-effect-free code, which means your compiler vendor can implement optimizations that are difficult to impossible with code that has side effects.

Concurrency is also drastically simpler to implement, to generate automatically, and to optimize when code has no side effects. That's because all the pieces can be safely evaluated in any order. Allowing programmers to write highly concurrent code is widely considered the next big challenge that computer science has to face, and one of the few remaining hedges against Moore's Law.

Side effects are like "leaks" in your code that will have to be handled later, either by you or by some unsuspecting colleague.

Functional languages avoid state variables and mutable data as a way of making code less dependent on context and more modular. Modularity ensures that one developer's work will not affect or undermine another's.

Scaling development speed with team size is a "holy grail" of software development today. When working with other programmers, few things are as important as modularity. Even the simplest logical side effects make collaboration extremely difficult.

Well, IMHO, this is quite hypocritical. Nobody likes side effects, but everybody needs them.

What is so dangerous about side effects is that calling a function may affect not only how that function behaves the next time it is called, but also how other functions behave. Side effects thus introduce unpredictable behavior and nontrivial dependencies.

Programming paradigms such as OO and functional both address this problem. OO reduces the problem by imposing a separation of concerns. This means the application state, which consists of a lot of mutable data, is encapsulated into objects, each of which is responsible for maintaining its own state only. This way the risk of dependencies is reduced and problems are far more isolated and easier to track.

Functional programming takes a far more radical approach, where the application state is simply immutable from the perspective of the programmer. This is a nice idea, but on its own it renders the language useless. Why? Because ANY I/O operation has side effects. As soon as you read from any input stream, your application state is likely to change, because the next time you invoke the same function, the result is likely to be different. You may be reading different data, or - also a possibility - the operation might fail. The same is true for output: even output is an operation with side effects. You rarely notice this nowadays, but imagine you have only 20K for your output; if you output any more, your app crashes because you're out of disk space or whatever.
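
Haskell's resolution of exactly this problem, sketched minimally below, is to keep functions pure and record every such operation in its type: getLine may produce a different string on every call, and the IO in its type says so, while a pure function never can.

main :: IO ()
main = do
    line <- getLine       -- effectful: may return something different each call
    print (length line)   -- pure: same input, same output, always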

So yes, side effects are nasty and dangerous from the perspective of a programmer. Most bugs come from the way certain parts of the application state are interlocked in a nearly obscure way, through unconsidered and oftentimes unnecessary side effects. From the perspective of a user, side effects are the whole point of using a computer. Users don't care what happens inside or how it is organized. They do something and expect the computer to CHANGE accordingly.

Any side effect introduces extra input/output parameters which must be taken into account when testing.

This makes code validation much more complex, as the environment cannot be limited to just the code being validated, but must bring in some or all of the surrounding environment (the global that is updated lives in that code over there, which in turn depends on that code, which in turn depends on living inside a full Java EE server....)

By trying to avoid side effects, you limit the amount of external setup needed to run the code.
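
A minimal Haskell sketch of the difference (all names invented for the example): the first version reads a mutable cell that lives elsewhere, so a test must construct and populate that environment first; the second takes everything it reads as a parameter, so a test is a single expression.

import Data.IORef

-- Hidden input: the rate lives in a mutable cell somewhere else.
taxWithGlobal :: IORef Double -> Double -> IO Double
taxWithGlobal rateRef amount = do
    rate <- readIORef rateRef
    pure (amount * rate)

-- Explicit input: the whole environment is in the signature.
tax :: Double -> Double -> Double
tax rate amount = amount * rate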

In my experience, good design in object-oriented programming mandates the use of functions that have side effects.

For example, take a basic UI desktop application. I may have a running program that has, on its heap, an object graph representing the current state of my program's domain model. Messages arrive at the objects in that graph (for instance, via method calls invoked from the UI layer controller). The object graph (domain model) on the heap is modified in response to the messages. Observers of the model are informed of any changes; the UI and maybe other resources are updated.

Far from being evil, the correct arrangement of these heap-modifying and screen-modifying side effects is at the core of OO design (in this case, the MVC pattern).

Of course, that does not mean that your methods should have arbitrary side effects. And side-effect-free functions do have a place in improving the readability, and sometimes the performance, of your code.

Evil is a bit over the top... it all depends on the context of the usage of the language.

Another consideration to those already mentioned is that it makes proofs of correctness of a program much simpler if there are no functional side effects.

As the answers above have pointed out, functional languages don't so much prevent code from having side effects as provide us with tools for managing which side effects can happen in a given piece of code, and when.

This turns out to have very interesting consequences. First, and most obviously, there are numerous things that you can do with side-effect free code, which have already been described. But there are other things we can do, too, even when working with code that does have side effects:

  • In code with mutable state, we can manage the scope of the state in such a way as to statically ensure that it cannot leak outside of a given function, thus enabling us to collect garbage without either reference counting or mark-and-sweep style schemes, yet still be sure that no references survive. The same guarantees are also useful for maintaining privacy-sensitive information, etc. (This can be achieved using the ST monad in Haskell; see the sketch after this list.)
  • When modifying shared state in multiple threads, we can avoid the need for locks by tracking changes and performing an atomic update at the end of a transaction, or rolling the transaction back and repeating it if another thread made a conflicting modification. This is only achievable because we can ensure that the code has no effects other than the state modifications (which we can happily abandon). This is performed by the STM (Software Transactional Memory) monad in Haskell.
  • We can track the effects of code and trivially sandbox it, filtering any effects it may need to perform in order to be sure it's safe, thus allowing (for example) user-submitted code to be executed securely on a web site.
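
The first point can be made concrete with a small sketch using Haskell's ST monad (the API is real; the function itself is invented for the example): the code mutates freely inside runST, yet the type system guarantees no mutable reference escapes, so the result is an ordinary pure value.

import Control.Monad.ST
import Data.STRef

-- Internally mutable, externally pure: runST seals the effects inside,
-- and the type system guarantees no STRef escapes.
sumSquares :: Int -> Int
sumSquares n = runST $ do
    acc <- newSTRef 0
    mapM_ (\i -> modifySTRef' acc (+ i * i)) [1 .. n]
    readSTRef acc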

In complex code bases, complex interactions of side effects are the most difficult thing I find to reason about. I can only speak personally given the way my brain works. Side effects and persistent states and mutating inputs and so on make me have to think about "when" and "where" things happen to reason about correctness, not just "what" is happening in each individual function.

I can't just focus on "what". I can't conclude after thoroughly testing a function which causes side effects that it will spread an air of reliability throughout the code using it, since callers might still misuse it by calling it at the wrong time, from the wrong thread, in the wrong order. Meanwhile a function that causes no side effects and just returns a new output given an input (without touching the input) is pretty much impossible to misuse in this way.

But I'm a pragmatic type, I think, or at least try to be, and I don't think we necessarily have to stamp out all side effects to the barest minimum to reason about the correctness of our code (at the very least I would find this very difficult to do in languages like C). Where I find it very difficult to reason about correctness is when we have the combination of complex control flows and side effects.

Complex control flows to me are the ones that are graph-like in nature, often recursive or recursive-like (event queues, e.g., which aren't directly calling events recursively but are "recursive-like" in nature), maybe doing things in the process of traversing an actual linked graph structure, or processing a non-homogeneous event queue that contains an eclectic mixture of events leading us to all kinds of different parts of the codebase, all triggering different side effects. If you tried to draw out all the places you'll ultimately end up in the code, it would resemble a complex graph, potentially with nodes you never expected to be there at that given moment; and given that they all cause side effects, you might be surprised not only about which functions are called but also about which side effects are occurring during that time and the order in which they occur.

Functional languages can have extremely complex and recursive control flows, but the result is so easy to comprehend in terms of correctness because there aren't all sorts of eclectic side effects going on in the process. It's only when complex control flows meet eclectic side effects that I find it headache-inducing to try to comprehend the entirety of what's going on and whether it'll always do the right thing.

So when I have those cases, I often find it very difficult, if not impossible, to feel very confident about the correctness of such code, let alone very confident that I can make changes to such code without tripping on something unexpected. So the solution to me there is either simplify the control flow or minimize/unify the side effects (by unifying, I mean like only causing one type of side effect to many things during a particular phase in the system, not two or three or a dozen). I need one of those two things to happen to allow my simpleton brain to feel confident about the correctness of the code that exists and the correctness of the changes I introduce. It is pretty easy to be confident about the correctness of code introducing side effects if the side effects are uniform and simple along with the control flow, like so:

for each pixel in an image:
    make it red

It's pretty easy to reason about the correctness of such code, but mainly because the side effects are so uniform and the control flow is so dead simple. But let's say we had code like this:

for each vertex to remove in a mesh:
    start removing vertex from connected edges:
        start removing connected edges from connected faces:
            rebuild connected faces excluding edges to remove:
                if face has less than 3 edges:
                    remove face
            remove edge
        remove vertex

This is ridiculously oversimplified pseudocode which would typically involve far more functions and nested loops and many more things going on (updating multiple texture maps, bone weights, selection states, etc.), but even the pseudocode makes it difficult to reason about correctness because of the interaction of the complex graph-like control flow and the side effects going on. So one strategy to simplify that is to defer the processing and just focus on one type of side effect at a time:

for each vertex to remove:
     mark connected edges
for each marked edge:
     mark connected faces
for each marked face:
     remove marked edges from face
     if num_edges < 3:
          remove face

for each marked edge:
     remove edge
for each vertex to remove:
     remove vertex

... something to this effect as one iteration of simplification. That means we're passing through the data multiple times which is definitely incurring a computational cost, but we often find we can multithread such resulting code more easily, now that the side effects and control flows have taken on this uniform and simpler nature. Furthermore each loop can be made more cache-friendly than traversing the connected graph and causing side effects as we go (ex: use a parallel bit set to mark what needs to be traversed so that we can then do the deferred passes in sorted sequential order using bitmasks and FFS). But most importantly, I find the second version so much easier to reason about in terms of correctness as well as change without causing bugs. So that's how I approach it anyway and I apply the same kind of mindset to simplify mesh processing there above as simplifying event handling and so forth -- more homogeneous loops with dead simple control flows causing uniform side effects.

And after all, we need side effects to occur at some point, or else we'd just have functions that output data with nowhere to go. Often we need to record something to a file, display something to a screen, send the data over through a socket, something of this sort, and all of these things are side effects. But we can definitely reduce the number of superfluous side effects that go on, and also reduce the number of side effects going on when the control flows are very complicated, and I think it'd be a lot easier to avoid bugs if we did.

It is not evil. In my opinion, it is necessary to distinguish two function types: with side effects and without.

A function without side effects:

  • always returns the same result for the same arguments, so, for example, such a function without any arguments makes no sense;
  • implies that the order in which such functions are called plays no role;
  • can be run, and can be debugged, entirely on its own (!), without any other code. And now, lol, look at what JUnit does.

A function with side effects:

  • has a sort of "leaks", which can be highlighted automatically; this is very important when debugging and hunting for the mistakes that are generally caused by side effects;
  • also has a "part" of itself without side effects, which can be separated out automatically.

So the evil side effects are those that produce mistakes which are difficult to track down.

Licensed under: CC-BY-SA with attribution