How does recursion in Prolog work from the inside? One example

Question 1

The procedural counterpart of a Prolog program is called SLDNF resolution. Take the query:

 list_to_set([1,1,2,3], Result)

Now SLDNF tries to match [1,1,2,3] with [A|X] through a procedure called unification. In a few words, it checks whether the variables can be instantiated so that both terms become equal. [A|X] is a shortcut that means (loosely): "A is the first element and X is the rest". Here the unification succeeds with A = 1, X = [1,2,3] and Result = [1|Y]. If the unification succeeds, the same substitution is applied to the body of the rule (what appears after :-).

You may wonder why we are using the second clause and not the first or the third. Prolog actually tries all of them, trying to emulate a nondeterministic choice; procedurally, though, it tries them in order. The first clause can't be unified, because its head expects the empty list and our list is not empty. The third clause will be checked later if our first attempt fails. Let's call it "choice point 1": Prolog leaves this path open and unexplored, and can return to it if the paths chosen so far fail.
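The unification step described above can be tried in isolation. This is only a small sketch: head_tail_demo/2 is a hypothetical helper name that does not appear in the original program, and the commented queries assume a standard toplevel such as SWI-Prolog.

```prolog
% Hypothetical helper: unify the query's list with the pattern [A|X],
% exactly as the head of the second clause does.
head_tail_demo(A, X) :-
    [1,1,2,3] = [A|X].

% ?- head_tail_demo(A, X).
% A = 1, X = [1,2,3].

% The head of the first clause expects [], which cannot be unified
% with a non-empty list:
% ?- [1,1,2,3] = [].
% false.
```

The second query is why the first clause is skipped: there is no substitution that makes [] and [1,1,2,3] equal.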

Now SLDNF has reduced your initial query to:

list_to_set([1,2,3], Y2), \+ member(1, Y2)       {Result = [1 | Y2]}

(Variables can be renamed, and we do it to avoid confusion with the program.) This is where the magic happens. SLDNF now does the same thing as before, but with a slightly different input: [1,2,3]. Recursion at its best. It tries to unify the goal with the first clause, which fails again. It succeeds with the second, and as before the first element of the query (called a literal) is replaced by the clause body, under the substitution X = [2,3], A = 1, Y2 = [1|Y].

Now we have

list_to_set([2,3], Y), \+ member(1,Y), \+ member(1, [1 | Y])  {Result = [1 | [1 | Y]]}

I'm not continuing until the end. Eventually we'll end up checking \+ member(1, [1 | Y]), which means "1 is not a member of a list with head 1 and tail Y". This is a built-in predicate and it will fail, because 1 is in that list (it's the head). Prolog will go back to a choice point and eventually arrive at "choice point 1". There the last condition in the clause is "A is a member of Y". You can check for yourself that this path will eventually succeed.
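To check the end of the derivation yourself, here is a small sketch of the three clauses under the hypothetical name lts/2 (renamed so it cannot clash with a library list_to_set/2). It assumes a system where member/2 is available without an explicit import, such as SWI-Prolog or GNU Prolog.

```prolog
% The program discussed above, under a hypothetical name.
lts([], []).
lts([A|X], [A|Y]) :-        % keep A if it does not occur later
    lts(X, Y),
    \+ member(A, Y).
lts([A|X], Y) :-            % "choice point 1": drop A if it occurs later
    lts(X, Y),
    member(A, Y).

% The negated goal from the walkthrough fails whatever Y is,
% because 1 is the head of the list:
% ?- \+ member(1, [1|Y]).
% false.

% The full query succeeds through the third clause:
% ?- lts([1,1,2,3], Result).
% Result = [1,2,3].
```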

Sorry for the lengthy hurried answer, I hope it helps.

Question 2

If you look at a concrete example, you will soon be buried in lots of irrelevant detail, so much so that you will lose sight of the important properties of your program. Let's have a look instead at the following program fragment, which is called a failure slice. You get a failure slice by adding false goals to your program. Failure slices share many interesting properties with the original program. For example, a goal Goal, false executed with the failure slice will never use more inferences than with the original program. Or conversely, the original program will use more (or at best the same number of) inferences. So, let me point out one such slice:

list_to_set([],[]) :- false.
list_to_set([A|X],[A|Y]):-
   list_to_set(X,Y), false,
   \+member(A,Y).
list_to_set([A|X],Y):-
   list_to_set(X,Y), false,
   member(A,Y).

And since this fragment is no longer interested in concrete elements (the A is no longer used, nor member/2), we can use length/2 for the most general lists. In this manner we can observe the minimal number of inferences needed for every length like so:

?- length(L, N), call_inferences((list_to_set(L,_),false;true),Inf).
N = 0, Inf = 3 ;
N = 1, Inf = 6 ;
N = 2, Inf = 12 ;
N = 3, Inf = 24 ;
N = 4, Inf = 48 ;
N = 5, Inf = 96 ;
N = 6, Inf = 192 ;
N = 7, Inf = 384 ;
N = 8, Inf = 768 ;
N = 9, Inf = 1536 ;
N = 10, Inf = 3072 ...

Using the following definition:

:- meta_predicate user:call_inferences(0,-).
call_inferences( Goal, Inferences) :-
   statistics(inferences, Inferences0),
   Goal,
   statistics(inferences, Inferences1),
   Inferences is Inferences1 - Inferences0.

The number of inferences doubles with each further element; that is, it grows exponentially. Your implementation thus costs at least exponentially many inferences. No need to look at a concrete example.
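The numbers listed above fit the closed form 3 * 2^N. This is read off the table, not derived. A small sketch to check it, where expected_inferences/2 is a hypothetical helper and integer ^ is assumed to behave as in SWI-Prolog:

```prolog
% Hypothetical helper: closed form that matches the measured table
% (N = 0 gives 3, and each further element doubles the count).
expected_inferences(N, Inf) :-
    Inf is 3 * 2^N.

% ?- expected_inferences(0, Inf).
% Inf = 3.
% ?- expected_inferences(10, Inf).
% Inf = 3072.
```

The doubling is plausible from the slice itself: for a non-empty list, both remaining clauses call list_to_set/2 on the tail, so each extra element triggers two recursive calls instead of one.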

There are more problems in your program:

?- L=[A,B],list_to_set(L,S), L = [a,b].

fails, whereas

?-  L=[A,B], L = [a,b], list_to_set(L,S).

succeeds. That is, your program is no longer a pure relation. Use maplist(dif(A),Y) in place of \+ member(A,Y).
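A minimal sketch of that replacement, under the hypothetical name set_pure/2 and assuming a system that provides dif/2 and maplist/2 (SWI-Prolog does; GNU Prolog 1.4.2 may not):

```prolog
% Same three clauses, with \+ member(A,Y) replaced by maplist(dif(A),Y).
% set_pure/2 is a hypothetical name used only for this illustration.
set_pure([], []).
set_pure([A|X], [A|Y]) :-
    set_pure(X, Y),
    maplist(dif(A), Y).     % A differs from every element of Y
set_pure([A|X], Y) :-
    set_pure(X, Y),
    member(A, Y).

% Now the order of the goals no longer matters:
% ?- L = [A,B], set_pure(L, S), L = [a,b].
% succeeds with S = [a,b]
% ?- L = [A,B], L = [a,b], set_pure(L, S).
% succeeds with S = [a,b]
```

With dif/2 the inequality is kept as a constraint until A and B are known, so the first query no longer fails. The exponential cost discussed above is unchanged by this replacement.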

Question 3

In certain implementations of Prolog, e.g. GNU Prolog, you can trace the execution of your code. Using this feature you can step through the evaluation process, as demonstrated below.

$ gprolog
GNU Prolog 1.4.2
By Daniel Diaz
Copyright (C) 1999-2012 Daniel Diaz
| ?- consult('list_to_set.pro').
yes
| ?- trace.
yes
{trace}
| ?- list_to_set([1,1,2,3], X).
      1    1  Call: list_to_set([1,1,2,3],_25) ?
      2    2  Call: list_to_set([1,2,3],_57) ?
      3    3  Call: list_to_set([2,3],_83) ?
      4    4  Call: list_to_set([3],_109) ?
      5    5  Call: list_to_set([],_135) ?
      5    5  Exit: list_to_set([],[]) ?
      6    5  Call: \+member(3,[]) ?
      7    6  Call: member(3,[]) ?
      7    6  Fail: member(3,[]) ?
      6    5  Exit: \+member(3,[]) ?
      4    4  Exit: list_to_set([3],[3]) ?
      7    4  Call: \+member(2,[3]) ?
      8    5  Call: member(2,[3]) ?
      ...

An explanation of how to interpret Prolog's traces can be found at http://remus.rutgers.edu/cs314/f2007/ryder/projects/prolog/prologTrace.html, section READING A TRACE.

Question 4

Let's look at it in English and see if it makes intuitive sense. It may help to rename the predicate to seem a little less procedural, so I'm going to call it list_set.

list_set([],[]).

This says the empty set corresponds to the empty list.

list_set([A|X], [A|Y]):-
  list_set(X,Y),
  \+ member(A,Y).

This says that a list starting with A and continuing with X corresponds to a set starting with A and continuing with Y, provided A is not in Y and Y is the set corresponding to X. Or, in a more step-wise fashion, given a list starting with A (the remainder being X), we have a set starting with A (the remainder being Y), assuming Y is the set corresponding to X and A is not in Y. The negation operator \+ has always looked strange to me. It is interesting here partly because this is the clause in which A is preserved, while in the next clause A is absent from the result pattern, which means it is effectively deleted.

list_set([A|X], Y):-
  list_set(X, Y),
  member(A, Y).

This is just the "else" case for the previous clause. It says that a list starting with A and continuing with X corresponds to a set Y, provided that Y also corresponds to the list X and A is already in Y. This is how elements are removed from the result, because A appears in the first argument's pattern but not in the second argument's pattern.
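To make the "else" reading explicit, here is a sketch that merges the two recursive clauses into one if-then-else. This is not the code under discussion but a rewrite for illustration. The name list_set_ite/2 is hypothetical, and the sketch is only meant for lists whose elements are already known (ground lists).

```prolog
% Hypothetical variant: one recursive clause, with the choice between
% "drop A" and "keep A" written as an explicit if-then-else.
list_set_ite([], []).
list_set_ite([A|X], Set) :-
    list_set_ite(X, Y),
    (   member(A, Y)
    ->  Set = Y             % A occurs later: drop this occurrence
    ;   Set = [A|Y]         % otherwise keep A
    ).

% ?- list_set_ite([1,1,2,3], Set).
% Set = [1,2,3].
```

Because the recursive call appears only once per element, this version does not repeat the work on the tail the way the two-clause version does when its first attempt fails.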