How to duplicate the behavior of predefined length/2 in SWI-Prolog?

Question 1

In SWI-Prolog, the nondeterminism issue can be solved with CLP(FD)'s zcompare/3, which reifies the inequality to a term that can be used for indexing:

:- use_module(library(clpfd)).

my_length(Ls, L) :-
        zcompare(C, 0, L),
        my_length(Ls, C, 0, L).

my_length([], =, L, L).
my_length([_|Ls], <, L0, L) :-
        L1 #= L0 + 1,
        zcompare(C, L1, L),
        my_length(Ls, C, L1, L).

Your example is now deterministic (since recent versions of SWI-Prolog perform just-in-time indexing):

?- my_length(Ls, 3).
Ls = [_G356, _G420, _G484].

All serious Prolog implementations ship with CLP(FD), and it makes perfect sense to use it here. Ask your vendor to also implement zcompare/3 or a better alternative if it is not already available.
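
To see why reifying the comparison helps, it may be useful to look at zcompare/3 in isolation. The following is only a minimal sketch, assuming SWI-Prolog with library(clpfd); sign_of/2 and order_sign/2 are hypothetical names introduced for illustration and are not part of the library. The order atom produced by zcompare/3 becomes the first argument of a helper predicate, so clause indexing can select the matching clause without leaving a choice point when the numbers are known:

```prolog
:- use_module(library(clpfd)).

% sign_of(?N, ?Sign): hypothetical example predicate.
% zcompare/3 reifies the comparison of N with 0 into one of the
% atoms <, = or >, and order_sign/2 dispatches on that atom.
sign_of(N, Sign) :-
    zcompare(Order, N, 0),
    order_sign(Order, Sign).

% One clause per order atom; first-argument indexing tells them apart.
order_sign((<), negative).
order_sign((=), zero).
order_sign((>), positive).

% ?- sign_of(-5, S).   % S = negative
% ?- sign_of(0, S).    % S = zero
```

If N is still unbound, zcompare/3 leaves the order undetermined and each clause of order_sign/2 posts the corresponding constraint on N when it is tried, which is the same mechanism my_length/4 relies on above.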

Question 2

For a set of test cases, please refer to this table and to the current definition in the prologue. There are many more odd cases to consider.

Defining length/2 with var/nonvar, is/2 and the like is not entirely trivial, because (is)/2 and arithmetic comparison are so limited. That is, they very frequently produce instantiation errors instead of succeeding accordingly. Just to illustrate that point: it is trivial to define length_sx/2 using successor arithmetic.

length_sx([], 0).
length_sx([_E|Es], s(X)) :-
   length_sx(Es, X).

This definition is pretty much perfect. It even fails for length_sx(L, L). Alas, successor arithmetic is not supported efficiently. That is, an integer i requires O(i) space and not O(log i) as one would expect.
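
To make the space cost concrete, here is a small sketch that converts a successor term into an ordinary integer. sx_integer/2 is a hypothetical helper introduced only for this illustration and assumes its first argument is a ground successor term; length_sx/2 is repeated so the block is self-contained:

```prolog
% length_sx/2 as defined above.
length_sx([], 0).
length_sx([_E|Es], s(X)) :-
    length_sx(Es, X).

% sx_integer(+S, -N): hypothetical helper. N is the integer denoted
% by the successor term S, e.g. s(s(0)) denotes 2.
sx_integer(0, 0).
sx_integer(s(X), N) :-
    sx_integer(X, N0),
    N is N0 + 1.

% ?- length_sx([a,b,c], S), sx_integer(S, N).
% S = s(s(s(0))), N = 3.
```

The term bound to S contains one s/1 wrapper per list element, which is where the O(i) space comes from.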

The definition I would have preferred is:

length_fd([],0).
length_fd([_E|Es], L0) :-
   L0 #> 0,
   L1 #= L0-1,
   length_fd(Es, L1).

This is the most direct translation. It is quite efficient when the length is known, but otherwise the overhead of the underlying constraints shows. Also, there is this asymmetry:

?- length_fd(L,0+0).
false.

?- length_fd(L,0+1).
L = [_G919] ;
false.
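
The asymmetry arises because the first clause unifies the second argument syntactically with 0, so the term 0+0 never matches it, whereas the second clause passes the expression to the constraint solver, which evaluates it. One possible way to make both cases behave alike, sketched here under the assumption that evaluating the expression up front is acceptable, is a thin wrapper. length_fd_eval/2 is a hypothetical name, not something from the original definition:

```prolog
:- use_module(library(clpfd)).

% length_fd/2 as defined above.
length_fd([], 0).
length_fd([_E|Es], L0) :-
    L0 #> 0,
    L1 #= L0 - 1,
    length_fd(Es, L1).

% length_fd_eval(?Ls, ?Expr): hypothetical wrapper. The expression
% is first equated with a fresh variable L via #=, so the first
% clause of length_fd/2 sees a plain integer or variable.
length_fd_eval(Ls, Expr) :-
    L #= Expr,
    length_fd(Ls, L).

% ?- length_fd_eval(Ls, 0+0).   % Ls = []
% ?- length_fd_eval(Ls, 0+1).   % Ls = [_]
```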

However, your definition using library(clpfd) is particularly elegant and efficient even for more elaborate cases. It isn't as fast as the built-in length...

?- time(( length_fd(L,N),N=1000 )).
% 29,171,112 inferences, 4.110 CPU in 4.118 seconds (100% CPU, 7097691 Lips)
L = [_G67, _G98, _G123, _G159, _G195, _G231, _G267, _G303, _G339|...],
N = 1000 .

?- time(( my_len_clp(L,N),N=10000 )).
% 1,289,977 inferences, 0.288 CPU in 0.288 seconds (100% CPU, 4484310 Lips)
L = [_G67, _G79, _G82, _G85, _G88, _G91, _G94, _G97, _G100|...],
N = 10000 .

?- time(( length(L,N),N=10000 )).
% 30,003 inferences, 0.006 CPU in 0.006 seconds (100% CPU, 4685643 Lips)
L = [_G67, _G70, _G73, _G76, _G79, _G82, _G85, _G88, _G91|...],
N = 10000 .

... but then it is able to handle constraints correctly:

?- N in 1..2, my_len_clp(L,N).
N = 1,
L = [_G1439] ;
N = 2,
L = [_G1439, _G1494] ;
false.

?- N in 1..2, length(L,N).
N = 1,
L = [_G1445] ;
N = 2,
L = [_G1445, _G1448] ;
*LOOPS*

Question 3

I am not especially confident in this answer, but my thinking is no: you have to do some extra work to make Prolog do the right thing for length/2, which is a real shame because it's such a great "tutorial" predicate in the simplest presentation.

I submit as evidence the source code of this predicate in SWI-Prolog and the source in GNU Prolog. Neither of these is a terse, cute trick, and it looks to me like they both work by testing the arguments and then deferring processing to different internal functions depending on which argument is instantiated.

I would love to be wrong about this though. I have often wondered why it is, for instance, so easy to write member/2 which does the right thing but so hard to write length/2 which does. Prolog isn't great at arithmetic, but is it really that bad? Here's hoping someone else comes along with a better answer.
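
The "test the arguments, then dispatch" approach might look roughly like the following. This is only a sketch of the general idea and not the actual SWI-Prolog or GNU Prolog source; len_dispatch/2, len_known/2 and len_count/3 are hypothetical names, and corner cases such as length(L, L) are deliberately not handled:

```prolog
% len_dispatch(?Ls, ?N): hypothetical sketch that dispatches on
% whether the length argument is an integer or a variable.
len_dispatch(Ls, N) :-
    (   integer(N)
    ->  N >= 0,                       % this sketch fails for negative lengths
        len_known(N, Ls)
    ;   var(N)
    ->  len_count(Ls, 0, N)
    ;   throw(error(type_error(integer, N), len_dispatch/2))
    ).

% len_known(+N, ?Ls): the length is known, so count down to 0.
len_known(0, Ls) :- !, Ls = [].
len_known(N, [_|Ls]) :-
    N1 is N - 1,
    len_known(N1, Ls).

% len_count(?Ls, +N0, -N): the length is unknown, so count up with
% an accumulator; enumerates longer lists if Ls is partial.
len_count([], N, N).
len_count([_|Ls], N0, N) :-
    N1 is N0 + 1,
    len_count(Ls, N1, N).

% ?- len_dispatch([a,b,c], N).   % N = 3
% ?- len_dispatch(Ls, 2).        % Ls = [_, _]
```

Even this simplified version needs three predicates and an explicit instantiation test, which illustrates the point about the extra work.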

Question 4

(I've tried to edit @false's response, but it was rejected)

my_len_tail/2 is faster (in terms of both the number of inferences and actual time) than the built-in length/2 when generating a list, but it has a problem with the N in 1..2 constraint.

?- time(( my_len_tail(L,N),N=10000000 )).
% 20,000,002 inferences, 2.839 CPU in 3.093 seconds (92% CPU, 7044193 Lips)
L = [_G67, _G70, _G73, _G76, _G79, _G82, _G85, _G88, _G91|...],
N = 10000000 .

?- time(( length(L,N),N=10000000 )).
% 30,000,004 inferences, 3.557 CPU in 3.809 seconds (93% CPU, 8434495 Lips)
L = [_G67, _G70, _G73, _G76, _G79, _G82, _G85, _G88, _G91|...],
N = 10000000 .

Question 5

This works for all your test cases (but it uses red cuts):

my_length([], 0).
my_length([_|T], N) :- 
    ( integer(N) ->
        !, 
        N > 0, 
        my_length(T, X), N is 1 + X, !
    ;
        my_length(T, X), N is 1 + X
    ).

Question 6

implementation

goal_expansion((_lhs_ =:= _rhs_),(when(ground(_rhs_),(_lhs_ is _rhs_))))  .

:- op(2'1,'yfx','list')  .

_list_ list [size:_size_] :-
_list_ list [size:_size_,shrink:_shrink_] ,
_list_ list [size:_size_,shrink:_shrink_,size:_SIZE_]  .

_list_ list [size:0,shrink:false]  .

_list_ list [size:_size_,shrink:true] :-
when(ground(_size_),(_size_ > 0))  .

[] list [size:0,shrink:false,size:0] .

[_car_|_cdr_] list [size:_size_,shrink:true,size:_SIZE_] :-
(_SIZE_ =:= _size_ - 1) ,
(_size_ =:= _SIZE_ + 1) ,
_cdr_ list [size:_SIZE_]  .

testing

/*
   ?- L list Z .
L = [],
Z = [size:0] ? ;
L = [_A],
Z = [size:1] ? ;
L = [_A,_B],
Z = [size:2] ? ;
L = [_A,_B,_C],
Z = [size:3] ?
yes

   ?- L list [size:0] .
L = [] ? ;
no
   ?- L list [size:1] .
L = [_A] ? ;
no
   ?- L list [size:2] .
L = [_A,_B] ? ;
no

   ?- [] list [size:S] .
S = 0 ? ;
no
   ?- [a] list [size:S] .
S = 1 ? ;
no
   ?- [a,b] list [size:S] .
S = 2 ? ;
no
   ?- [a,b,c] list [size:S] .
S = 3 ? ;
no
   ?- 
*/