What is the difference between eq?, eqv?, equal?, and = in Scheme?

Question 1

I'll answer this question incrementally. Let's start with the = equivalence predicate. The = predicate is used to check whether two numbers are equal. If you supply it anything else but a number then it will raise an error:

(= 2 3)     => #f
(= 2.5 2.5) => #t
(= '() '()) => error

The eq? predicate is used to check whether its two parameters respresent the same object in memory. For example:

(define x '(2 3))
(define y '(2 3))
(eq? x y)         => #f
(define y x)
(eq? x y)         => #t

Note however that there's only one empty list '() in memory (actually the empty list doesn't exist in memory, but a pointer to the memory location 0 is considered as the empty list). Hence when comparing empty lists eq? will always return #t (because they represent the same object in memory):

(define x '())
(define y '())
(eq? x y)      => #t

Now depending upon the implementation eq? may or may not return #t for primitive values such as numbers, strings, etc. For example:

(eq? 2 2)     => depends upon the implementation
(eq? "a" "a") => depends upon the implementation

This is where the eqv? predicate comes into picture. The eqv? is exactly the same as the eq? predicate, except that it will always return #t for same primitive values. For example:

(eqv? 2 2)     => #t
(eqv? "a" "a") => depends upon the implementation

Hence eqv? is a superset of eq? and for most cases you should use eqv? instead of eq?.

Finally we come to the equal? predicate. The equal? predicate is exactly the same as the eqv? predicate, except that it can also be used to test whether two lists, vectors, etc. have corresponding elements which satisfy the eqv? predicate. For example:

(define x '(2 3))
(define y '(2 3))
(equal? x y)      => #t
(eqv? x y)        => #f

In general:

Use the = predicate when you wish to test whether two numbers are equivalent.
Use the eqv? predicate when you wish to test whether two non-numeric values are equivalent.
Use the equal? predicate when you wish to test whether two lists, vectors, etc. are equivalent.
Don't use the eq? predicate unless you know exactly what you're doing.

Question 2

eq? is #t when it is the same address/object. Normally one could expect #t for same symbol, boolean and object and #f for values that is of different type, with different values, or not the same structure Scheme/Lisp-implementations has a tradition to embed type in their pointers and to embed values in the same space if it's enough space. Thus some pointers really are not addresses but values, like the char R or the Fixnum 10. These will be eq? since the "address" is an embedded type+value. Some implementations also reuse immutable constants. (eq? '(1 2 3) '(1 2 3)) might be #f when interpreted but #t when compiled since it might get the same address. (Like the constant String pool in Java). Because of this, many expresions involving eq? are unspecified, thus wether it evaluates to #t or #f is implementation dependent.

eqv? are #t for the same things as eq?. It is also #t if it's a number or character and it's value is the same, even when the data is too big to fit into a pointer. Thus for those eqv? does the extra work of checking that type is one of the supported, that both are the same type and it's target objects have the same data value.

equal? is #t for the same things as eqv? and if it's a compound type like pair, vector, string, and bytevector it recursively does equal? with the parts. In practice it will return #t if the two objects looks the same. Prior to R6RS, it's unsafe to use equal? on circular structures.

= is like eqv? but it only works for numeric types. It might be more efficient.

string=? is like equal?, but it only works for strings. It might be more efficient.

Question 3

There are a full two pages in the RnRS specification related to eq?, eqv?, equal? and =. Here is the Draft R7RS Specification. Check it out!

Explanation:

= compares numbers, 2.5 and 2.5 are numerically equal.
equal? for numbers reduces to =, 2.5 and 2.5 are numerically equal.
eq? compares 'pointers'. The number 5, in your Scheme implementation, is implemented as an 'immediate' (likely), thus 5 and 5 are identical. The number 2.5 may require an allocation of a 'floating point record' in your Scheme implementation, the two pointers are not identical.

Question 4

equal? recursively compares two objects (of any type) for equality.

Note this could be expensive for a large data structure since potentially the entire list, string, vector, etc must be traversed.
If the object just contains a single element (EG: number, character, etc), this is the same as eqv?.

eqv? tests two objects to determine if both are "normally regarded as the same object".

eqv? and eq? are very similar operations, and the differences between them are going to be somewhat implementation specific.

eq? is the same as eqv? but may be able to discern finer distinctions, and may be implemented more efficiently.

According to the spec, this might be implemented as a fast and efficient pointer comparison, as opposed to a more complicated operation for eqv?.

= compares numbers for numerical equality.

Note that more than two numbers can be provided, eg: (= 1 1.0 1/1 2/2)

Question 5

You don't mention a scheme implementation, but in Racket, eq? only returns true if the arguments refer to the same object. Your second example is yielding #f because the system is creating a new floating point number for each argument; they're not the same object.

equal? and = are checking for value equivalence, but = is only applicable to numbers.

If you're using Racket, check here for more information. Otherwise, check the documentation of your scheme implementation.

Question 6

Think of eq? as pointer equality. The authors of the Report want it to be as general as possible so they don't say this outright because it's implementation-dependent, and to say it, would favor the pointer-based implementations. But they do say

It will usually be possible to implement eq? much more efficiently than eqv?, for example, as a simple pointer comparison

Here's what I mean. (eqv? 2 2) is guaranteed to return #t but (eq? 2 2) is unspecified. Now imagine a pointer-based implementation. In it eq? is just pointer comparison. Since (eq? 2 2) is unspecified, it means that this implementation is free to just create new memory object representation of each new number it reads from the source code. eqv? must actually inspect its arguments.

OTOH (eq 'a 'a) is #t. This means that such implementation must recognize symbols with duplicate names and use the same one representation object in memory for all of them.

Suppose an implementation is not pointer-based. As long as it adheres to the Report, it doesn't matter. The authors just don't want to be seen as dictating the specifics of implementations to the implementors, so they choose their wording carefully.

This is my guess anyway.

So very coarsely, eq? is pointer equality, eqv? is (atomic-)values-aware, equal? is also structure-aware (checks into its arguments recursively, so that finally (equal? '(a) '(a)) is required to be #t), = is for numbers, string=? is for strings, and the details are in the Report.

Question 7

Apart from the previous answers, I will add some comments.

All these predicates want to define the abstract function of identity for an object but in different contextes.

EQ? is implementation-dependent and it does not answer the question are 2 objects the same? only in limited use. From implementation point of view, this predicate just compares 2 numbers (pointer to objects), it does not look at the content of the objects. So, for example, if your implementation does not uniquely keep the strings inside but allocates different memory for each string, then (eq? "a" "a") will be false.

EQV? -- this looks inside the objects, but with limited use. It is implementation-dependent if it returns true for (eqv? (lambda(x) x) (lambda(x) x)). Here it's a full philosophy how to define this predicate, as we know nowadays that there are some fast methods to compare the functionality of some functions, with limited use. But eqv? provides coherent answer for big numbers, strings, etc.

Practically, some of these predicates tries to use the abstract definition of an object (mathematically), while others use the representation of an object (how it's implemented on a real machine). The mathematical definition of identity comes from Leibniz and it says:

X = Y  iff  for any P, P(X) = P(Y)
X, Y being objects and
P being any property associated with object X and Y.

Ideally it would be to be able to implement this very definition on computer but for reasons of indecidability and/or speed it is not implemented literally. This is why there are lots of operators that try each one to focus on different viewpoints around this definition.

Try to imagine the abstract definition of an identity for a continuation. Even if you can provide a definition of a subset of functions (sigma-recursive class of functions), the language does not impose any predicate to be true or false. It would complicate a lot both the definition of the language and much more the implementation.

The context for the other predicates is easier to analyze.