Please confirm or correct my “English interpretation” of this Haskell code snippet

https://stackoverflow.com/questions/775730

13-09-2019
|

Question

I'm a C# developer who is working through "Real World Haskell" in order to truly understand functional programming, so that when I learn F#, I'll really grok it and not just "write C# code in F#", so to speak.

Well, today I came across an example which I thought I understood 3 different times, only to then see something I missed, update my interpretation, and recurse (and curse too, believe me).

Now I believe that I do actually understand it, and I have written a detailed "English interpretation" below. Can you Haskell gurus please confirm that understanding, or point out what I have missed?

Note: The Haskell code snippet (quoted directly from the book) is defining a custom type that is meant to be isomorphic to the built in Haskell list type.

The Haskell code snippet

data List a = Cons a (List a)
              | Nil
              defining Show

EDIT: After some responses, I see one misunderstanding I made, but am not quite clear on the Haskell "parsing" rules that would correct that mistake. So I've included my original (incorrect) interpretation below, followed by a correction, followed by the question that still remains unclear to me.

EDIT: Here is my original (incorrect) "English interpretation" of the snippet

I am defining a type called "List".
The List type is parameterised. It has a single type parameter.
There are 2 value constructors which can be used to make instances of List. One value constructor is called "Nil" and the other value constructor is called "Cons".
If you use the "Nil" value constructor, then there are no fields.
The "Cons" value constructor has a single type parameter.
If you use the "Cons" value constructor, there are 2 fields which must be provided. The first required field is an instance of List. The second required field is an instance of a.
(I have intentionally omitted anything about "defining Show" because it is not part of what I want to focus on right now).

The corrected interpretation would be as follows (changes in BOLD)

I am defining a type called "List".
The List type is parameterised. It has a single type parameter.
There are 2 value constructors which can be used to make instances of List. One value constructor is called "Nil" and the other value constructor is called "Cons".
If you use the "Nil" value constructor, then there are no fields.

5. (this line has been deleted ... it is not accurate) The "Cons" value constructor has a single type parameter.
If you use the "Cons" value constructor, there are 2 fields which must be provided. The first required field is an instance of a. The second required field is an instance of "List-of-a".
(I have intentionally omitted anything about "defining Show" because it is not part of what I want to focus on right now).

The question which is still unclear

The initial confusion was regarding the portion of the snippet that reads "Cons a (List a)". In fact, that is the part that is still unclear to me.

People have pointed out that each item on the line after the "Cons" token is a type, not a value. So that means this line says "The Cons value constructor has 2 fields: one of type 'a' and the other of type 'list-of-a'."

That is very helpful to know. However, something is still unclear. When I create instances using the Cons value constructor, those instances "interpret" the first 'a' as meaning "place the value passed in here." But they do not interpret the second 'a' the same way.

For example, consider this GHCI session:

*Main> Cons 0 Nil
Cons 0 Nil
*Main> Cons 1 it
Cons 1 (Cons 0 Nil)
*Main>

When I type "Cons 0 Nil", it uses the "Cons" value constructor to create an instance of List. From 0, it learns that the type parameter is "Integer". So far, no confusion.

However, it also determines that the value of the first field of the Cons is 0. Yet it determines nothing about the value of the second field ... it only determines that the second field has a type of "List Integer".

So my question is, why does "a" in the first field mean "the type of this field is 'a' and the value of this field is 'a'", while "a" in the second field means only "the type of this field is 'List of a'"?

EDIT: I believe I have now seen the light, thanks to several of the responses. Let me articulate it here. (And if somehow it is still incorrect in some fashion, please by all means let me know!)

In the snippet "Cons a (List a)", we are saying that the "Cons" value constructor has two fields, and that the first field is of type 'a', and that the second field is of type 'List of a'.

That is all we are saying! In particular, we are saying NOTHING about values! This is a key point I was missing.

Later, we want to create an instance, using the "Cons" value constructor. We type this into the interpreter: "Cons 0 Nil". This explicitly tells the Cons value constructor to use 0 for the value of the first field, and to use Nil as the value for the second field.

And that's all there is to it. Once you know that the value constructor definition specifies nothing but types, everything becomes clear.

Thanks everyone for the helpful responses. And as I said, if anything is still off, please by all means tell me about it. Thanks.

Solution

The "Cons" value constructor has a single type parameter.

Nope: you've already parametrized it when you declared data List a. One effective property of this is that if I have a Nil :: List Int, I can't interchange it with a Nil :: List Char.

If you use the "Cons" value constructor, there are 2 fields which must be provided. The first required field is an instance of List. The second required field is an instance of a.

You've got it swapped: the first required field is an instance of a, the second field is an instance of List.

This chapter of Real World Haskell may be of interest.

Thanks. That is the chapter I'm on right now. So ... when the code says "Cons a (List a)", I thought the "Cons a" part of that was stating that the Cons value constructor was parameterised. They haven't yet covered the syntax for parameterised types, so I guessed that the syntax must require re-stating "a" if you intend to use a. But you're saying that's not necessary? And therefore that's not what that "a" means?

Nope. Once we declare a parameter in our type, we get to reuse it otherwise to say "that type should be used there." It's a little bit like an a -> b -> a type signature: a is parameterizing the type, but then I have to use the same a as the return value.

OK, but this is confusing. It seems that the first "a" means "the first field is an instance of a",

Nope, that is not true. It just means that the data type parametrizes over some type a.

and it ALSO means "the first field has the same value as the value they passed in for a". In other words, it specifies type AND value.

No, that is also not true.

Here's an instructive example, the syntax of which you may or may not have seen before:

foo :: Num a => a -> a

This is a fairly standard signature for a function that takes a number and does something to it and gives you another number. What I actually mean by "a number" in Haskell-speak, though, is some arbitrary type "a" that implements the "Num" class.

Thus, this parses to the English:

Let a indicate a type implementing the Num typeclass, then the signature of this method is one parameter with the type a, and the return value of the type a

Something similar happens with data.

It also occurs to me that the instance of List in the specification of Cons is also confusing you: be really careful when parsing that: whereas Cons is specifying a constructor, which is basically a pattern that Haskell is going to wrap the data into, (List a) looks like a constructor but is actually simply a type, like Int or Double. a is a type, NOT a value in any sense of the term.

Edit: In response to the most recent edit.

I think a dissection is first called for. Then I'll deal with your questions point by point.

Haskell data constructors are a little weird, because you define the constructor signature, and you don't have to make any other scaffolding. Datatypes in Haskell don't have any notion of member variable. (Note: there's an alternate syntax that this way of thinking is more amenable to, but let's ignore that for now).

Another thing is that Haskell code is dense; its type signature are like that to. So expect to see the same symbol reused in different contexts. Type inferencing also plays a big role here.

So, back to your type:

data List a = Cons a (List a)
              | Nil

I chunk this into several pieces:

data List a

This defines the name of the type, and any parameterized types that it will have later on. Note that you will only see this show up in other type signatures.

Cons a (List a) |
Nil

This is the name of the data constructor. This IS NOT a type. We can, however, pattern match for it, ala:

foo :: List a -> Bool
foo Nil = True

Notice how List a is the type in the signature, and Nil is both the data constructor and the "thing" we pattern match for.

Cons a (List a)

These are the types of the values we slot into the constructor. Cons has two entries, one is of type a, and one is of type List a.

So my question is, why does "a" in the first field mean "the type of this field is 'a' and the value of this field is 'a'", while "a" in the second field means only "the type of this field is 'List of a'"?

Simple: don't think of it as us specifying the type; think of it has Haskell is inferencing the type out of it. So, for our intents and purposes, we're simply sticking a 0 in there, and a Nil in the second section. Then, Haskell looks at our code and thinks:

Hmm, I wonder what the type of Cons 0 Nil is
Well, Cons is a data constructor for List a. I wonder what the type of List a is
Well, a is used in the first parameter, so since the first parameter is an Int (another simplification; 0 is actually a weird thing that is typeclassed as Num), so that's mean a is a Num
Hey, well, that also means that the type of Nil is List Int, even though there's nothing there that would actually say that

(Note, that's not actually how it's implemented. Haskell can do a lot of weird things while inferencing types, which is partially why the error messages suck.)

OTHER TIPS

Analogies are usually lacking in all sorts of ways, but since you know C# I thought this might be helpful.

This is how I would describe the List a definition in C#, maybe this clears some things up (or more likely, confuses you even more).

class List<A>
{
}

class Nil<A> : List<A>
{
    public Nil() {}
}

class Cons<A> : List<A>
{
    public A Head;
    public List<A> Tail;

    public Cons(A head, List<A> tail)
    {
        this.Head = head;
        this.Tail = tail;
    }
}

As you can see;

the List type has a single type parameter (<A>),
the Nil constructor doesn't have any parameters,
and the Cons constructor has two parameters, a value head of type A and a value tail of type List<A>.

Now, in Haskell the Nil and Cons are just constructors for the List a data type, in C# they are also types in and of themselves, so that's where the analogy fails.

But I hope this gives you some intuitive sense of what the different A's represent.

(And please do comment on how this horrible comparison doesn't do justice to Haskell's data types.)

Cons a (List a)

The first field of a Cons is a value of type "a". The second is a value of type "List a", i.e. a List parameterized with the same type as the parameter of the current list.

5 is wrong, and I'd say 6 as follows to replace both:

Cons{1} a{2} (List a){3} is a constructor called Cons (part before {1}) for a value of type List a (the data List a part) that needs two values: one of type a (the part between {1} and {2}) and one of type List a(the part between {2} and {3}).

To help you with an apparent source of confusion: in Haskell you hardly ever have to give explicit type parameters - type inference infers the types from your values. So in a sense, yes, when you pass in a value to a function or constructor, you also specify a type, viz. the type of the passed-in value.

Yeah the data syntax is a little confusing, because it puns names and types and doesn't really make a syntactic distinction between them. In particular, in a constructor definition like:

Cons a (List a)

The first word is the name of the constructor; every other word is the name of some predeclared type. So both a and List a are already in scope (the a was brought in to scope by the a in "data List a"), and you're saying that those are the types of the parameters. Their role might be better demonstrated by stating the same thing using record syntax:

Cons { headL :: a, tailL :: List a }

I.e. a value of type List Int, if it was constructed with the Cons constructor, has two fields: an Int and a List Int. If it was constructed with Nil, it has no fields.

When I type "Cons 0 Nil", it uses the "Cons" value constructor to create an instance of List. From 0, it learns that the type parameter is "Integer". So far, no confusion.

However, it also determines that the value of the first field of the Cons is 0. Yet it determines nothing about the value of the second field ... it only determines that the second field has a type of "List Integer".

No, it determines that the second field's value is Nil. Given your definition, Nil is a value of the type List a. Therefore so is Cons 0 Nil. And in Cons 1 it the second field's value is it; i.e., Cons 0 Nil. This is precisely what the REPL shows: Cons 1 (Cons 0 Nil).

I looked at your edited question.

When I create instances using the Cons value constructor, those instances "interpret" the first 'a' as meaning "place the value passed in here.

In "Cons a (List a)", both "a" and "List a" are types. I don't understand what "value" has to with it.

When I type "Cons 0 Nil", it uses the "Cons" value constructor to create an instance of List. From 0, it learns that the type parameter is "Integer". So far, no confusion.

However, it also determines that the value of the first field of the Cons is 0. Yet it determines nothing about the value of the second field ... it only determines that the second field has a type of "List Integer".

The value of the second field is Nil.

So my question is, why does "a" in the first field mean "the type of this field is 'a' and the value of this field is 'a'", while "a" in the second field means only "the type of this field is 'List of a'"?

"a" in the first field means "the type of this field is 'a'". "List a" in the second field means "the type of this field is 'List a'". In the case of "Cons 0 Nil" above, 'a' is inferred to be "Integer". So "Cons a (List a)" becomes "Cons Integer (List Integer)". The 0 is a value of type Integer. The Nil is a value of type "List Integer".

the value of this field is 'a'

I don't understand what you mean by this. 'a' is a type variable; what does it have to do with values?

Just to give you some extra "help", in case you are still watching this thread. Haskell has a couple of conventions that mess up others' ideas of how things ought to be done - in Haskell, a parameterized type is so commonly accepted, that it is usually thought of as being a type-level function. Likewise, the value constructors are thought of as "special" functions, that also allow pattern matching, in addition to their "take a value (or more) and produce a value as a result".

Another "funny" characteristic of Haskell is that it does not explicitly (or implicitly) evaluate the arguments to a function, even if that argument is in parentheses. Let me state that slightly differently: Haskell functions do not evaluate arguments in parentheses before other arguments. Arguments are put in parentheses for grouping purposes only, not to have them evaluated "first". Haskell assigns arguments to ("applies") a function at a higher precedence than any other operation - even higher than an implicit function application of one of its own arguments. This is why the Cons constructor has parens around the second argument, (List a) - to tell the compiler that Cons has two arguments, and not three. The parentheses are for grouping only, and not for precedence!

As somewhat of a side topic, be careful with the types in F#. Since F# has its roots in ML, its parameterized types have the parameters in front - int list, not (List Int) in back! Haskell does it the other way, because it's the same way that Haskell does functions - first the function, then arguments to the function. This encourages a common use pattern, and explains why Haskell types and value constructors are Capitalized - to remind you that you're dealing with a type/class-related thing.

Okay, I'm done; thanks for letting me put this huge Wall O' Text on your property...

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow