What is () in Haskell, exactly?

Question 1

tl;dr () does not add a "null" value to every type, hell no; () is a "dull" value in a type of its own: ().

Let me step back from the question a moment and address a common source of confusion. A key thing to absorb when learning Haskell is the distinction between its expression language and its type language. You're probably aware that the two are kept separate. But that allows the same symbol to be used in both, and that is what is going on here. There are simple textual cues to tell you which language you're looking at. You don't need to parse the whole language to detect these cues.

The top level of a Haskell module lives, by default, in the expression language. You define functions by writing equations between expressions. But when you see foo :: bar in the expression language, it means that foo is an expression and bar is its type. So when you read () :: (), you're seeing a statement which relates the () in the expression language with the () in the type language. The two () symbols mean different things, because they are not in the same language. This replication often causes confusion for beginners, until the expression/type language separation installs itself in their subconscious, at which point it becomes helpfully mnemonic.

The keyword data introduces a new datatype declaration, involving a careful mixture of the expression and type languages, as it says first what the new type is, and secondly what its values are.

data TyCon tyvar ... tyvar = ValCon1 type ... type |  ...  | ValConn type ... type

In such a declaration, type constructor TyCon is being added to the type language and the ValCon value constructors are being added to the expression language (and its pattern sublanguage). In a data declaration, the things which stand in argument places for the ValCons tell you the types given to the arguments when that ValCon is used in expressions. For example,

data Tree a = Leaf | Node (Tree a) a (Tree a)

declares a type constructor Tree for binary tree types storing a elements at nodes, whose values are given by value constructors Leaf and Node. I like to colour type constructors (Tree) blue and value constructors (Leaf, Node) red. There should be no blue in expressions and (unless you're using advanced features) no red in types. The built-in type Bool could be declared,

data Bool = True | False

adding blue Bool to the type language, and red True and False to the expression language. Sadly, my markdown-fu is inadequate to the task of adding the colours to this post, so you'll just have to learn to add the colours in your head.

The "unit" type uses () as a special symbol, but it works as if declared

data () = ()  -- the left () is blue; the right () is red

meaning that a notionally blue () is a type constructor in the type language, but that a notionally red () is a value constructor in the expression language, and indeed () :: (). [ It is not the only example of such a pun. The types of larger tuples follow the same pattern: pair syntax is as if given by

data (a, b) = (a, b)

adding (,) to both type and expression languages. But I digress.]

So the type (), often pronounced "Unit", is a type containing one value worth speaking of: that value is also written () but in the expression language, and is sometimes pronounced "void". A type with only one value is not very interesting. A value of type () contributes zero bits of information: you already know what it must be. So, while there is nothing special about type () to indicate side effects, it often shows up as the value component in a monadic type. Monadic operations tend to have types which look like

val-in-type-1 -> ... -> val-in-type-n -> effect-monad val-out-type

where the return type is a type application: the (type) function tells you which effects are possible and the (type) argument tells you what sort of value is produced by the operation. For example

put :: s -> State s ()

which is read (because application associates to the left ["as we all did in the sixties", Roger Hindley]) as

put :: s -> (State s) ()

has one value input type s, the effect-monad State s, and the value output type (). When you see () as a value output type, that just means "this operation is used only for its effect; the value delivered is uninteresting". Similarly

putStr :: String -> IO ()

delivers a string to stdout but does not return anything exciting.

The () type is also useful as an element type for container-like structures, where it indicates that the data consists just of a shape, with no interesting payload. For example, if Tree is declared as above, then Tree () is the type of binary tree shapes, storing nothing of interest at nodes. Similarly [()] is the type of lists of dull elements, and if there is nothing of interest in a list's elements, then the only information it contributes is its length.

To sum up, () is a type. Its one value, (), happens to have the same name, but that's ok because the type and expression languages are separate. It's useful to have a type representing "no information" because, in context (e.g., of a monad or a container), it tells you that only the context is interesting.

Question 2

The () type can be thought of as a zero-element tuple. It's a type that can only have one value, and thus it's used where you need to have a type, but you don't actually need to convey any information. Here's a couple of uses for this.

Monadic things like IO and State have a return value, as well as performing side-effects. Sometimes the only point of the operation is to perform a side-effect, like writing to the screen or storing some state. For writing to the screen, putStrLn must have type String -> IO ? -- IO always has to have some return type, but here there's nothing useful to return. So what type should we return? We could say Int, and always return 0, but that's misleading. So we return (), the type that has only one value (and thus no useful information), to indicate that there's nothing useful coming back.

It's sometimes useful to have a type which can have no useful values. Consider if you'd implemented a type Map k v which maps keys of type k to values of type v. Then you want to implement a Set, which is really similar to a map except that you don't need the value part, just the keys. In a language like Java you might use booleans as the dummy value type, but really you just want a type that has no useful values. So you could say type Set k = Map k ()

It should be noted that () is not particularly magic. If you want you can store it in a variable and do a pattern match on it (although there's not much point):

main = do
  x <- putStrLn "Hello"
  case x of
    () -> putStrLn "The only value..."

Question 3

It is called the Unit type, usually used to represent side effects. You can think of it vaguely as Void in Java. Read more here and here etc. What can be confusing is that () syntactically represents both the type and its only value literal. Also note that it is not similar to null in Java which means an undefined reference - () is just effectively a 0-sized tuple.

Question 4

I really like to think of () by analogy with tuples.

(Int, Char) is the type of all pairs of an Int and a Char, so it's values are all possible values of Int crossed with all possible values of Char. (Int, Char, String) is similarly the type of all triples of an Int, a Char, and a String.

It's easy to see how to keep extending this pattern upwards, but what about downwards?

(Int) would be the "1-tuple" type, consisting of all possible values of Int. But that would be parsed by Haskell as just putting parentheses around Int, and thus being just the type Int. And values in this type would be (1), (2), (3), etc, which also would just get parsed as ordinary Int values in parentheses. But if you think about it, a "1-tuple" is exactly the same as just a single value, so there's no need to actually have them exist.

Going down one step further to zero-tuples gives us (), which should be all possible combinations of values in an empty list of types. Well, there's exactly one way to do that, which is to contain no other values, so there should be only one value in the type (). And by analogy with tuple value syntax, we can write that value as (), which certainly looks like a tuple containing no values.

That's exactly how it works. There is no magic, and this type () and its value () are in no way treated specially by the language.

() is not in fact being treated as "a null value for any type" in the monads examples in the LYAH book. Whenever the type () is used the only value which can be returned is (). So it's used as a type to explicitly say that there cannot be any other return value. And likewise where another type is supposed to be returned, you cannot return ().

The thing to keep in mind is that when a bunch of monadic computations are composed together with do blocks or operators like >>=, >>, etc, they'll be building a value of type m a for some monad m. That choice of m has to stay the same throughout the component parts (there's no way to compose a Maybe Int with an IO Int in that way), but the a can and very often is different at each stage.

So when someone sticks an IO () in the middle of an IO String computation, that's not using the () as a null in the String type, it's simply using an IO () on the way to building an IO String, the same way you could use an Int on the way to building a String.

Question 5

Yet another angle:

() is the name of a set which contains a single element called ().

Its indeed slightly confusing that the name of the set and the element in it happens to be the same in this case.

Remember: in Haskell a type is a set that has its possible values as elements in it.

Question 6

The confusion comes from other programming languages: "void" means in most imperative languages that there is no structure in memory storing a value. It seems inconsistent because "boolean" has 2 values instead of 2 bits, while "void" has no bits instead of no values, but there it is about what a function returns in a practical sense. To be exact: its single value consumes no bit of storage.

Let's ignore the value bottom (written _|_) for a moment...

() is called Unit, written like a null-tuple. It has only one value. And it is not called Void, because Void has not even any value, thus could not be returned by any function.

Observe this: Bool has 2 values (True and False), () has one value (()), and Void has no value (it doesn't exist). They are like sets with two/one/no elements. The least memory they need to store their value is 1 bit / no bit / impossible, respectively. Which means that a function that returns a () may return with a result value (the obvious one) that may be useless to you. Void on the other hand would imply that that function will never return and never give you any result, because there would not exist any result.

If you want to give "that value" a name, that a function returns which never returns (yes, this sounds like crazytalk), then call it bottom ("_|_", written like a reversed T). It could represent an exception or infinity loop or deadlock or "just wait longer". (Some functions will only then return bottom, iff one of their parameters is bottom.)

When you create the cartesian product / a tuple of these types, you will observe the same behaviour: (Bool,Bool,Bool,(),()) has 2·2·2·1·1=6 differnt values. (Bool,Bool,Bool,(),Void) is like the set {t,f}×{t,f}×{t,f}×{u}×{} which has 2·2·2·1·0=0 elements, unless you count _|_ as a value.