Differences between HOAS and FOAS

Question 1

jozefg's answer explains what FOAS and HOAS are, so in this answer I just try to address the various smaller points from the question. I recommend reading jozefg's answer first.

What about the While constructor makes it HOAS?

Let's look at the second argument of the While constructor: While :: ... -> (FunC s -> FunC s) -> .... In the type of this field, FunC shows up to the left of an arrow. So if you use While in a FunC program, your program is no longer a plain abstract syntax tree in memory: some of its nodes hold Haskell functions. The intended meaning of FunC s -> FunC s is "a FunC s with a free variable of type s". I assume this is used for the body of a while loop, and the free variable holds the value that changes in each loop iteration.
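To make this concrete, here is a minimal sketch of what such a type could look like. It is not the definition from the paper: the set of constructors, the full type I give to While, and the Value constructor (which I only use to feed evaluated values back into the Haskell functions) are assumptions for illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical, cut-down FunC; not the paper's definition.
data FunC a where
  LitI  :: Int  -> FunC Int
  LitB  :: Bool -> FunC Bool
  Add   :: FunC Int -> FunC Int -> FunC Int
  Less  :: FunC Int -> FunC Int -> FunC Bool
  -- Assumed shape: condition, body, initial state.
  -- FunC appears to the left of an arrow in the first two fields (HOAS).
  While :: (FunC s -> FunC Bool) -> (FunC s -> FunC s) -> FunC s -> FunC s
  -- Assumed helper constructor, used only by eval.
  Value :: a -> FunC a

eval :: FunC a -> a
eval (LitI n)   = n
eval (LitB b)   = b
eval (Add a b)  = eval a + eval b
eval (Less a b) = eval a < eval b
eval (Value v)  = v
eval (While cond body initial) = go (eval initial)
  where
    -- The only thing we can do with cond and body is apply them.
    go s
      | eval (cond (Value s)) = go (eval (body (Value s)))
      | otherwise             = s

-- The loop state is an ordinary Haskell variable s.
countToTen :: FunC Int
countToTen = While (\s -> Less s (LitI 10)) (\s -> Add s (LitI 1)) (LitI 0)
```

Here eval countToTen runs the loop by repeatedly calling the two Haskell functions stored in the While node.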

Why are none of the other constructors HOAS?

They don't have the ... -> (FunC ... -> ...) -> ... pattern we saw with the While constructor above. So if a FunC value only uses the other constructors, its memory representation looks like an abstract syntax tree.

Again, I don't understand what makes the data defined in HOAS.hs HOAS while the data defined in FOASTyped is FOAS.

You can look at the FOAS version of the code in the paper to see how they change the type of While to avoid the HOAS pattern, and what else they need to change to make it work.

Is there some general way in which HOAS is more difficult to transform than FOAS?

A HOAS program is not a plain tree, so you cannot pattern match on all of it. For example, you cannot pattern match on While (\_ -> LitB False) ... because Haskell does not let you match on lambdas. The only thing you can do with a function-valued field is apply it to an argument.
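For contrast, the same rewrite is easy to express on a first-order representation. This is only a sketch under my own assumptions: Expr is a hypothetical untyped first-order type (not the FOAS version from the paper) in which While names its state variable with a String.

```haskell
-- Hypothetical first-order expression type, for illustration only.
data Expr
  = Var String
  | LitI Int
  | LitB Bool
  | Add Expr Expr
  | While String Expr Expr Expr  -- state variable, condition, body, initial state
  deriving (Eq, Show)

-- Remove loops whose condition is the literal False.
simplify :: Expr -> Expr
simplify (While v cond body initial) =
  case simplify cond of
    LitB False -> simplify initial          -- the loop never runs
    cond'      -> While v cond' (simplify body) (simplify initial)
simplify (Add a b) = Add (simplify a) (simplify b)
simplify e         = e
```

Because the condition is plain data here, the case expression can look straight at it.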

How does HOAS help with type safety compared to FOAS?

In a HOAS program, you use Haskell variables to represent FunC variables. The Haskell typechecker checks that you only use a Haskell variable in the scope of its binding (GHC tells you "Not in scope: 'foo'" otherwise). Because FunC variables are represented as Haskell variables, this check also covers FunC: if you use a HOAS-encoded FunC variable out of scope, GHC rejects the program because the Haskell variable is out of scope.

Now in FOAS, if you use Haskell Strings as FunC variables, the Haskell type checker will never complain if you use the wrong string, because you can use whatever string you want as far as GHC is concerned. There are techniques for improving FOAS to make the Haskell typechecker check your embedded program, but they tend to require more work from the user of the embedded language.

What is a variable binder?

A variable binder is a language construct that introduces new names that you can use in other parts of the program. For example, in Haskell, if I write let x = 14 in ... I introduce a new name x that I can use in the .... Other binders in Haskell include lambda expressions, pattern matching, and top-level definitions.

how does Haskell implement it?

I don't really get this question. For typechecking, GHC keeps track of which variables are in scope where and complains if you use a variable in the wrong place. For compilation, GHC generates machine code that "knows" where the values denoted by the variables are, usually because a pointer to the value of the variable is stored in a processor register, on the stack, or on the heap.

and what languages don't have variable binders?

Many small and specialized languages don't have variable binders.

For example, consider regular expressions. At least originally, they could not bind variables. (Some regular expression engines support backreferences, though, which are a form of variables.)
Another example is the "language" of URLs. A URL is made of various parts (the protocol, server name, path, parameters, ...) with rules about what you can and cannot write, so it is a language. But you cannot introduce a name in a URL that you can later use in the URL.

Many low-level languages don't have variable binders.

For example, x86 machine code contains just numbers, no names.

There are Turing-complete languages without variable binders.

For example, the SK calculus.
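As a small illustration, here is a sketch of the SK calculus in Haskell. The names SK, :@, step and normalize are my own choices. Note that the term type has no constructor for variables or binders at all.

```haskell
-- Terms are built from the two combinators and application only.
data SK = S | K | SK :@ SK
  deriving (Eq, Show)

infixl 9 :@

-- One leftmost-outermost reduction step, if any redex exists.
step :: SK -> Maybe SK
step (K :@ x :@ _)      = Just x                       -- K x y   ~> x
step (S :@ f :@ g :@ x) = Just (f :@ x :@ (g :@ x))    -- S f g x ~> f x (g x)
step (f :@ x) =
  case step f of
    Just f' -> Just (f' :@ x)
    Nothing -> (f :@) <$> step x
step _ = Nothing

-- Reduce until no step applies. This loops on terms without a normal form.
normalize :: SK -> SK
normalize t = maybe t normalize (step t)

-- The identity function, written without any variable.
identity :: SK
identity = S :@ K :@ K
```

For example, normalize (identity :@ K) reduces S K K K to K K (K K) and then to K.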

Question 2

In FOAS we represent variables with identifiers, so

 data STLC = Var String
           | Lam String STLC
           | Unit
           | STLC :*: STLC

 term = Lam "a" $
        Lam "b" $
        Var "a" :*: (Lam "a" $ Var "a")

We have explicit variables, and now it's up to us to make sure that scoping and variable binding work properly. The extra work has its rewards, however, since we can now inspect and pattern match across a lambda's body, which is vital for most transformations.
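As a sketch of both points, here is a free-variable check written against the same STLC type. The names freeVars, closed and illScoped are mine. The ill-scoped term compiles without complaint, so we have to detect the problem ourselves, and we can do so precisely because the lambda body is ordinary data.

```haskell
import Data.List (nub)

data STLC = Var String
          | Lam String STLC
          | Unit
          | STLC :*: STLC

-- Variables that are used without an enclosing Lam binding them.
freeVars :: STLC -> [String]
freeVars (Var x)      = [x]
freeVars (Lam x body) = filter (/= x) (freeVars body)  -- x is bound in body
freeVars Unit         = []
freeVars (a :*: b)    = nub (freeVars a ++ freeVars b)

closed :: STLC -> Bool
closed = null . freeVars

term :: STLC
term = Lam "a" $
       Lam "b" $
       Var "a" :*: (Lam "a" $ Var "a")

-- GHC accepts this even though "b" is never bound.
illScoped :: STLC
illScoped = Lam "a" $ Var "b"
```

Here closed term is True, while freeVars illScoped reports the unbound "b".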

HOAS is essentially where we use the host language's (here Haskell's) implementation of variables instead of representing them in the AST.

For example, consider STLC:

  data STLC = Unit
            | Lam (STLC -> STLC)
            | STLC :*: STLC

Notice how we use the Haskell function STLC -> STLC to represent the variable that's bound by a lambda. That means that we can write

  term = Lam $ \a ->
         Lam $ \b ->
         a :*: (Lam $ \a -> a)

and it works. In a normal AST, we'd have to alpha-convert everything properly to make sure that we respect scoping. This same advantage applies to all things that bind variables (variable binders): let expressions, continuations, exception handlers, whatever.

This comes with a major disadvantage, though: since Lam holds a fully abstract function, we can't inspect the body of the function at all. This makes a lot of transformations painful, since everything is wrapped up under the Haskell binding.
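To illustrate, here is a sketch of about the simplest traversal I can think of, a node count. The function size is my own example. The only way to get at a lambda's body is to apply the function to some term, and since there is no variable constructor, I have to pass in a dummy Unit.

```haskell
data STLC = Unit
          | Lam (STLC -> STLC)
          | STLC :*: STLC

-- Count nodes. For Lam we cannot match on the body; we can only
-- apply the Haskell function to a stand-in argument (here Unit).
size :: STLC -> Int
size Unit      = 1
size (Lam f)   = 1 + size (f Unit)
size (a :*: b) = 1 + size a + size b

term :: STLC
term = Lam $ \a ->
       Lam $ \b ->
       a :*: (Lam $ \a -> a)
```

This works for term, but only because its lambdas just place their argument somewhere. A Haskell function that pattern matches on its argument could return a different body for Unit than for other terms, so the dummy-argument trick is not reliable in general.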

Another benefit is that since we don't provide an explicit constructor for variables, all terms are guaranteed closed.

Because of this trade-off, in practice we usually end up representing things with a combination of HOAS and FOAS.
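One way such a combination can look is to let users write terms in HOAS and convert them to FOAS before transforming them. The following is only a sketch under my own assumptions: the types H and F, the function toF, and the naming scheme are hypothetical, and the extra HVar constructor exists solely so the conversion has something to pass to the Haskell functions (which also means H terms are no longer guaranteed closed).

```haskell
-- HOAS front end. HVar is a placeholder meant to be used only by toF.
data H = HUnit
       | HLam (H -> H)
       | HPair H H
       | HVar String

-- FOAS back end, suitable for pattern matching.
data F = FUnit
       | FLam String F
       | FPair F F
       | FVar String
  deriving (Eq, Show)

-- Convert by naming each binder after its nesting depth, so a name
-- never clashes with any enclosing binder.
toF :: H -> F
toF = go 0
  where
    go :: Int -> H -> F
    go _ HUnit       = FUnit
    go _ (HVar x)    = FVar x
    go n (HPair a b) = FPair (go n a) (go n b)
    go n (HLam f)    =
      let x = "x" ++ show n
      in FLam x (go (n + 1) (f (HVar x)))

example :: H
example = HLam $ \a ->
          HLam $ \b ->
          HPair a (HLam $ \c -> c)
```

With these definitions, toF example yields a first-order term with binders named x0, x1 and x2, which can then be inspected like any other tree.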