Why do we use folds to encode datatypes as functions?

Question 1

The Wikipedia page on Scott encoding has some useful insights. The short version is, what you're referring to is the Church encoding, and your "hypothetical pattern-match encoding" is the Scott encoding. Both are sensible ways of doing things, but the Church encoding requires lighter type machinery to use (in particular, it does not require recursive types).

The proof that the two are equivalent uses the following idea:

churchfold :: (a -> b -> b) -> b -> [a] -> b
churchfold _ z [] = z
churchfold f z (x:xs) = f x (churchfold f z xs)

scottfold :: (a -> [a] -> b) -> b -> [a] -> b
scottfold _ z [] = z
scottfold f _ (x:xs) = f x xs

scottFromChurch :: (a -> [a] -> b) -> b -> [a] -> b
scottFromChurch f z xs = fst (churchfold g (z, []) xs)
 where
  g x ~(_, xs) = (f x xs, x : xs)

The idea is that since churchfold (:) [] is the identity on lists, we can use a Church fold that produces the list argument it is given as well as the result it is supposed to produce. Then in the chain x1 `f` (x2 `f` (... `f` xn) ... ) the outermost f receives a pair (y, x2 : ... : xn : []) (for some y we don't care about), so returns f x1 (x2 : ... : xn : []). Of course, it also has to return x1 : ... : xn : [] so that any more applications of f could also work.

(This is actually a little similar to the proof of the mathematical principle of strong (or complete) induction, from the "weak" or usual principle of induction).

By the way, your Bool r type is a bit too big for real Church booleans – e.g. (+) :: Bool Integer, but (+) isn't really a Church boolean. If you enable RankNTypes then you can use a more precise type: type Bool = forall r. r -> r -> r. Now it is forced to be polymorphic, so genuinely only contains two (ignoring seq and bottom) inhabitants – \t _ -> t and \_ f -> f. Similar ideas apply to your other Church types, too.

Question 2

Given some inductive data type

data Nat = Succ Nat | Zero

we can consider how we pattern match on this data

case n of
  Succ n' -> f n'
  Zero    -> g

it should be obvious that every function of type Nat -> a can be defined by giving an appropriate f and g and that the only ways to make a Nat (baring bottom) is using one of the two constructors.

EDIT: Think about f for a moment. If we are defining a function foo :: Nat -> a by giving the appropriate f and g such that f recursively calls foo than we can redefine f as f' n' (foo n') such that f' is not recursive. If the type a = (a',Nat) than we can instead write f' (foo n). So, without loss of generality

foo n = h $ case n
                 Succ n' -> f (foo n)
                 Zero    -> g

this is the formulation that makes the rest of my post make sense:

So, we can thus think about the case statement as applying a "destructor dictionary"

data NatDict a = NatDict {
   onSucc :: a -> a,
   onZero :: a
}

now our case statement from before can become

h $ case n of
      Succ n' -> onSucc (NatDict f g) n'
      Zero    -> onZero (NatDict f g)

given this we can derive

newtype NatBB = NatBB {cataNat :: forall a. NatDict a -> a}

we can then define two functions

fromBB :: NatBB -> Nat
fromBB n = cataNat n (NatDict Succ Zero)

and

toBB :: Nat -> NatBB
toBB Zero = Nat $ \dict -> onZero dict
toBB (Succ n) = Nat $ \dict -> onSucc dict (cataNat (toBB n) dict)

we can prove these two functions are witness to an isomorphism (up to fast and lose reasoning) and thus show that

newtype NatAsFold = NatByFold (forall a. (a -> a) -> a -> a)

(which is just the same as NatBB) is isomorphic to Nat

We can use the same construction with other types, and prove that the resulting function types are what we want just by proving that the underlying types are isomorphic with algebraic reasoning (and induction).

As to your second question, Haskell's type system is based on iso-recursive not equi-recursive types. This is probably becuase the theory and type inference is easier to work out with iso-recursive types, and they have all the power they just impose a little more work on the programmers part. I like to claim that you can get your iso-recursive types without any overhead

newtype RecListType a r = RecListType (r -> (a -> RecListType -> r) -> r)

but apparently GHCs optimizer chokes on those sometimes :(.