Pattern matching on List[T] and Set[T] in Scala vs. Haskell: effects of type erasure

Question 1

You must understand that by saying o:Any you erase all the specific information about the type and further on the type Any is all that the compiler knows about value o. That's why from that point on you can only rely on the runtime information about the type.

The case-expressions like case o:List[Int] are resolved using the JVM's special instanceof runtime mechanism. However the buggy behaviour you experience is caused by this mechanism only taking the first-rank type into account (the List in List[Int]) and ignoring the parameters (the Int in List[Int]). That's why it treats List[Int] as equal to List[String]. This issue is known as "Generics Erasure".

Haskell on the other hand performs a complete type erasure, which is well explained in the answer by Ben.

So the problem in both languages is the same: we need to provide a runtime information about the type and its parameters.

In Scala you can achieve that using the "reflection" library, which resolves that information implicitly:

import reflect.runtime.{universe => ru}
def whatIsIt[T](o : T)(implicit t : ru.TypeTag[T]) = 
  if( t.tpe <:< ru.typeOf[List[Int]] ) 
    "List[Int]"
  else if ( t.tpe <:< ru.typeOf[List[String]] ) 
    "List[String]"
  else if ( t.tpe <:< ru.typeOf[Set[Int]] ) 
    "Set[Int]"
  else if ( t.tpe <:< ru.typeOf[Set[String]] ) 
    "Set[String]"
  else sys.error("Unexpected type")  

println(whatIsIt(List("1","2","3")))
println(whatIsIt(Set("1","2","3")))

Output:

List[String]
Set[String]

Haskell has a very different approach to polymorphism. Above all, it does not have subtype polymorphism (it's not a weakness though), that's why the type-switching pattern matches as in your example are simply irrelevant. However it is possible to translate the Scala solution from above into Haskell quite closely:

{-# LANGUAGE MultiWayIf, ScopedTypeVariables #-}
import Data.Dynamic
import Data.Set

whatIsIt :: Dynamic -> String
whatIsIt a = 
  if | Just (_ :: [Int]) <- fromDynamic a -> "[Int]"
     | Just (_ :: [String]) <- fromDynamic a -> "[String]"
     | Just (_ :: Set Int) <- fromDynamic a -> "Set Int"
     | Just (_ :: Set String) <- fromDynamic a -> "Set String"
     | otherwise -> error "Unexpected type"

main = do
  putStrLn $ whatIsIt $ toDyn ([1, 2, 3] :: [Int])
  putStrLn $ whatIsIt $ toDyn (["1", "2", "3"] :: [String])
  putStrLn $ whatIsIt $ toDyn (Data.Set.fromList ["1", "2", "3"] :: Set String)

Output:

[Int]
[String]
Set String

However I must outline boldly that this is far from a typical scenario of Haskell programming. The language's type-system is powerful enough to solve extremely intricate problems while maintaining all the type-level information (and safety). Dynamic is only used in very special cases in low-level libraries.

Question 2

GHC does even more type erasure than the JVM; at runtime the types are completely gone (not just the type parameters).

Haskell's approach to types is to use them at compile time to guarantee that no ill-typed operation can ever be carried out, and since Haskell doesn't have OO-style subtyping and dynamic dispatch, there's no purpose at all to keeping the types around. So data is compiled to a memory structure that simply contains the right values, and functions are compiled with baked-in knowledge of the structure of the types on which they operate¹, and just blindly expect their arguments to have that structure. That's why you get fun things like segmentation faults if you mess with unsafeCoerce incorrectly, not just a runtime exception saying the value was not of the expected type; at runtime Haskell has no idea whether a value is of any given type.

So rather than Haskell giving "the right answer" to the equivalent program, Haskell disallows your program as unsafe! There is no Any type in Haskell to which you can cast whatever you want.

That's not 100% true; in both Haskell and Scala there are ways of keeping type information alive at runtime. Essentially it's done by creating ordinary data structures that represent types, and passing them around together values that are of those types, so at runtime you can refer to the type representation object for information about the type of the other object. There are library and language facilities in both languages to let you use this mechanism at a higher (and more principled) level, so that it's easier to use safely. Because it requires the type tokens to be passed around, you have to "opt-in" to such features, and your callers have to be aware of it to pass you the required type tokens (whether the actual generation and passing of the token is done implicitly or explicitly).

Without using such features, Haskell provides no way to pattern match on a value that could be of type List Int or Set String to find out which one it is. Either you're using a monomorphic type, in which case it can only be one type and the others will be rejected, or you're using a polymorphic type, in which case you can only apply code to it that will do the same thing² regardless of which concrete type instantiates the polymorphic type.

¹ Except for polymorphic functions, which assume nothing about their polymorphic arguments, and so can basically do nothing with them except pass them to other polymorphic functions (with matching type class constraints, if any).

² Type class constrained polymorphic types are the only exception to this. Even then, if you've got a value a type that's a member of some type class, all you can do with it is pass it to other functions that accept values in any type that is a member of that type class. And if those functions are general functions defined outside of the type class in question, they'll be under the same restriction. It's only the type class methods themselves that can actually "do something different" for different types in the class, and that's because they are the union of a whole bunch of monomorphic definitions that operate on one particular type in the class. You can't write code that gets to take a polymorphic value, inspect it to see what it was instantiated with, and then decide what to do.

Question 3

Of course Haskell prints the right answer:

import Data.Set
import Data.Typeable

main = do
  let li=[1,2,3]
  let ls=["1","2","3"]
  let si=Data.Set.fromList[1,2,3]
  let ss=Data.Set.fromList["1","2","3"]
  print $ typeOf li
  print $ typeOf ls
  print $ typeOf si
  print $ typeOf ss

prints

[Integer]
[[Char]]
Set Integer
Set [Char]