Is my understanding of type systems correct?

https://stackoverflow.com/questions/2135066

22-09-2019

Question

The following statements represent my understanding of type systems (which suffers from too little hands-on experience outside the Java world); please correct any errors.

The static/dynamic distinction seems pretty clear-cut:

Statically typed languages assign each variable, field and parameter a type, and the compiler prevents assignments between incompatible types. Examples: C, Java, Pascal.
Dynamically typed languages treat variables as generic bins that can hold anything you want - types are checked (if at all) only at runtime when you actually perform operations on the values, not when you assign them. Examples: Smalltalk, Python, JavaScript.
Type inference allows statically typed languages to look like (and have some of the advantages of) dynamically typed ones, by inferring types from the context so that you don't have to declare them most of the time - but unlike in dynamic languages, you cannot e.g. use a variable to hold a string initially and then assign an integer to it. Examples: Haskell, Scala

I am much less certain about the strong/weak distinction, and I suspect that it's not very clearly defined:

Strongly typed languages assign each runtime value a type and only allow operations to be performed that are defined for that type, otherwise there is an explicit type error.
Weakly typed languages don't have runtime type checks - if you try to perform an operation on a value that it does not support, the results are unpredictable. It may actually do something useful, but more likely you'll get corrupted data, a crash, or some undecipherable secondary error.
There seem to be at least two different kinds of weakly typed languages (or perhaps a continuum):
- In C and assembler, values are basically buckets of bits, so anything is possible and if you get the compiler to dereference the first 4 bytes of a null-terminated string, you better hope it leads somewhere that does not contain legal machine code.
- PHP and JavaScript are also generally considered weakly typed, but do not consider values to be opaque bit buckets; they will, however, perform implicit type conversions.
But these implicit conversions seem to apply mainly to string/integer/float variables - does that really warrant the classification as weakly typed? Or are there other issues where these languages' type systems may obfuscate errors?

Solution

I am much less certain about the strong/weak distinction, and I suspect that it's not very clearly defined.

You are right: it isn't.

This is what Benjamin C. Pierce, author of Types and Programming Languages and editor of Advanced Topics in Types and Programming Languages, has to say:

I spent a few weeks... trying to sort out the terminology of "strongly typed," "statically typed," "safe," etc., and found it amazingly difficult.... The usage of these terms is so various as to render them almost useless.

Luca Cardelli, in his Typeful Programming article, defines strong typing as the absence of unchecked run-time type errors. Tony Hoare calls that exact same property "security". Other papers call it "type safety" or simply "safety".

Mark-Jason Dominus wrote a classic rant about this a couple of years ago on the comp.lang.perl.moderated newsgroup, in a discussion about whether or not Perl was strongly typed. In this rant he states that within just a few hours of research, he was able to find 8 different, sometimes contradictory definitions, mostly from respected sources like college textbooks or peer-reviewed papers. In particular, those texts contained examples that were meant to help the students distinguish between strongly and weakly typed languages, and according to those examples, C is strongly typed, C is weakly typed, C++ is strongly typed, C++ is weakly typed, Lisp is strongly typed, Lisp is weakly typed, Perl is strongly typed, Perl is weakly typed. (Does that clear up any confusion?)

The only definition that I have seen consistently applied is:

strongly typed: my programming language
weakly typed: your programming language

OTHER TIPS

Regarding static and dynamic typing you are dead on the money. Static typing means that programs are checked before being executed, and a program might be rejected before it starts. Dynamic typing means that the types of values are checked during execution, and a poorly typed operation might cause the program to halt or otherwise signal an error at run time. A primary reason for static typing is to rule out programs that might have such "dynamic type errors".
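As a small sketch of the dynamic half of that description, the JavaScript below stores values of different types in one variable and only hits a type error when an unsupported operation actually runs. The names `shout`, `bin`, `first` and `runtimeError` are made up for illustration:

```javascript
// Hypothetical helper: works for any value that has a toUpperCase method.
function shout(value) {
  // No check happens when `value` is passed in; the method lookup and
  // call below are where a badly typed argument is detected.
  return value.toUpperCase();
}

// The same variable can hold a string and later a number.
let bin = "hello";
const first = shout(bin); // "HELLO"

bin = 42; // accepted: assignment is never type-checked

let runtimeError = null;
try {
  shout(bin); // numbers have no toUpperCase, so this call throws
} catch (err) {
  runtimeError = err;
}

console.log(first);
console.log(runtimeError instanceof TypeError); // true
```

A statically typed language would typically reject the second `shout` call before the program starts, which is exactly the class of "dynamic type errors" that static typing is meant to rule out.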

Bob Harper has argued that a dynamically typed language can (and should) be considered to be a statically typed language with a single type, which Bob calls "value". This view is fair, but it's helpful only in limited contexts, such as trying to be precise about the type theory of languages.

Although I think you grasp the concept, your bullets do not make it clear that type inference is simply a special case of static typing. In most languages with type inference, type annotations are optional, but not necessarily in all contexts. (Example: signatures in ML.) Advanced static type systems often give you a tradeoff between annotations and inference; for example, in Haskell you can type polymorphic functions of higher rank (forall to the left of an arrow) but only with an annotation. So, if you are willing to add an annotation, you can get the compiler to accept a program that would be rejected without the annotation. I think this is the wave of the future in type inference.

The ideas of "strong" and "weak" typing I would characterize as not useful, because they don't have a universally agreed on technical meaning. Strong typing generally means that there are no loopholes in the type system, whereas weak typing means the type system can be subverted (invalidating any guarantees). The terms are often used incorrectly to mean static and dynamic typing. To see the difference, think of C: the language is type-checked at compile time (static typing), but there are plenty of loopholes; you can pretty much cast a value of any type to another type of the same size—in particular, you can cast pointer types freely. Pascal was a language that was intended to be strongly typed but famously had an unforeseen loophole: a variant record with no tag.
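JavaScript has no pointer casts, so the C loophole cannot be reproduced literally, but typed-array views over a shared ArrayBuffer give a rough analogue of reading the same bytes under a different type of the same size. This is only an illustrative sketch, and `bitsOfFloat` is a hypothetical helper name:

```javascript
// Reinterpret the 4 bytes of a 32-bit float as an unsigned 32-bit integer,
// roughly what casting a float pointer to an integer pointer does in C.
function bitsOfFloat(f) {
  const buffer = new ArrayBuffer(4);        // 4 raw bytes, no type attached
  const asFloat = new Float32Array(buffer); // view the bytes as one float
  const asUint = new Uint32Array(buffer);   // view the same bytes as one integer
  asFloat[0] = f;
  return asUint[0];
}

console.log(bitsOfFloat(1).toString(16)); // 3f800000
console.log(bitsOfFloat(0));              // 0
```

Unlike the C cast, this reinterpretation is well defined in JavaScript and subverts no guarantee, so the analogy is only partial: it shows what "same bits, different type" means, not an actual hole in a type system.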

Implementations of strongly typed languages often acquire loopholes over time, usually so that part of the run-time system can be implemented in the high-level language. For example, Objective Caml has a function called Obj.magic which has the run-time effect of simply returning its argument, but at compile time it converts a value of any type to one of any other type. My favorite example is Modula-3, whose designers called their type-casting construct LOOPHOLE.

I encourage you to avoid the terms "strong" and "weak" with regard to type systems, and instead say precisely what you mean, e.g., "the type system guarantees that the following class of errors cannot occur at run time" (strong), "the static type system does not protect against certain run-time errors" (weak), or "the type system has a loophole" (weak). Just calling a type system "strong" or "weak" by itself does not communicate very much.

This is a pretty accurate reflection of my own understanding of the static/dynamic, strong/weak typing discussion. In addition, you can consider these other languages:

In languages such as Tcl and Bourne Shell, the "main" value type is the string. Numeric operators are available that implicitly coerce input values from string representation and result values to string representation. They can be considered examples of dynamic, weakly typed languages.
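A rough model of that behavior, written in JavaScript rather than in either of those languages, might look like the following. The `addStrings` function is hypothetical and ignores error handling for non-numeric input:

```javascript
// Every value is a string; arithmetic parses its inputs and
// renders its result back to a string.
function addStrings(a, b) {
  const sum = Number(a) + Number(b); // coerce from string representation
  return String(sum);                // coerce the result back to a string
}

const total = addStrings("2", "40");
console.log(total);        // 42
console.log(typeof total); // string
```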

Forth may be an example of a static, weakly typed language. The language performs no type checking of its own, and the main stack may interchangeably contain pointers, integers, and strings (conventionally represented as two cells, start and length). Inconsistent use of operators can lead to either interesting or unspecified behavior. Typical Forth implementations provide a separate stack for floating point numbers.

Maybe this book can help. Be prepared for some math though. If I remember correctly, a "non-math" statement was: "Strongly typed: A language that I feel safe to program with".

There seem to be at least two different kinds of weakly typed languages (or perhaps a continuum):

In C and assembler, values are basically buckets of bits, so anything is possible and if you get the compiler to dereference the first 4 bytes of a null-terminated string, you better hope it leads somewhere that does not contain legal machine code.

I would disagree with this statement, at least in C. You can manipulate the type system in C in such a way that you can treat any given memory location as a bucket of bits, but a variable most definitely has a type and that type has specific properties. The fact that there are no runtime checks (unless you consider floating point exceptions or segmentation faults to be runtime checks) isn't really relevant. C can be considered "weakly typed" in the sense that the compiler will perform some implicit type conversion for you, but it doesn't go very far with it.

I consider strong/weak to be the concept of implicit conversion, and a good example is the addition of a string and a number. In a strongly typed language the conversion won't happen (at least in all languages I can think of) and you'll get an error. Weakly typed languages like VB (with Option Strict Off) and JavaScript will try to cast one of the operands to the other type.

In VB.NET with Option Strict Off:

    Dim A As String = "5"
    Dim B As Integer = 5
    Trace.WriteLine(A + B) ' writes 10

With Option Strict On (turning VB into a strongly typed language) you'll get a compiler error.

In JavaScript:

    var A = '5';
    var B = 5;
    alert(A + B); // shows 55

Some people will say that the results are not predictable but they actually do follow a set of rules.
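A few of those rules can be checked directly. The sketch below relies only on standard JavaScript operator semantics; the variable names are arbitrary:

```javascript
// `+` concatenates as soon as either operand is a string...
const concatenated = '5' + 5;  // '55'
// ...while `-` and `*` have no string meaning, so both operands become numbers.
const subtracted = '5' - 5;    // 0
const multiplied = '5' * '2';  // 10
// Converting explicitly gives numeric addition.
const added = Number('5') + 5; // 10

console.log(concatenated, subtracted, multiplied, added);
```

So the outcome depends on the operator as much as on the operands, which is why the behavior can feel unpredictable even though it is fully specified.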

Hmm, I don't know much more either, but I wanted to mention C++ and its implicit conversions (implicit constructors). This might also be an example of weak typing.

I agree with the others who say "there doesn't seem to be a hard and fast definition here." My answer tends to be based on how much rope the language gives you WRT types. If you can pretty much fake anything you want, then it's weak. If it really doesn't let you get yourself into trouble, even if you want to, it's strong.

I really haven't seen too many languages that skirt this border, so I can't say that I've ever needed a better definition than that...

Licensed under: CC-BY-SA with attribution
