What does it mean for a language to be statically typed?

https://stackoverflow.com/questions/3623323

26-09-2019

Question

My understanding is that it means that one can potentially write a program to formally prove that a program written in a statically typed language will be free of a certain (small) subset of defects.

My problem with this is as follows:

Assume that we have two Turing-complete languages, A and B. A is presumed to be 'type safe' and B is presumed not to be. Suppose I am given a program L to check the correctness of any program written in A. What is to stop me from translating any program written in B to A and then applying L? If P translates from A to B, then why isn't PL a valid type checker for any program written in B?

I'm trained in Algebra and am only just starting to study CS so there might be some obvious reason that this doesn't work but I would very much like to know. This whole 'type safety' thing has smelt fishy to me for a while.

Solution

Let A be your Turing-complete language which is supposed to be statically typed and let A' be the language you get from A when you remove the type checking (but not the type annotations because they also serve other purposes). The accepted programs of A will be a subset of the accepted programs of A'. So in particular, A' will also be Turing-complete.

Consider your translator P from B to A (and vice versa). What is it supposed to do? It could do one of two things:

Firstly, it could translate every program y of B to a program of A. In this case, LPy would always return True as programs of A are by definition correctly typed.
Secondly, P could translate every program y of B to a program of A'. In this case, LPy would return True if Py happens to be a program of A and False if not.

As the first case doesn't yield anything interesting, let us stick to the second case, which is probably what you mean. Does the function LP defined on programs of B tell us anything interesting about programs of B? I say no, because it is not invariant under a change of P. As A is Turing-complete, even in the second case P could be chosen so that its image happens to lie in A. Then LP would be constantly True. On the other hand, P could be chosen so that some programs are mapped to the complement of A in A'. In this case LP would spit out False for some (possibly all) programs of B. As you can see, you don't get anything which only depends on y.
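The dependence on P can be made concrete with a toy model. The sketch below rests on invented assumptions: terms of A' are nested tuples, `L` is a hypothetical checker that accepts the well-typed subset A, and `P1` and `P2` are two hypothetical translators from an untyped language B whose programs are Python ints, strings, or `("+", left, right)` tuples. None of these names come from the question; they only illustrate that LP changes when P does.

```python
# Toy model (all names are illustrative assumptions, not a real type checker).
def type_of(term):
    """Return 'int', 'str' or 'dyn' for a well-typed term of A', else None."""
    tag = term[0]
    if tag == "int":
        return "int"
    if tag == "str":
        return "str"
    if tag == "add":  # statically typed addition: both operands must be int
        ok = type_of(term[1]) == "int" and type_of(term[2]) == "int"
        return "int" if ok else None
    if tag == "box":  # wrap any well-typed value into the universal type 'dyn'
        return "dyn" if type_of(term[1]) is not None else None
    if tag == "dyn_add":  # addition on 'dyn': operand tags would be checked at run time
        ok = type_of(term[1]) == "dyn" and type_of(term[2]) == "dyn"
        return "dyn" if ok else None
    return None


def L(term):
    """The checker: True exactly when the term lies in the typed subset A."""
    return type_of(term) is not None


def P1(prog):
    """Direct translation: B literals become typed literals of A'."""
    if isinstance(prog, int):
        return ("int", prog)
    if isinstance(prog, str):
        return ("str", prog)
    _, left, right = prog
    return ("add", P1(left), P1(right))


def P2(prog):
    """Boxing translation: every B value is wrapped in the universal type."""
    if isinstance(prog, int):
        return ("box", ("int", prog))
    if isinstance(prog, str):
        return ("box", ("str", prog))
    _, left, right = prog
    return ("dyn_add", P2(left), P2(right))


good = ("+", 1, 2)
bad = ("+", "foo", 1)
print(L(P1(good)), L(P1(bad)))  # True False
print(L(P2(good)), L(P2(bad)))  # True True
```

With `P1` the composite rejects `bad`; with `P2` it accepts everything, because the image of `P2` lies entirely inside A. The verdict is a property of the chosen translator, not of the B program alone.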

I can also put it more mathematically in the following way: There is a category C of programming languages whose objects are the programming languages and whose morphisms are translators from one programming language to another one. In particular if there is a morphism P: X -> Y, Y is at least as expressive as X. Between each pair of Turing-complete languages there are morphisms in both directions. For each object X of C (i.e. for each programming language) we have an associated set, say {X} (bad notation, I know) of those partially defined functions that can be computed by programs of X. Each morphism P: X -> Y then induces an inclusion {X} -> {Y} of sets. Let us formally invert all those morphisms P: X -> Y that induce the identity {X} -> {Y}. I will call the resulting category (which is, in mathematical terms, a localization of C) by C'. Now the inclusion A -> A' is a morphism in C'. However, it is not preserved under automorphisms of A', that is the morphism A -> A' is not an invariant of A' in C'. In other words: from this abstract point of view the attribute "statically typed" is not definable and can be arbitrarily attached to a language.

To make my point clearer you can also think of C' as the category of, say, geometrical shapes in three-dimensional space together with the Euclidean motions as morphisms. A' and B are then two geometrical shapes and P and Q are Euclidean motions bringing B to A' and vice versa. For example, A' and B could be two spheres. Now let us fix a point on A', which shall stand for the subset A of A'. Let us call this point "statically typed". We want to know whether a point of B is statically typed. So we take such a point y, map it via P to A' and test whether it is our marked point on A'. As one can easily see, this depends on the chosen map P or, to put it in other words: a marked point on a sphere is not preserved by automorphisms (that is, Euclidean motions that map the sphere onto itself) of that sphere.

Other tips

If you can translate every B' (a program written in B) into an equivalent A' (which is correct if B' is), then language B enjoys just as much "type-safety" as language A (in a theoretical sense, of course;-) -- basically this would mean that B is such that you can do perfect type inferencing. But that's extremely limited for a dynamic language -- e.g., consider:

if userinput() == 'bah':
    thefun(23)
else:
    thefun('gotcha')

where thefun (let's assume) is typesafe for int argument, but not for str argument. Now -- how do you translate this to language A in the first place...?
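A runnable version of the snippet above may help. This is only a sketch: the body of `thefun` and the wrapper `run` (which takes the user input as a parameter instead of reading it) are assumptions made for illustration.

```python
def thefun(n: int) -> int:
    # Assumed body: well defined for int, raises TypeError for str.
    return n + 1


def run(userinput: str) -> int:
    # Which call executes depends on a value that exists only at run time.
    if userinput == 'bah':
        return thefun(23)
    else:
        # A checker that enforces the annotations would reject this call,
        # and with it the whole program, even for inputs that never reach it.
        return thefun('gotcha')


print(run('bah'))  # 24

try:
    run('anything else')
except TypeError as exc:
    print('run-time type error:', exc)
```

The dynamic language accepts the program and fails only when the second branch is actually taken. A faithful translation into a statically typed language has to either reject the program outright or move the check to run time.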

Another way to make the point already made is that your question constitutes a proof by contradiction that either:

A cannot be mapped to B
type safety is not a lexical property of a language

or both. My intuition says that the latter is probably the sticking point: that type-safety is a meta-linguistic property.

There's nothing "fishy" about it. ;)

The set of Turing-complete languages which are type-safe with respect to any nontrivial [1] type system T is a strict subset of the Turing-complete languages. As such, in the general case, no translator P^-1 from B to A exists; therefore, neither does any translator-cum-type-checker LP^-1.

A knee-jerk reaction to this sort of claim might be: Nonsense! Both A and B are Turing-complete, so I can express any computable function in either language! And, indeed, this is correct--you can express any computable function in either language; however, quite often, you can also express quite a bit more. In particular you can construct expressions whose denotational semantics are not well-defined, such as those which might happily try to take the arithmetic sum of the character strings "foo" and "bar" (this is the gist of Alex Martelli's answer). These sorts of expressions may be "in" the language B, but may simply not be expressible in the language A, because the denotational semantics are undefined, thus there is no sensible way to translate them.

This is one of the great strengths of static typing: If your type system is unable to prove, at compile time, that the aforementioned function will not receive any parameters but those for which the outcome of the arithmetic addition operator is well-defined, it can be rejected as ill-typed.

Note that while the above is the usual sort of example given to explain the merits of a static type system, it is perhaps too modest. In general, a static type system need not be limited to merely enforcing type-correctness of parameters, but indeed can express any desired property of a program which can be proven at compile time. For example, it is possible to construct type systems which enforce the constraint that one release a resource handle (e.g. to a database, file, network socket, etc.) within the same scope in which it was acquired. Obviously, this is tremendously valuable in such domains as life-support systems, among others, where provable correctness of as many parameters of the system as possible is absolutely essential. If you satisfy the static type system, you can get these sorts of proofs for free.

[1] By nontrivial, I mean such that not all possible expressions are well-typed.

My understanding is that this has to do with compile-time vs. run-time. In a statically typed language the majority of type checking is performed during compile-time. In a dynamically typed language, the majority of its type checking is performed at run-time.

Let me answer this the other way round:

There are two different types of "dynamic" programming.

One is "dynamically typed", which means you have some sort of a shell where you can program by typing definitions into that shell, think of it like Python's IDLE shell.

The other type of dynamic programming is a more theoretical one. A dynamic program is one that can change its own source. It needs some level of introspection, and is often used to change program memory on microcontrollers. Sometimes generating lookup tables for number crunching is called dynamic programming.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow