Dependent types for structured data validation

https://stackoverflow.com/questions/7703611

08-02-2021
|

Question

First of all, I don’t really know what’s wrong with dependent types and why we don’t see them implemented in existing languages for practical programming, instead of inventing all kind of tricks (patterns!) to bypass the limitations of current type systems which at best has very simple and limited generalization.

But my question is about Dependent types for data not a program, how or can we use them for structured data validation? Meaning, like json or xml or any kind of structured data , is it possible to verify them efficiently using some dependent type system?

Edit:

I meant by dependent types it’s most wide definition “type that depends on a value”, and not necessary those theorem prover and CoC staff. I don’t know them and I don’t want to go that road, I don’t believe those the only nor ‘the ultimate’ way to get decent dependent types. In FP, coders write their most complex logic everyday in very elegant, constructive way with all simplicity and having no problem at all. I believe they will have their ultimate “elegant” dependent typing.

However, my question was about pure Data , unlike in code when a lot of checking can be just unnecessary, and can just hiding in program flow and logic, even dynamic typing can work fine that way. In data, it’s not the case when you want to check some document correctness and give clear error messages. In another hand data does not have the complexity problem when you have to deal with “functions” in very extreme dependent type system (of CoC family).

Solution

You would probably be interested in this paper : The Next 700 Data Description Languages (PDF), Kathleen Fisher, Yitzhak Mandelbaum and David Walker, 2006.

The primary goal of this paper is to begin to understand the family of ad hoc data processing languages. We do so, as Landin did, by developing a semantic framework for defining, comparing, and contrasting languages in our domain. This semantic framework revolves around the definition of a data description calculus (DDC^α). This calculus uses types from a dependent type theory to describe various forms of ad hoc data: base types to describe atomic pieces of data and type constructors to describe richer structures. We show how to give a denotational semantics to DDC^α by interpreting types as parsing functions that map external representations (bits) to data structures in a typed lambda calculus. More precisely, these parsers produce both internal representations of the external data and parse descriptors that pinpoint errors in the original source.

In short : yes, dependent types are necessary if you want to statically encode fine-grained invariants about your data. Being more expressive than algebraic data types and GADTs, they also allow to express them and related constructs (such as the combination of untagged union and tagged product), having the ability to be, in some kind, the assembly language of data description, even if the user-facing specification doesn't expose term dependencies directly.

Beware, however, that such a formal approach comes at the cost of a steeper learning curve and higher upfront complexity, even if in theory it pays back with easier, safer, better specifications, manipulations tools and such. The practitioners in the field will more often than not neglect all that type system beauty, and fall back on ill-specified alternatives. XML is losing to JSON, in other reasons because specifying schemas is boring and people don't see what advantages they bring. Yes, you can later specify the static structure of the adopted JSON API (and you may well need dependent types to do that, as complexity easily crept into such evolution-rather-than-design formats), but this is only of little use if nobody cares about it, use it, understand it and, more importantly, maintain it.

(On a secondary front regarding your trollish introduction : please go forward and play with ATS, Guru or Agda, they are meant for relatively practical programming. If you want to go the frankenstein route, there is also SHE; Coq is not designed to be "practical for software development" but has been known to be abused this way -- I would not advise it for dependently typed programming, but it's good for not-very-dependent programming plus accompanying correctness proofs -- and if you want to sell your soul there is also F* coming soon.)

OTHER TIPS

I believe the main problem with dependent types in a language is it makes type checking (let alone type inference) an undecidable problem. Which means that for some programs the compiler will be forced into an infinite loop just evaluating the types, before it even gets to compilation.

This isn't necessarily a deal breaker, if the behaviour of the type checker can be made simple and intuitive. Avoiding infinite loops at compile time could become just another part of programming for such a language, just as avoiding infinite loops at runtime is.

However, what that would effectively add is a "compile time layer" to your programming, doing full Turing-complete computation in a completely different mode to the normal program. I believe this can already be done in some lanuages; C++ template meta programming is Turing-complete, I believe, and Scala's type system can encode the SKI calculus according to a blog I read once.

But actually programming such systems is horrible. In a programming language with a good type system, the type system exists to make programming easier; it enforces invariants, alerts us when changes to one part of a program affect another, and provides documentation about how to use library code (one can often make quite a good guess from the name and type alone exactly what a Haskell function does). But when you're programming in a type system, you don't have that help, unless someone adds a meta-type system. Again, these actually exist for some languages (I believe Haskell and Scala both have a concept of type kinds), but they aren't designed to have the same relationship to type system meta programming as the regular type system has to regular programming.

Personally, I'm waiting to see if a language will be developed that tries to address all of these issues in a usable way. However I don't wonder at all why such things don't already exist; it's a bloody hard set of problems to solve!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow