Question

I've heard of (structural) induction. It allows you to build up finite structures from smaller ones and gives you proof principles for reasoning about such structures. The idea is clear enough.

But what about coinduction? How does it work? How can one say anything conclusive about an infinite structure?

There are (at least) two angles to address, namely, coinduction as a way of defining things and as a proof technique.

Regarding coinduction as a proof technique, what is the relationship between coinduction and bisimulation?

Solution

First, to dispel a possible cognitive dissonance: reasoning about infinite structures is not a problem. We do it all the time, and as long as the structure is finitely describable, there is no difficulty. Here are a few common types of infinite structures:

  • languages (sets of strings over some alphabet): each string is finite, but the set itself can be infinite;
  • tree languages (sets of trees over some alphabet);
  • execution traces of a non-deterministic system;
  • real numbers;
  • sets of integers;
  • sets of functions from integers to integers; …

Coinduction as the largest fixpoint

Where inductive definitions build a structure from elementary building blocks, coinductive definitions shape structures from how they can be deconstructed. For example, the type of lists whose elements are in a set A is defined as follows in Coq:

Inductive list (A:Set) : Set :=
  | nil : list A
  | cons : A -> list A -> list A.

Informally, the list type is the smallest type that contains all values built from the nil and cons constructors, with the axiom that $\forall x \, y, \: \mathtt{nil} \ne \mathtt{cons} \: x \: y$. Dually, we can define the largest type that contains all values built from these constructors, keeping the discrimination axiom:

CoInductive colist (A:Set) : Set :=
  | conil : colist A
  | cocons : A -> colist A -> colist A.

(* Let Coq infer the type parameter at use sites. *)
Arguments conil {A}. Arguments cocons {A} _ _.

list is isomorphic to a subset of colist. In addition, colist contains infinite lists: lists with cocons upon cocons.

CoFixpoint flipflop : colist nat := cocons 1 (cocons 2 flipflop).
CoFixpoint from (n:nat) : colist nat := cocons n (from (1 + n)).

flipflop is the infinite (circular) list $1::2::1::2::\ldots$; from 0 is the infinite list of natural numbers $0::1::2::\ldots$.
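The finite lists embed into colist, witnessing the isomorphism mentioned above. A minimal sketch (the name inject is mine):

(* Rebuild a finite list as a colist, constructor by constructor. *)
Fixpoint inject (A:Set) (l : list A) : colist A :=
  match l with
  | nil => conil
  | cons x l' => cocons x (inject A l')
  end.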

A recursive definition is well-formed if the result is built from smaller blocks: recursive calls must work on smaller inputs. A corecursive definition is well-formed if the result builds larger objects. Induction looks at constructors, coinduction looks at destructors. Note how the duality not only changes smaller to larger but also inputs to outputs. For example, the reason the flipflop and from definitions above are well-formed is that the corecursive call is guarded by a call to the cocons constructor in both cases.
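To make the destructor view concrete: we can never inspect all of an infinite colist, but we can observe any finite prefix of it. Here is a sketch of a prefix function (take and the Arguments declarations are additions of mine); note that it recurses on the fuel n, so it is an ordinary Fixpoint:

Arguments nil {A}. Arguments cons {A} _ _.

(* Observe the first n elements of a colist as a finite list.
   Matching on l is what forces a cofixpoint like from to unfold. *)
Fixpoint take (A:Set) (n : nat) (l : colist A) : list A :=
  match n with
  | O => nil
  | S n' => match l with
            | conil => nil
            | cocons x l' => cons x (take A n' l')
            end
  end.

For example, take nat 3 (from 0) computes to cons 0 (cons 1 (cons 2 nil)).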

Where statements about inductive objects have inductive proofs, statements about coinductive objects have coinductive proofs. For example, let's define the Infinite predicate on colists; intuitively, the infinite colists are the ones that don't end with conil.

CoInductive Infinite {A : Set} : colist A -> Prop :=
  | Inf : forall x l, Infinite l -> Infinite (cocons x l).

To prove that colists of the form from n are infinite, we can reason by coinduction. from n is equal to cocons n (from (1 + n)). This shows that from n is larger than from (1 + n), which is infinite by the coinduction hypothesis, hence from n is infinite.
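In Coq, this argument needs one administrative step: a cofixpoint like from n only unfolds when it is destructed, so we first expose one constructor with the standard decomposition trick (frob, frob_eq and from_infinite are names I'm introducing; the pattern follows the one in Chlipala's CPDT):

(* An identity function that pattern-matches, forcing one step of unfolding. *)
Definition frob (A:Set) (l : colist A) : colist A :=
  match l with
  | conil => conil
  | cocons x l' => cocons x l'
  end.

Lemma frob_eq : forall (A:Set) (l : colist A), l = frob A l.
Proof. destruct l; reflexivity. Qed.

Lemma from_infinite : forall n, Infinite (from n).
Proof.
  cofix CIH. intro n.
  (* Expose one cocons: from n = cocons n (from (1 + n)). *)
  rewrite (frob_eq nat (from n)). simpl.
  (* The corecursive use of CIH is guarded by the constructor Inf. *)
  constructor. apply CIH.
Qed.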

Bisimilarity, a coinductive property

Coinduction as a proof technique also applies to finitary objects. Intuitively speaking, inductive proofs about an object are based on how the object is built. Coinductive proofs are based on how the object can be decomposed.

When studying deterministic systems, it is common to define equivalence through inductive rules: two systems are equivalent if you can get from one to the other by a series of transformations. Such definitions tend to fail to capture the many different ways non-deterministic systems can end up having the same (observable) behavior in spite of having different internal structure. (Coinduction is also useful to describe non-terminating systems, even when they're deterministic, but this isn't what I'll focus on here.)

Non-deterministic systems, such as concurrent systems, are often modeled by labeled transition systems (LTS). An LTS is a directed graph in which the edges are labeled; each edge represents a possible transition of the system. A trace of an LTS is the sequence of edge labels along a path in the graph.

Two LTS can behave identically, in that they have the same possible traces, even if their internal structure is different. Graph isomorphism is too strong to define their equivalence. Instead, an LTS $\mathscr{A}$ is said to simulate another LTS $\mathscr{B}$ if every transition of the second LTS admits a corresponding transition in the first. Formally, let $S$ be the disjoint union of the states of the two LTS, $L$ the (common) set of labels and $\rightarrow$ the transition relation. The relation $R \subseteq S \times S$ is a simulation if $$ \forall (p,q)\in R, \; \forall p'\in S, \; \forall\alpha\in L, \quad \text{if } p \stackrel\alpha\rightarrow p' \text{ then } \exists q', \; q \stackrel\alpha\rightarrow q' \text{ and } (p',q')\in R. $$

$\mathscr{A}$ simulates $\mathscr{B}$ if there is a simulation in which all the states of $\mathscr{B}$ are related to a state in $\mathscr{A}$. If $R$ is a simulation in both directions, it is called a bisimulation. Simulation is a coinductive property: any observation on one side must have a match on the other side.
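These definitions transcribe directly into Coq. A minimal sketch, with naming assumptions of mine: the states and labels are section variables called state and lbl (avoiding Coq's S, the successor constructor), and simulation and bisimulation are my names. The section stays open; it is closed after the bisimilarity snippet below.

Section Bisimulation.
  Variable state : Set.                          (* disjoint union S of both state spaces *)
  Variable lbl : Set.                            (* common label set L *)
  Variable step : state -> lbl -> state -> Prop. (* p --a--> p' *)

  (* R is a simulation: every step from p is matched by a step
     from q, and the successor states are again related. *)
  Definition simulation (R : state -> state -> Prop) : Prop :=
    forall p q, R p q ->
      forall a p', step p a p' ->
        exists q', step q a q' /\ R p' q'.

  (* A bisimulation is a simulation in both directions. *)
  Definition bisimulation (R : state -> state -> Prop) : Prop :=
    simulation R /\ simulation (fun p q => R q p).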

There are potentially many bisimulations on an LTS, and different bisimulations may identify different states. Given two bisimulations $R_1$ and $R_2$, the union of their relation graphs $R_1 \cup R_2$ is itself a bisimulation, since related states give rise to related states under either relation. (This holds for infinite unions as well. The empty relation is an uninteresting bisimulation, as is the identity relation.) In particular, the union of all bisimulations is itself a bisimulation, called bisimilarity. Bisimilarity is thus the coarsest way to observe the system: it identifies two states whenever some bisimulation does, i.e. whenever no observation of transitions can tell them apart.
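Continuing the Coq section above, the union of all bisimulations can be written as an existential, and the closure argument of the previous paragraph becomes a short lemma (bisimilar and bisimilar_simulation are my names):

  (* Bisimilarity: p and q are related by some bisimulation.
     This is exactly the union of all bisimulations. *)
  Definition bisimilar (p q : state) : Prop :=
    exists R, bisimulation R /\ R p q.

  (* The union of all bisimulations is itself a simulation
     (and, by the symmetry of bisimulation, a bisimulation). *)
  Lemma bisimilar_simulation : simulation bisimilar.
  Proof.
    intros p q H a p' Hstep.
    destruct H as [R [HR Hpq]]. destruct HR as [Hfwd Hbwd].
    destruct (Hfwd p q Hpq a p' Hstep) as [q' [Hstepq Hrel]].
    exists q'. split.
    - exact Hstepq.
    - exists R. repeat split; assumption.
  Qed.
End Bisimulation.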

Bisimilarity is a coinductive property. It can be defined as the largest fixpoint of an operator: the operator maps a relation $R$ to the relation that holds between two states when each transition of one can be matched by the other, with successors related by $R$; bisimilarity is the largest $R$ that this operator leaves unchanged.

References

  • Coq and the calculus of inductive constructions

    • Yves Bertot and Pierre Castéran. Interactive Theorem Proving and Program Development — Coq'Art: The Calculus of Inductive Constructions. Springer, 2004. Ch. 13. [website] [Amazon]
    • Eduardo Giménez. An application of co-inductive types in Coq: verification of the alternating bit protocol. In Workshop on Types for Proofs and Programs, number 1158 in Lecture Notes in Computer Science, pages 135–152. Springer-Verlag, 1995. [Google Books]
    • Eduardo Giménez and Pierre Castéran. A Tutorial on [Co-]Inductive Types in Coq. 2007. [PDF]
    • Adam Chlipala. Certified Programming with Dependent Types. MIT Press, 2013. Ch. 5, "Infinite Data and Proofs".
  • Labeled transition systems and bisimulations

    • Robin Milner. Communication and Concurrency. Prentice Hall, 1989.
    • Davide Sangiorgi. On the origins of bisimulation and coinduction. ACM Transactions on Programming Languages and Systems (TOPLAS), volume 31, issue 4, May 2009. [PDF] [ACM] Associated course slides: [PDF] [CiteSeer]
    • Davide Sangiorgi. The Pi-Calculus: A Theory of Mobile Processes. Cambridge University Press, 2003. [Amazon]
    • Davide Sangiorgi. Introduction to Bisimulation and Coinduction. Cambridge University Press, 2011. [PDF]
    • Davide Sangiorgi and Jan Rutten. Advanced Topics in Bisimulation and Coinduction. Cambridge University Press, 2012. [CUP]

OTHER TIPS

Let us consider the following inductive definition:

$\qquad \displaystyle \begin{align*} &\phantom{\Rightarrow} \quad \varepsilon \in \mathcal{T} \\ w \in \mathcal{T} \quad &\Rightarrow \quad aw \in \mathcal{T}\\ aw \in \mathcal{T} \quad &\Rightarrow \quad baw \in \mathcal{T} \end{align*}$

What is $\mathcal{T}$? Clearly, the set of strings in which every $b$ is immediately followed by an $a$ (so, in particular, with no two consecutive $b$s), i.e.

$\qquad \displaystyle \mathcal{T} =\{\varepsilon, a, aa, ba, aaa, aba, \dots\} = \mathcal{L}\left((ba\mid a)^*\right) \subseteq \Sigma^*.$

Right? Well, what we need for that is the innocuous sentence "and $\mathcal{T}$ is the smallest set that fulfills these conditions". True enough, as otherwise $\mathcal{T}=\{a,b\}^*$ would work, too.

But there is more to it than that. Write the above definition as a (monotone) function $f : 2^{\Sigma^\infty} \to 2^{\Sigma^\infty}$:

$\qquad f(T) = T \cup \{\varepsilon\} \cup \{aw \mid w\in T\} \cup \{baw \mid aw \in T\}$

Now $\mathcal{T}$ is the smallest fixpoint of $f$. In fact, because $f$ is monotone and $\left(2^{\Sigma^\infty},\subseteq\right)$ is a complete lattice, the Knaster–Tarski theorem tells us that this smallest fixpoint exists and is a well-defined language. Because this works with any reasonable¹ inductive definition, we normally don't talk about it. It just fits our intuition: we start with $\{\varepsilon\}$ and apply the rules step by step; in the limit, we get $\mathcal{T}$.
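For concreteness, here is the least-fixpoint reading transcribed as an inductive predicate in Coq, a sketch with names of my own choosing (letter, la, lb, T), using Coq's standard list type for finite words:

Inductive letter : Set := la | lb.

(* Finite words over {a, b} as lists of letters.  The three
   constructors are exactly the three rules defining T. *)
Inductive T : list letter -> Prop :=
  | T_eps : T nil
  | T_a   : forall w, T w -> T (la :: w)
  | T_ba  : forall w, T (la :: w) -> T (lb :: la :: w).

The smallest-fixpoint reading is built into Inductive: T holds only for the words derivable from these rules in finitely many steps.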

Now we turn things around. Instead of saying "if $w$ is included, so is $aw$" we say "if $aw$ is included, so must have been $w$". We cannot turn the anchor around, so it goes away. That leaves us with a problem: we have to be able to take arbitrarily long prefixes away from any word in $\mathcal{T}'$ and remain in $\mathcal{T}'$! This is not possible with finite words; good thing I sneaked in $\Sigma^\infty$ above! We end up with the set of infinite words without a factor (substring) $bb$, i.e. $\mathcal{T}'=\mathcal{L}\left((ba\mid a)^\omega\right)$.

In terms of $f$, $\mathcal{T}'$ is its largest fixpoint². This is actually quite intuitive: we cannot hope to hit $\mathcal{T}'$ from below, i.e. inductively by starting from $\{\varepsilon\}$ and adding stuff that fulfills the rules, so we go from above, i.e. coinductively by starting from $\Sigma^\infty$ and removing stuff that does not conform to the rules.
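The greatest-fixpoint reading likewise has a direct Coq transcription. A sketch under the same hypothetical names, restricted to $\Sigma^\omega$ (streams), which conveniently sweeps the $\varepsilon$ of footnote ² under the rug:

(* Infinite words over {a, b} as streams of letters. *)
CoInductive stream : Set := scons : letter -> stream -> stream.

(* The same rules read coinductively: T' is the largest set of
   streams consistent with them, i.e. the streams in which every
   b is immediately followed by an a (no factor bb). *)
CoInductive T' : stream -> Prop :=
  | T'_a  : forall w, T' w -> T' (scons la w)
  | T'_ba : forall w, T' (scons la w) -> T' (scons lb (scons la w)).

(* An example inhabitant: the infinite word aaa… *)
CoFixpoint all_a : stream := scons la all_a.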


Notation:

  • $\Sigma^\infty = \Sigma^* \cup \Sigma^\omega$
  • $\Sigma^\omega$ is the set of all infinite sequences over $\Sigma$.

¹ You are not allowed to do stuff like $w \in \mathcal{T} \Rightarrow aw \notin \mathcal{T}$; the corresponding function would not be monotone.
² We have to sweep $\{\varepsilon\}$ under the rug somehow.
