Membership proofs for AVL trees

Question

I think your best bet is to define _∈_ as an inductive relation. Yes, this requires you to know the internals of Data.AVL, but I'm fairly sure this will be the most pleasant representation to work with.

The internal structure of Data.AVL is actually quite simple. We have a type Indexed.Tree, which is indexed by three values: the lower bound, the upper bound and the height. Given a tree t : Indexed.Tree lb ub h, all values inside t are within the range (lb, ub).

There's a slight twist to it, though: since we need to have a tree that can contain arbitrary values, we need to artifically extend the _<_ relation given by isStrictTotalOrder with two new values - you can think of those as a negative and a positive infinity. In the Data.AVL module, these are called ⊥⁺ and ⊤⁺. Trees that can contain arbitrary values are then of type Tree ⊥⁺ ⊤⁺ h.

The last piece is the balancing: each node requires heights of its subtrees to be at most one level apart. We don't actually need to touch balancing, but the function signatures might mention it.

Anyways, I'm working directly with this raw (indexed) variant. The opaque, unindexed version is just something like:

data Tree : Set ? where
  tree : ∀ {h} → Indexed.Tree ⊥⁺ ⊤⁺ h

Some boilerplate first:

open import Data.Empty
open import Data.Product
open import Level
open import Relation.Binary
open import Relation.Binary.PropositionalEquality as P using (_≡_)
open import Relation.Nullary

import Data.AVL

module Membership
  {k v ℓ}
  {Key : Set k} (Value : Key → Set v)
  {_<_ : Rel Key ℓ}
  (isStrictTotalOrder : IsStrictTotalOrder _≡_ _<_) where

open Data.AVL Value isStrictTotalOrder public
open Extended-key                      public
open Height-invariants                 public

open IsStrictTotalOrder isStrictTotalOrder

Here's the _∈_ as an inductive relation:

data _∈_ {lb ub} (K : Key) :
    ∀ {n} → Indexed.Tree lb ub n → Set (k ⊔ v ⊔ ℓ) where
  here : ∀ {hˡ hʳ} V
    {l : Indexed.Tree lb [ K ] hˡ}
    {r : Indexed.Tree [ K ] ub hʳ}
    {b : hˡ ∼ hʳ} →
    K ∈ Indexed.node (K , V) l r b
  left : ∀ {hˡ hʳ K′} {V : Value K′}
    {l : Indexed.Tree lb [ K′ ] hˡ}
    {r : Indexed.Tree [ K′ ] ub hʳ}
    {b : hˡ ∼ hʳ} →
    K < K′ →
    K ∈ l →
    K ∈ Indexed.node (K′ , V) l r b
  right : ∀ {hˡ hʳ K′} {V : Value K′}
    {l : Indexed.Tree lb [ K′ ] hˡ}
    {r : Indexed.Tree [ K′ ] ub hʳ}
    {b : hˡ ∼ hʳ} →
    K′ < K →
    K ∈ r →
    K ∈ Indexed.node (K′ , V) l r b

This is the sort of inductive definition you'd expect: either the key is in this inner node or it's down in one of the subtrees. We could also do without the less-than, greater-than proofs, but this is more convenient - when you want to show that a tree does not contain a particular element, you only have to track the path lookup would take, instead of searching the whole tree.

How to interpret those l and r implicit arguments? Notice that is makes perfect sense: we have a key K and naturally we require that the values contained in l fall between lb and K (it's actually [ K ], since we are using the extended _<_) and the values in r fall between K and ub. The balancing (b : hˡ ∼ hʳ) is there just so we can construct an actual tree node.

Your get function is then very simple:

get : ∀ {h lb ub n} {m : Indexed.Tree lb ub h} → n ∈ m → Value n
get (here    V) = V
get (left  _ p) = get p
get (right _ p) = get p

Well, I told you that this representation is fairly convenient to work it and I'm going to prove it. One of the properties we'd like _∈_ to have is decidability: that is, we can construct a program that tells us whether an element is in a tree or not:

find : ∀ {h lb ub} n (m : Indexed.Tree lb ub h) → Dec (n ∈ m)

find will return either yes p, where p is a proof that n is inside m (n ∈ m), or no ¬p, where ¬p is refutation of n ∈ m, n ∈ m → ⊥. We'll need one lemma:

lem : ∀ {lb ub hˡ hʳ K′ n} {V : Value K′}
  {l : Indexed.Tree lb [ K′ ] hˡ}
  {r : Indexed.Tree [ K′ ] ub hʳ}
  {b : hˡ ∼ hʳ} →
  n ∈ Indexed.node (K′ , V) l r b →
  (n ≯ K′ → n ≢ K′ → n ∈ l) × (n ≮ K′ → n ≢ K′ → n ∈ r)
lem (here    V) =
  (λ _ eq → ⊥-elim (eq P.refl)) , (λ _ eq → ⊥-elim (eq P.refl))
lem (left  x p) = (λ _ _ → p) , (λ ≮ _ → ⊥-elim (≮ x))
lem (right x p) = (λ ≯ _ → ⊥-elim (≯ x)) , (λ _ _ → p)

This tells us that if we know n is contained in t and we know n is less than the key of the root of t, then n must be in the left subtree (and similarly for right subtree).

Here's the implementation of find function:

find : ∀ {h lb ub} n (m : Indexed.Tree lb ub h) → Dec (n ∈ m)
find n (Indexed.leaf _) = no λ ()
find n (Indexed.node (k , v) l r _) with compare n k
find n (Indexed.node (k , v) l r _) | tri< a ¬b ¬c with find n l
... | yes p = yes (left a p)
... | no ¬p = no λ ¬∈l → ¬p (proj₁ (lem ¬∈l) ¬c ¬b)
find n (Indexed.node (k , v) l r _) | tri≈ ¬a b ¬c
  rewrite (P.sym b) = yes (here v)
find n (Indexed.node (k , v) l r _) | tri> ¬a ¬b c with find n r
... | yes p = yes (right c p)
... | no ¬p = no λ ¬∈r → ¬p (proj₂ (lem ¬∈r) ¬a ¬b)

The implementation is fairly straightforward, but I would suggest loading it up in Emacs, trying to replace some of the right-hand-sides with holes and see what the types are. And finally, here are some tests:

open import Data.Nat
open import Data.Nat.Properties

open Membership
  (λ _ → ℕ)
  (StrictTotalOrder.isStrictTotalOrder strictTotalOrder)

un-tree : Tree → ∃ λ h → Indexed.Tree ⊥⁺ ⊤⁺ h
un-tree (tree t) = , t

test : Indexed.Tree _ _ _
test = proj₂ (un-tree
  (insert 5 55 (insert 7 77 (insert 4 44 empty))))

Extract : ∀ {p} {P : Set p} → Dec P → Set _
Extract {P = P} (yes _) = P
Extract {P = P} (no  _) = ¬ P

extract : ∀ {p} {P : Set p} (d : Dec P) → Extract d
extract (yes p) = p
extract (no ¬p) = ¬p

∈-test₁ : ¬ (2 ∈ test)
∈-test₁ = extract (find 2 test)

∈-test₂ : 4 ∈ test
∈-test₂ = extract (find 4 test)