Question

Before reading The Art of Computer Programming (TAOCP), I had not considered these questions deeply. I would use pseudocode to describe algorithms, reason about them, and estimate running time only up to its order of growth. TAOCP has thoroughly changed my mind.

TAOCP describes algorithms in English mixed with numbered steps and gotos, and uses flow charts to picture an algorithm more readily. This seems low-level, but I find it has some advantages, especially the flow charts, which I had largely ignored. We can label each arrow with an assertion about the current state of affairs at the time the computation traverses that arrow, and thereby build an inductive proof of the algorithm's correctness. The author says:

It is the contention of the author that we really understand why an algorithm is valid only when we reach the point that our minds have implicitly filled in all the assertions, as was done in Fig. 4.

I had never encountered this technique before. Another advantage is that we can count the number of times each step is executed; the counts are easy to check with Kirchhoff's first law. I have never analysed a running time exactly, so some $\pm 1$ terms may have been omitted when I was estimating running times.

Analysis of orders of growth alone is sometimes useless. For example, we cannot distinguish quicksort from heapsort that way, because both satisfy $E(T(n))=\Theta(n\log n)$, where $E(X)$ denotes the expected value of the random variable $X$. So we should analyse the constants, say $E(T_1(n))=A_1n\lg n+B_1n+O(\log n)$ and $E(T_2(n))=A_2n\lg n+B_2n+O(\log n)$; then we can compare $T_1$ and $T_2$ more meaningfully. Sometimes we should also compare other quantities, such as variances. A rough analysis of the order of growth of the running time alone is not enough. TAOCP translates the algorithms into assembly language and calculates the exact running time there, which is too hard for me, so I want to know some techniques for analysing running time somewhat more roughly, yet still usefully, for higher-level languages such as C, C++, or pseudocode.

I would also like to know what style of description is mainly used in research papers, and how these issues are treated there.


Solution

There is a huge variety of feasible approaches. Which is best suited depends on

  • what you are trying to show,
  • how much detail you want or need.

If the algorithm is a widely known one which you use as a subroutine, you often remain at a higher level. If the algorithm is the main object under investigation, you probably want to be more detailed. The same goes for analyses: if you only need a rough upper bound on the runtime, you proceed differently than if you want precise operation counts.

I will give you three examples for the well-known algorithm Mergesort which hopefully illustrate this.

High Level

The algorithm Mergesort takes a list, splits it into two (about) equally long parts, recurses on those partial lists, and merges the (sorted) results so that the end result is sorted. On singleton or empty lists, it returns the input.

This algorithm is obviously a correct sorting algorithm. Splitting the list and merging it can each be implemented in time $\Theta(n)$, which gives us a recurrence for the worst-case runtime $T(n) = 2T\left(\frac{n}{2}\right) + \Theta(n)$. By the Master theorem, this evaluates to $T(n) \in \Theta(n\log n)$.
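
To spell out the Master theorem step (case 2 of the theorem, a detail left implicit above): we have $a = b = 2$, so $\log_b a = 1$ and $f(n) = \Theta(n) = \Theta\left(n^{\log_b a}\right)$, whence

$\qquad T(n) \in \Theta\left(n^{\log_b a}\log n\right) = \Theta(n\log n)$.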

Medium Level

The algorithm Mergesort is given by the following pseudo-code:

procedure mergesort(l : List) {
  if ( l.length < 2 ) {
    return l
  }

  // recurse on the two halves; take/drop split the list at its midpoint
  left  = mergesort(l.take(l.length / 2))
  right = mergesort(l.drop(l.length / 2))
  result = []

  // repeatedly move the smaller remaining head onto the front of result
  while ( left.length > 0 || right.length > 0 ) {
    if ( right.length == 0 || (left.length > 0 && left.head <= right.head) ) {
      result = left.head :: result
      left = left.tail
    }
    else {
      result = right.head :: result
      right = right.tail
    }
  }

  // result was built in reverse (largest first), so flip it
  return result.reverse
}

We prove correctness by induction. For lists of length zero or one, the algorithm is trivially correct. As induction hypothesis, assume mergesort performs correctly on lists of length at most $n$ for some arbitrary but fixed natural $n>1$. Now let $L$ be a list of length $n+1$. By the induction hypothesis, left and right hold (non-decreasingly) sorted versions of the first and second halves of $L$, respectively, after the recursive calls. Therefore, the while loop selects in every iteration the smallest element not yet considered and prepends it to result; thus result is a non-increasingly sorted list containing all elements from left and right. Its reverse is a non-decreasingly sorted version of $L$, which is the returned -- and desired -- result.

As for runtime, let us count element comparisons and list operations (which dominate the runtime asymptotically). Lists of length less than two cause neither. For lists of length $n>1$, we have the operations caused by preparing the inputs for the recursive calls, those from the recursive calls themselves, plus those from the while loop and one reverse. Both recursive parameters can be computed with at most $n$ list operations each. The while loop executes exactly $n$ times, and every iteration causes at most one element comparison and exactly two list operations. The final reverse can be implemented with $2n$ list operations -- every element is removed from the input and put into the output list. Therefore, the operation count satisfies the following recurrence:

$\qquad \begin{align}T(0) = T(1) &= 0 \\ T(n) &\leq T\left(\left\lceil\frac{n}{2}\right\rceil\right) + T\left(\left\lfloor\frac{n}{2}\right\rfloor\right) + 7n\end{align}$

As $T$ is clearly non-decreasing, it is sufficient to consider $n=2^k$ for asymptotic growth. In this case, the recurrence simplifies to

$\qquad \begin{align}T(0) = T(1) &= 0 \\ T(n) &\leq 2T\left(\frac{n}{2}\right) + 7n\end{align}$

By the Master theorem, we get $T \in \Theta(n \log n)$, which extends to the runtime of mergesort.
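
Alternatively, unrolling the simplified recurrence for $n = 2^k$ gives the bound directly: each of the $k = \lg n$ levels of the recursion contributes at most $7n$ operations, so $T(n) \leq 7n\lg n$.

If you want to sanity-check such counts empirically, the pseudo-code is easy to instrument. Here is a sketch in Haskell (my own rough translation of the pseudo-code above, counting only element comparisons, not all list operations):

-- mergesort instrumented to count element comparisons
msort :: Ord a => [a] -> ([a], Int)
msort l | length l < 2 = (l, 0)
msort l = (merged, cl + cr + cm)
  where
    (xs, ys)     = splitAt (length l `div` 2) l  -- take and drop in one step
    (ls, cl)     = msort xs
    (rs, cr)     = msort ys
    (merged, cm) = merge ls rs

-- merge two sorted lists, counting one comparison per merge step
merge :: Ord a => [a] -> [a] -> ([a], Int)
merge [] b = (b, 0)
merge a [] = (a, 0)
merge (a:as) (b:bs)
  | a <= b    = let (r, c) = merge as (b:bs) in (a : r, c + 1)
  | otherwise = let (r, c) = merge (a:as) bs in (b : r, c + 1)

main :: IO ()
main = do
  let (sorted, comps) = msort [5, 3, 8, 1, 9, 2, 7 :: Int]
  print sorted  -- [1,2,3,5,7,8,9]
  print comps   -- stays well below 7 * n * lg n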

Ultra-Low Level

Consider this (generalised) implementation of Mergesort in Isabelle/HOL:

types dataset  =  "nat * string"

fun leq :: "dataset \<Rightarrow> dataset \<Rightarrow> bool" where
   "leq (kx::nat, dx) (ky, dy) = (kx \<le> ky)"

fun merge :: "dataset list \<Rightarrow> dataset list \<Rightarrow> dataset list" where
"merge [] b = b" |
"merge a [] = a" |
"merge (a # as) (b # bs) = (if leq a b then a # merge as (b # bs) else b # merge (a # as) bs)"

function (sequential) msort :: "dataset list \<Rightarrow> dataset list" where
  "msort []  = []" |
  "msort [x] = [x]" |
  "msort l   = (let mid = length l div 2 in merge (msort (take mid l)) (msort (drop mid l)))"
by pat_completeness auto

termination
  apply (relation "measure length")
  by simp+

This already includes proofs of well-definedness and termination. Find an (almost) complete correctness proof here.
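
To give a flavour of what such a proof establishes, the top-level statements might look roughly as follows (hypothetical lemma names and phrasings of my own, assuming HOL's sorted predicate and mset from HOL-Library.Multiset; the actual proof may state them differently):

(* hypothetical top-level correctness statements, not taken from the proof *)
lemma msort_sorted:      "sorted (map fst (msort xs))"
lemma msort_permutation: "mset (msort xs) = mset xs"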

For the "runtime", that is number of comparisons, a recurrence similar to the one in the prior section can be set up. Instead of using the Master theorem and forgetting the constants, you can also analyse it to get an approximation that is asymptotically equal the true quantity. You can find the full analysis in [1]; here is a rough outline (it does not necessarily fit the Isabelle/HOL code):

As above, the recurrence for the number of comparisons is

$\qquad \begin{align}f_0 = f_1 &= 0 \\ f_n &= f_{\left\lceil\frac{n}{2}\right\rceil} + f_{\left\lfloor\frac{n}{2}\right\rfloor} + e_n\end{align}$

where $e_n$ is the number of comparisons needed for merging the partial results². In order to get rid of the floors and ceilings, we perform a case distinction on whether $n$ is even:

$\qquad \displaystyle \begin{cases} f_{2m} &= 2f_m + e_{2m} \\ f_{2m+1} &= f_m + f_{m+1} + e_{2m+1} \end{cases}$

Using nested forward/backward differences, $\Delta\kern-.2em\nabla f_k = f_{k+1} - 2f_k + f_{k-1}$ (and similarly for $e_k$), we get by telescoping twice (using $f_0 = 0$) that

$\qquad \displaystyle \sum\limits_{k=1}^{n-1} (n-k) \cdot \Delta\kern-.2em\nabla f_k = f_n - nf_1$.

The sum matches the right-hand side of Perron's formula. We define the Dirichlet generating series of $\Delta\kern-.2em\nabla f_k$ as

$\qquad \displaystyle W(s) = \sum\limits_{k\geq 1} \Delta\kern-.2em\nabla f_k k^{-s} = \frac{1}{1-2^{-s}} \cdot \underbrace{\sum\limits_{k \geq 1} \frac{\Delta\kern-.2em\nabla e_k}{k^s}}_{=:\ \boxminus(s)}$

which together with Perron's formula leads us to

$\qquad \displaystyle f_n = nf_1 + \frac{n}{2\pi i} \int\limits_{3-i\infty}^{3+i\infty} \frac{\boxminus(s)n^s}{(1-2^{-s})s(s+1)}ds$.

Evaluation of $\boxminus(s)$ depends on which case is analysed. Other than that, we can -- after some trickery -- apply the residue theorem to get

$\qquad \displaystyle f_n \sim n \cdot \log_2(n) + n \cdot A(\log_2(n)) + 1$

where $A$ is a periodic function with values in $[-1,-0.9]$.
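
As a quick plausibility check (my own sketch, not part of the analysis in [1]): evaluating the worst-case recurrence with $e_n = n - 1$ directly, the normalised fluctuation $(f_n - n\log_2 n)/n$ should fall roughly within the band $[-1, -0.9]$ claimed for $A$. In Haskell, for instance:

import Data.Array

-- f_n from the worst-case recurrence:
-- f_0 = f_1 = 0,  f_n = f_{ceil(n/2)} + f_{floor(n/2)} + (n - 1)
fTable :: Int -> Array Int Integer
fTable n = arr
  where
    arr = listArray (0, n) (map f [0 .. n])
    f k | k < 2     = 0
        | otherwise = arr ! ((k + 1) `div` 2) + arr ! (k `div` 2)
                        + fromIntegral (k - 1)

main :: IO ()
main = do
  let arr = fTable 100000
      fluct n = let x = fromIntegral n :: Double
                in  (fromIntegral (arr ! n) - x * logBase 2 x) / x
  -- each value should land (roughly) in the band [-1, -0.9]
  mapM_ (print . fluct) [100, 1000, 10000, 100000]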


  1. Mellin transforms and asymptotics: the mergesort recurrence by Flajolet and Golin (1992)
  2. Best case: $e_n = \left\lfloor\frac{n}{2}\right\rfloor$
    Worst case: $e_n = n-1$
    Average case: $e_n = n - \frac{\left\lfloor\frac{n}{2}\right\rfloor}{\left\lceil\frac{n}{2}\right\rceil + 1} - \frac{\left\lceil\frac{n}{2}\right\rceil}{\left\lfloor\frac{n}{2}\right\rfloor + 1}$

Other tips

Dijkstra's "A discipline of Programming" is all about analyzing and proving algorithms and designing for provability. In the preface to that book, Dijkstra explains how a very simple constructed mini-language that is properly designed to be analyzed suffices to explain many algorithms formally:

When starting on a book like this, one is immediately faced with the question: "Which programming language am I going to use?", and this is not a mere question of presentation! A most important, but also a most elusive, aspect of any tool is its influence on the habits of those who train themselves in its use. If the tool is a programming language, this influence is -- whether we like it or not -- an influence on our thinking habits. Having analyzed that influence to the best of my knowledge, I had come to the conclusion that none of the existing programming languages, nor a subset of them, would suit my purpose; on the other hand I knew myself so unready for the design of a new programming language that I had taken a vow not to do so for the next five years, and I had a most distinct feeling that that period had not yet elapsed! (Prior to that, among many other things, this monograph had to be written.) I have tried to resolve this conflict by only designing a mini-language suitable for my purposes, by making only those commitments that seemed unavoidable and sufficiently justified.

Later he explains just how small he managed to get his mini-language.

I owe the reader an explanation why I have kept the mini-language so small that it does not even contain procedures and recursion. ... The point is that I felt no need of them in order to get my message across, viz. how a carefully chosen separation of concerns is essential for the design of, in all respects, high-quality programs; the modest tools of the mini-language gave us already more than enough latitude for nontrivial, yet very satisfactory designs.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange