Why do textbooks use pseudocode rather than real languages?

https://softwareengineering.stackexchange.com/questions/310695

12-12-2020
|

Question

In colleges and in algorithm textbooks, it is quite common for the teacher and author to explain control flow in pseudo-code. With the advent of more expressive languages like Python and Haskell among others, is it reasonable that colleges switch to explain algorithms via one of these languages?

The only advantage of pseudo-code I can think of is that it's purportedly language-agnostic. But it is not. Some pseudo-code uses an imperative approach, other pseudo-code looks functional. Authors just borrow semantics from whatever programming language they are comfortable using, or worse just describe the semantics in natural language. So if pseudo-code isn't actually language-agnostic, what is the advantage of using it then? Wouldn't it be better to just use an existing language?

Solution

No. The point of pseudo-code is that it doesn't have to compile. I can quickly gloss over irrelevant details. In contrast, even languages that look like pseudocode at the first glance can have very non-intuitive details that would just detract from the algorithm. Let's take for example Quicksort in Haskell:

qs :: Ord a => [a] -> [a]
qs [] = []
qs (pivot:xs) = (qs smaller) ++ pivot:(qs larger)
  where smaller = [x | x <- xs, x <= pivot]
        larger  = [x | x <- xs, x > pivot]

or the same in Python:

def qs(array):
  if not array:
    return []
  pivot = array[0]
  xs = array[1:]
  smaller = [x for x in xs if x <= pivot]
  larger  = [x for x in xs if x > pivot]
  return qs(smaller) + [pivot] + qs(larger)

The advantage in both cases is that this is executable code, and as such can be tested, typechecked, and toyed with by students. However, they both include syntactic details that are distracting. Students would usually be better served by pseudocode that illustrates the intention of the algorithm, not implementation details:

algorithm QUICKSORT(array)
  return [] if array is empty
  pivot ← array[0]
  xs ← array[1, ...] -- the rest of the array without the pivot
  smaller ← [x | x ∈ xs, x <= pivot] -- all smaller or equal elements
  larger ← [x | x ∈ xs, x  > pivot] -- all larger elements
  return [QUICKSORT(smaller)..., pivot, QUICKSORT(larger)...]

Notable differences:

I can just make up a list comprehension syntax that looks like maths rather than having to explain why Python has a for and if here.
I don't have to explain that language's syntax for list concatenation. Why does Python use + addition? What is : in Haskell? I can just choose a syntax that gets the point across more clearly.
the type signature Ord a => [a] -> [a] is just an implementation detail. While possibly helpful in this case, the type signatures sometimes required by Haskell can get absurd.
I don't have to explain why Python considers empty collections to be false, and what array[1:] is supposed to mean.
I avoid clever students pointing out that I should really use yield in the Python example.
Haskell sucks for explaining mutable data structures like Hash Tables, RB trees, ….
Things start getting very language-specific once we need complex records to express our algorithms. E.g. Python's object system has a few surprises that are just distracting.

That said, it can be very valuable to use one of these languages in addition to pseudocode, just carefully label what is what.

OTHER TIPS

No.

The entire purpose of pseudo-code is to abstract away the details and complexities of individual languages so that you focus on what the program's supposed to do, rather than how it does it. With pseudo-code you can make up arbitrary rules that do not need to conform to actual implementation requirements the way that real-world languages do, but only to the requirements of the actual topic at hand.

Furthermore, if the logic is presented in a way that you (as the student) can't just copy/paste into a file, compile it and be done, then you are forced to implement the solution yourself even when the solution itself is described for you. This encourages individual thought over copy/paste cheating.

What is Real?

Because Real is only in definition to an interpreter.

Is Mandarin any more or less real than English?

Certainly Mandarin is not particularly useful to an English speaker
similarly English is so much nonsense to a Mandarin speaker
unless they speak both.

So Real isn't even the question. Lets rephrase it:

Why is Pseudo Code used instead of a Formal Language?

A simple VENN Diagram can highlight the problem easily. The set of all humans that are English And Mandarin Speakers is the subset of English Or Mandarin Speakers. Because it takes effort to obtain proficiency in any language the intersection is generally much smaller than the union.

The textbook on programming can presume that you understand at least one natural language, the language the textbook is written in. Its generally safe to presume this, as otherwise a different more legible textbook would have been selected. After all learning one language is difficult enough - two is harder.

This gives the first reason for using a pseudo code. It maximises the audience that could easily read the book. This is done by following established language conventions already found in the natural language. Say recipes from cooking, mathematical formulas, etc... Any gap can be bridged by a quick natural language explanation, or failing that a final recourse to our visual system with pictures.

As for why the common language could not be the programming language. I leave to you to consider how much Mandarin (or any language you do not already speak) you have learned by reading a book about programming written with examples given in a familiar programming language.

What A textbook Achieves

As for the second reason, consider what a textbook must achieve:

explain why they would bother to learn an alien language instead of just using their natural language.
explain an alien language to the reader such that they might be able to speak it themselves.

Why Program

Most of the book has to convince you as to why you would want to learn and use this alien language, or any similar language. This means discussing the essence of programming itself.

How do you identify a problem
How do you break a problem apart
How do you architect the data
How do you architect the processes
How do you manage the dependencies
How do you identify faults
and more

Most of this has nothing to do with the machines themselves, its mostly a discussion on how meat-ware should operate in order to bring about a program. That's pretty complex because it has to show why we would link our human space goals, to program space problems and strive to solve them.

Describing a Program

The second textbook achievement is describing a Language. Now most programming languages can be described with a Grammar and a few semantic rules. On the shallow end are languages like JSON which can be defined fairly completely within three or so pages. More complex languages need a larger specification, but for the most part do not need a total understanding in order to be useful. What these descriptions are however are Pseudo Code. They specify the formal language in terms of a natural language. The difference is that these pseudo codes are specified in advance.

Now given that even Formal Languages are themselves (Executable) Pseudo Codes, the question is what is most important when describing an algorithm? The next-larger context.

The algorithm has a Goal that is reasonable in that context,
that context has some constraints,
and the algorithm is a description of how those constraints can be handled while achieving the goal.

At no point is the language that the algorithm is written in important. If anything only a few key operations are critical for the algorithms success. So the question then becomes:

is it better to describe a meat-ware program capable of interpreting the full specification of a Formal Language like C++/C#/Python/etc... in order to understand the algorithm
or just Define the 4 or so primitives required to understand the algorithm.

Given that learning a language is Hard, and the reader must have learned/learn a language in order to understand the algorithm, as the writer of a textbook what should you ask of the reader?

I'd go against the grain here and argue that yes, books and tutorials should use pseudocode that's based on real high level languages instead of inventing a wholly new pseudocode syntax.

However, I'd also warn any authors and readers that when used in this context, this should be pseudo-version of the real language, and authors should be mindful not to be too strict about following the syntax and limitations of the language. Instead, authors should think carefully about what is it they're trying to communicate with the snippet, and hide irrelevant details behind function calls and/or comments rather than striving for a full, actually compilable implementation or code that's idiomatic for the chosen base language.

Authors should take some creative liberty on the language to get their point across so that even someone who knows just the basics of the language can still understand the gist of the snippet, while those seeking more exact semantic can refer to the language's documentation or fallback to their prior knowledge about the language to fill in the details.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange