Parsing challenge: Old logician's dot notation

Question 1

Dot notation was most famously used by Russell and Whitehead in Principia Mathematica (1910-1913) but the notation was borrowed from Guiseppe Peano. It was also used by Alonzo Church, Willard Van Orman Quine, and other influential logicians (see this entry in the Stanford Encylopedia of Philosophy).

With a bit of practice, it's not difficult to read formulae in dot-notation, but it is not quite as elegant as it first appears. To start with, Russell and Whitehead still found it useful to use parentheses with the negation operator ~:

*3·01. p.q .=. ~(~p v ~q)

As the above example show, the dot is used both as the conjunction operator and to express precedence. Consequently, a stronger conjunction might be written as : or even :..

Finally, in order to reduce visual clutter (I suppose), Russell and Whitehead also use precedence relations, in which the set of operators is divided into three precedence groups, such that the dots for an operator of higher precedence dominate an equal number of dots for an operator of lower precedence. Between operators of equal precedence, it's not legal for the number of dots to be equal, but Russell and Whitehead also defined some ternary operators such as p v q v r in order to be able to avoid having to specify unimportant precedence. (The precise parsing rules for such expressions were never spelled out formally, as far as I know, but the definitions appear in PM.)

Having said all that, it is not terribly difficult to parse dot-notation using a variant of the shunting-yard algorithm. Unfortunately, it cannot be parsed with a context free grammar, which makes it less useful for grammars generated with automated tools. Even GLR parsers are restricted to CFGs. (The fact that dot-notation is not context-free can be proven with the pumping lemma; it makes a rather more interesting exercise than the usual pumping lemma applications, at least imho.)

If you allow the CFG to have an infinite number of (dot) symbols and a corresponding infinite number of rules, then it is quite straightforward to write a grammar (or, more precisely, a grammar template since most of the rules are parameterized by dot-count). Thus, in theory you could parse a dot-expression by first scanning to find the maximum number of dots used, and then generating the corresponding finite CFG from the template. (The resulting is LALR(1) as long as you define a separate terminal for every sequence of dots.) In practice, this would be done by using predicates in the CFG, so you could create a parser with a parser-generator which allows predicates (ANTLR does, for example, but I'd personally use a bottom-up generator to avoid fiddling around with left-recursion elimination.)

It's important to note that dot-notation has its own variant of "redundant parentheses", since you could, at least in theory, use more dots than necessary. When I was playing around with the (theoretical but not really useful) algorithm above -- automatically generating CFGs -- I found it easier to require dot-minimality, because otherwise you end up creating a lot more unit rules. I wasn't able to find a machine-readable copy of PM to test with, but in all the searching I did, I never found an expression which was not dot-minimal. I don't know whether that was the result of compulsive tidiness or the idea that only dot-minimal expressions were legal.

Question 2

Here's an interesting application. Perl6 allows you to extend the language, adding new operators and defining their precedence relative to existing operators. The follow code sample defines the operators*. .* etc. The tests below pass.

use v6;
use Test;

sub infix:<*.>  ($a, $b) is looser(&infix:<+>)  { $a * $b }
sub infix:<.*>  ($a, $b) is looser(&infix:<+>)  { $a * $b }
sub infix:<*:>  ($a, $b) is looser(&infix:<*.>) { $a * $b }
sub infix:<:*>  ($a, $b) is looser(&infix:<.*>) { $a * $b }

sub infix:<+.>  ($a, $b) is looser(&infix:<*.>) { $a + $b }
sub infix:<.+>  ($a, $b) is looser(&infix:<.*>) { $a + $b }
sub infix:<+:>  ($a, $b) is looser(&infix:<*:>) { $a + $b }
sub infix:<:+>  ($a, $b) is looser(&infix:<:*>) { $a + $b }

# Tests

is 1 + 3 .* 2,
    (1 + 3) * 2;

is 1 *. 3 + 2,
    1 * (3 + 2);

is 1 *. 2 +. 2,
    (1 * 2) + 2;

is 2 *: 2 + 3 .* 4,
    2 * ((2 + 3) * 4);

Question 3

ISO core standard Prolog would not allow directly parsing old Logicians dot notation. Although it can form symbols including a dot, a prominent one being the univ operator

(=..)/2. Different symbols used as operator, lead to different functors in the parsed term. To mitigated the problem I have introduced the sys_alias/1 operator

property in my Prolog system. Here is an example definition:

:- op(750, xfx, .=.).
:- set_oper_property(infix(.=.), sys_alias(=)).

In the above we use second variant of (=)/2. The usual (=)/2 has operator level 700, and the second variant has a higher operator level 750, leading to the desired parenthesis.

And here is an example run:

Jekejeke Prolog 4, Runtime Library 1.4.5

?- write(p = q .=. q = p), nl.
(p = q) = (q = p)
Yes

Could be helpful to parse this beast:

On the single axioms of protothetic. II.
Sobociński, Bolesław - 1961
https://projecteuclid.org/euclid.ndjfl/1093956834