Don't look for a mathematically rigorous description of these terms; they are a lot vaguer than that, loose classifications that can overlap.
"Dataflow" I think is fairly clear here; it DOES describe the flow of data, and it describes it in terms of concurrent statements. But I would add that each concurrent statement is woken by changes on its inputs and delivers its outputs; therefore (the important bit:) there is no correspondence between the order of things happening and the order of elements in the source code. In that respect it has a lot in common with functional programming. And both the first two models are dataflow; in (I) the elements are in logical order while (II) is not.
"Behavioural" SHOULD be fairly clear too - it simply describes a circuit in terms of its behaviour.
But it is not in general opposed to dataflow - though your San Jose quote is somewhat correct - behavioural descriptions are commonly sequential simply because the sequential paradigm (inside a VHDL process) is common and familiar to programmers. Even so, the behaviour of several such processes interacting with each other is ... dataflow.
Behavioral then is NOT correctly opposed to dataflow. It is more correctly opposed to RTL (Register Transfer Level) and structural which have fairly clear meanings.
A structural description consists of a number of building block (gates, multiplexers, entire CPUs) and the signals interconnecting them : a textual block diagram (perhaps auto-generated from a graphical one). As such it can be either the lowest level (see frequent questions here about making an adder out of gates!) or the highest level (connecting CPU to memory, peripherals, etc).
An RTL description is fairly low level; it describes the transfer and operations on data between storage elements (registers) and is common inside a process; it is rather like an assembly language listing from a (behavioural) C program.
Lastly - too many descriptions and too many extraneous details get in the way of doing a proper design job. Look at the task in hand, extract its essence, and implement that.
A multiplexer selects one of a collection of input elements according to the index of the element you want. The most natural form of index is usually an integer type, rarely including negative indices, and the most natural form of collection in VHDL is ... an array.
So why not write
ENTITY mux IS
PORT ( a, b, c, d : in BIT;
sel : in natural range 0 to 3;
x : out BIT);
END mux;
ARCHITECTURE simple OF mux IS
SIGNAL values : array (0 to 3) of BIT;
BEGIN
values <= a & b & c & d;
x <= values(sel); -- after 0.5 ns; if you need to model timing!
END simple;
or better, make "values" an input port...