Finding lowest average grade difference

Question 1

OK, I found something based on some geometric observations.
Say you have only 4 numbers for now: a1 <= a2 and b1 <= b2 (0).

|a1-b1| + |a2-b2| <= |a1-b2| + |a2-b1| (1)

In this case, we want to check if (1) is true.

I rewrite (1) based on obvious equivalence relations:

( |a1-b1| + |a2-b2| ) ^ 2 <= ( |a1-b2| + |a2-b1| ) ^ 2 (2)

-2 * a1 * b1 -2 * a2 * b2 <= -2 * a1 * b2 -2 * a2 * b1 (3)

Should I explain how I got from (2) to (3)? I guess not.

Then we get:

a1*b1 + a2*b2 >= a1*b2 + a2*b1 (4)

a1(b1-b2) >= a2*(b1-b2) (5)

a1(b1-b2) - a2*(b1-b2) >=0 (6)

a1(b1-b2) + a2*(b2-b1) >=0 (7)

But (5) is obviously true because b1-b2 <= 0 and a1 <= a2 (see (0)).

This is a rigorous proof for N=2.

My gut tells me this should be generalized somehow
pretty easily for the case of N. Maybe we can try an
induction from here (having seen these (1),(2),(3), etc.).

Geometrically, you can imagine that Ai and Bj as points
on two parallel numeric linex (axis A, axis B). One
pairing configuration is defined by connecting the paired
points from A and B with segments. Your statement basically
says that the optimal pairing is the one in which no two
segments (Ai,Bj) cross each other (they may overlap with
each other /in the optimal solution/ but may not cross each other).
Right?

Now, if we do the same thing (which I did for N=2),
for any N, you will get this question: "Is
a1(b1-bi1) + a2(b2-bi2) + a3(b3-bi3) + ... + aN*(bN-biN) >= 0 (4')
true", where i1,i2,...,iN is any permutation of (1,2,...,N),
and given that a[i] <= a[i+1] and b[i] <= b[i+1] for each i.

Now, here we do induction on N to prove (4').
Suppose (4') is true for N and for all K such that K<N.
Add two more numbers to A and B. Say aN+1 and bN+1.
Let's say they get inserted at positions s1 (in A) and s2 (in B)
in their respective SORTED sequences (A, B). Let's say s1 <= s2
(the reverse case is analogical). So now as1 = aN+1 and bs2 = bN+1,
but s1 and s2 are their real indices in the NEW sorted sequences.

But now proving (4') for N+1 turns into a matter of
proving it for N=2 because only these terms (from (4'))
matter when we do the step from N to N+1.

as1 * (bs1 - bs2) + as2 * (bs2 - bs1) >= 0 (7')
and as we saw this is true for N=2 (see (7) above).

For the other N-1 terms (which remain from (4')), we get that the
inequality (4') is true due to the assumption from the induction (that it is
true for N-1). Thus from the truth for 2 and N-1 we got the truth for N+1.
Hope you understand how I did it. On paper it is easier
to write it, hard to write it here.

So this should be your rigorous proof for the N case.

Question 2

OK. Say we have:

P1<=P2<=...<=Pn, and S1<=S2<=...<=Sn.

Let's assume that this statement is not true, and there's another optimal solution. This means we assume P1 is paired with Sk, where k>1 in that solution.

Now let's take a look at that Pj which is paired with S1 (if you draw
it as a graph, these two lines P1-Sk and Pj-S1 would cross each other).

So I am interested in the first two lines which cross each other.

Now think what happens if we take that same configuration but we change it by pairing P1 with S1 and Pj with Sk. The sum will decrease (or it could stay the same if P1 = Pj or S1 = Sk).

Example:
1 3 5 7 100 150
2 4 6 8 120 120

But if it stays the same we can move on ahead in the two sequences.

So it turned out that our optimal solution is not optimal. This is contradiction so our assumption is not correct.

Note 1: Actually in my proof I should be saying not P1, but the Pm with smallest m, such that Pm is not paired with Sm but with some Sk where k>m. I've said P1 here just for simplicity.

Note 2: To prove this part:
"Now think what happens if take that same configuration but instead we pair P1 with S1 and Pj with Sk. The sum will decrease." just think what happens if you have 4 numbers only
P1 <= P2
S1 <= S2

Then should you pair them: P1-S1 and P2-S2 or crossed-over like
P1-S2 and S1-P2. Here a few different cases need to be worked out.

1 1 (couple 1 - equal numbers)
2 2 (couple 2 - also equal numbers)

1 3 (different numbers)
2 2 (equal numbers)

1 5 (different numbers)
2 10 (different numbers)

Note 3: OK, I think some boundary cases and details should
be worked out :) but basically this idea should work.

Question 3

Observe that: 
- no matter how you arrange the pairs, the sums of each set will be constant, and so will be the difference between the two sums. 
- the 'sum of differences' can only be equal or greater than the 'difference of sums'

So what is that makes the 'sum of differences' greater than the 'difference of sums'?

This is how I would structure the proof:
1) what you want to prove is that sorting minimizes the cases that make the 'sum of differences' greater.
2) what can make the 'sum of differences' greater than the 'difference of sums' are the pairs where the size relation is opposite to the size relation of the sums. For example, if in your sample the sum of all the P grades is greater than the sum of all the S grades (sumP > sumS), any pair where p < s.
3) Now, all you have to prove is that any non-sorted arrangement can only make things worse.

(Forgive the absence of proper math language and depth, I have an excuse for the first though, I'm new to SO and still trying to master the editor)

Question 4

You are trying find minimal average difference. So what is average difference: 1/n * ((p1-s1)+(p2-s2)+...+(pn-sn)) it is 1/n * ((p1+p2+..+pn) - (s1+s2+..+sn)) which doesn't seem depend on ordering. Well you can sort it and pair and it will have minimal average distance as well as any other ordering.