Question

I want to calculate e to 2 trillion (2,000,000,000,000) digits. That is about 1.8 TiB of pure e. I just implemented a Taylor series expansion algorithm using GMP (code can be found here).

Unfortunately, it crashes when summing more than 4000 terms on my computer, probably because it runs out of memory.

What is the current state of the art in computing e? Which algorithm is the fastest? Are there any open-source implementations worth looking at? Please don't mention y-cruncher; it's closed source.

Solution

Since I'm the author of the y-cruncher program that you mention, I'll add my 2 cents.

For such a large task, the two biggest barriers that must be tackled are as follows:

  1. Memory
  2. Run-time Complexity

Memory

2 trillion digits is extreme - to say the least. That's double the current record, set by Shigeru Kondo and me back in 2010. (It took us more than 9 days to compute 1 trillion digits using y-cruncher.)

In plain text, that's about 1.8 TiB in decimal. In packed binary representation, that's 773 GiB.
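
As a sanity check on that packed figure: 2 × 10^12 decimal digits carry 2 × 10^12 × log2(10) ≈ 6.64 × 10^12 bits ≈ 830 GB, and 830 × 10^9 bytes / 2^30 ≈ 773 GiB.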

If you're going to be doing arithmetic on numbers of this size, you'll need 773 GiB for each operand, not counting scratch memory.

Feasibly speaking, y-cruncher actually needs 8.76 TiB of memory to do this computation entirely in RAM. So you can expect other implementations to need roughly the same, give or take a factor of 2 at most.

That said, I doubt you'll have enough RAM. And even if you did, it would be heavily NUMA. So the alternative is to use disk. This is not trivial: to be efficient, you need to treat memory as a cache and micromanage all the data that moves between memory and disk.
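
To give a flavor of what that means at the lowest level, here's a minimal POSIX sketch (my own illustration, not how y-cruncher actually does it) of a file-backed buffer via mmap. A serious out-of-core layer schedules its own reads and writes instead of trusting the kernel's page cache, but the basic idea - operands living on disk, paged through RAM - is the same:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Illustration only: a disk-backed buffer the OS pages in and out. */
    int main(void) {
        const size_t size = (size_t)1 << 30;    /* 1 GiB demo; scale up */

        int fd = open("scratch.bin", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, (off_t)size) != 0) {
            perror("open/ftruncate");
            return 1;
        }

        unsigned char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        memset(buf, 0xAB, size);    /* touch every page; spills to disk */
        printf("first byte: 0x%02X\n", buf[0]);

        munmap(buf, size);
        close(fd);
        return 0;
    }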


Run-time Complexity

Here we have the other problem. For 2 trillion digits, you'll need a very fast algorithm - not just any fast algorithm, but one with quasi-linear run time.

Your current attempt runs in roughly O(N^2): each term update touches the full N-digit working precision, and there are on the order of N/log(N) terms. So even if you had enough memory, it wouldn't finish in your lifetime.
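
To illustrate, here's a hypothetical term-by-term loop of this shape (a sketch of the general approach, not your actual code). Every iteration walks the whole N-digit accumulator, and at 2 trillion digits there are on the order of 10^11 iterations:

    #include <gmp.h>
    #include <stdio.h>

    /* Naive Taylor summation: every iteration walks the full
     * N-digit precision, so n terms cost about O(n*N) = O(N^2). */
    int main(void) {
        mpf_set_default_prec(1 << 16);   /* toy precision (~20k digits) */
        mpf_t e, term;
        mpf_init_set_ui(e, 1);           /* k = 0 term */
        mpf_init_set_ui(term, 1);

        for (unsigned long k = 1; k <= 4000; k++) {
            mpf_div_ui(term, term, k);   /* term = 1/k!   : O(N) work */
            mpf_add(e, e, term);         /* running sum   : O(N) work */
        }

        gmp_printf("e ~ %.40Ff\n", e);
        mpf_clears(e, term, NULL);
        return 0;
    }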

The standard approach to computing e to high precision runs in O(N log(N)^2) and combines the following algorithms:

  1. Binary splitting on the Taylor series expansion of e.
  2. FFT-based large multiplication.

Fortunately, GMP already uses FFT-based large multiplication. But it lacks two crucial features:

  1. Out-of-core (swap) computation, which uses disk when there isn't enough memory.
  2. Parallelization.

The second point isn't as important, since you can just wait longer. But for all practical purposes, you'll probably need to roll your own. And that's what I did when I wrote y-cruncher.
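
To make the binary-splitting half of that concrete, here's a minimal GMP sketch (my illustration here, not y-cruncher's code). It builds integers P, Q with P/Q = 1/1! + ... + 1/n!, so the heavy work is pure integer multiplication - exactly what the FFT accelerates - and there's only one big division at the very end:

    #include <gmp.h>
    #include <stdio.h>

    /* Binary splitting: sum_{k=a+1}^{b} 1/k! = P(a,b) / Q(a,b).
     * Base case: P = 1, Q = b.
     * Merge:     P = P_left*Q_right + P_right,  Q = Q_left*Q_right. */
    static void bsplit(mpz_t P, mpz_t Q, unsigned long a, unsigned long b) {
        if (b - a == 1) {
            mpz_set_ui(P, 1);
            mpz_set_ui(Q, b);
            return;
        }
        unsigned long m = a + (b - a) / 2;
        mpz_t P2, Q2;
        mpz_inits(P2, Q2, NULL);
        bsplit(P, Q, a, m);       /* left half  -> P, Q   */
        bsplit(P2, Q2, m, b);     /* right half -> P2, Q2 */
        mpz_mul(P, P, Q2);
        mpz_add(P, P, P2);
        mpz_mul(Q, Q, Q2);
        mpz_clears(P2, Q2, NULL);
    }

    int main(void) {
        const unsigned long digits = 100;  /* toy size; the idea scales */
        const unsigned long terms  = 75;   /* 75! > 10^100, so this suffices */
        mpz_t P, Q, pow10;
        mpz_inits(P, Q, pow10, NULL);

        bsplit(P, Q, 0, terms);            /* P/Q = 1/1! + ... + 1/75! */

        /* e = 1 + P/Q; scale by 10^digits, one division at the end. */
        mpz_ui_pow_ui(pow10, 10, digits);
        mpz_mul(P, P, pow10);
        mpz_fdiv_q(P, P, Q);
        mpz_add(P, P, pow10);              /* the leading 1 of the series */

        gmp_printf("e ~ %Zd * 10^-%lu\n", P, digits);
        mpz_clears(P, Q, pow10, NULL);
        return 0;
    }

At 2 trillion digits, P and Q are themselves trillion-digit integers, which is where the out-of-core multiplication comes back in.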


That said, there are many other loose ends that also need to be taken care of:

  1. The final division will require a fast algorithm such as Newton's method (see the sketch after this list).
  2. If you compute in binary, you'll need to do a radix conversion to get decimal digits.
  3. If the computation is going to take a lot of time and a lot of resources, you may need to implement fault tolerance to handle hardware failures.
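
On point 1, here's a toy sketch of a Newton reciprocal in GMP (an illustration, not a production divider). The iteration x <- x*(2 - q*x) roughly doubles the number of correct digits per pass, so a division ends up costing only a small constant factor more than a multiplication. The same divide-and-conquer idea also underlies the radix conversion in point 2:

    #include <gmp.h>
    #include <stdio.h>

    /* Newton's method for a reciprocal: x <- x * (2 - q*x).
     * Starting from a ~53-bit double seed, each pass doubles the
     * correct bits, so 8 passes exceed the 4096-bit working precision. */
    int main(void) {
        mpf_set_default_prec(4096);
        mpf_t q, x, t;
        mpf_inits(q, x, t, NULL);

        mpf_set_ui(q, 7);            /* demo: compute 1/7 */
        mpf_set_d(x, 1.0 / 7.0);     /* double-precision seed */

        for (int i = 0; i < 8; i++) {
            mpf_mul(t, q, x);        /* t = q*x        */
            mpf_ui_sub(t, 2, t);     /* t = 2 - q*x    */
            mpf_mul(x, x, t);        /* x = x*(2-q*x)  */
        }

        gmp_printf("1/7 ~ %.50Ff\n", x);
        mpf_clears(q, x, t, NULL);
        return 0;
    }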

OTHER TIPS

Since you have a goal for how many digits you want (2 trillion), you can estimate how many terms you'll need to calculate e to that many digits. From this, you can estimate how many extra digits of precision you'll need to carry to avoid rounding errors at the 2-trillionth place.

If my calculation from Stirling's approximation is correct, the reciprocal of 10 to the 2 trillion is about the reciprocal of 185-billion factorial, so that's roughly how many terms you'll need (about 1.85 × 10^11). The story is a little better than that, though, because you'll start being able to throw away a lot of digits in the calculation of the terms well before that.
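
To spell out that estimate: Stirling gives log10(n!) ≈ n (log10 n − log10 e). Solving n (log10 n − 0.434) = 2 × 10^12 gives n ≈ 1.85 × 10^11.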

Since e is calculated as a sum of inverse factorials, all of your terms are rational, and hence expressible as repeating decimals. So the decimal expansion of each term consists of (a) an exponent, (b) a non-repeating part, and (c) a repeating part. There may be some efficiencies you can exploit if you look at the terms this way.

Anyway, good luck!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow