Problem

While recently reading a book about physics engine development, I came across a design decision I had never considered before. It relates to the way the CPU addresses raw bytes in memory.

Consider the following class:

class Foo
{
    public:
        float x;
        float y;
        float z;

        /* Constructors and Methods */

    private:
        float padding;
};

The author claims that the padding, which increases the size of the object to a quad word on the x86 architecture, yields a noticeable performance benefit, because four words "sit more cleanly in memory" than three. What does this mean? Padding an object with redundant data to increase performance seems paradoxical to me.

This also raises another question: what about objects that are one or two words in size? If my class is something like:

class Bar
{
    public:
        float x;
        float y;

        /* Constructors and Methods */

    private:
        /* padding ?? */
};

Should I add padding to this class so that it sits more cleanly in memory?

Solution

It is the compiler's responsibility to decide on reasonable padding (assuming typical access patterns). The compiler knows far more about your machine than you ever will. Besides, your machine will be with you for a couple of years; the program will be around for decades, running on a wide range of platforms and subject to a mind-boggling variety of usage patterns. What is best for today's i7 could very well be the worst choice for tomorrow's i8 or ARMv11.

Obfuscating code in pursuit of elusive "performance" falls squarely under premature optimization. Always remember that your time (writing, testing, debugging, and understanding the tweaked code again a week later) is much, much more expensive than the computer time possibly wasted (unless that code runs thousands of times a day on millions of machines). Tweaking code makes no sense until you have hard facts showing that performance is insufficient, and measurements telling you that shuffling that structure around is a bottleneck worth worrying about.

Other tips

Processors don't "read" memory byte by byte the way humans read text; they process it chunk by chunk, in sizes that vary by processor. This is called memory access granularity.

By memory-aligning your object, access time may be faster, and you can also avoid data fragmentation.

You can read more about data alignment here.

Edit: I'm not saying that it's a good or bad practice, just sharing what I know about it.

There are two really important things to say in answer to this question.

First, if you're going to tweak code for performance benefits, and if you've decided it's worthwhile (for whatever reason), you must first write a benchmark. You must be able to try both and measure the difference.

Second, tweaks of this kind depend on how the assembly language interacts with the hardware. You must be able to read assembly code and understand the different instruction sets and memory-access modes in order to understand why such tweaks might work.

Finally, your question has no answer in isolation. It depends on whether those objects are allocated individually or in collections, whether other objects sit next to them, and how the compiler generates code in each case. In all likelihood, alignment on a power-of-two boundary will be faster than misalignment, but a collection that fits in a cache is faster than one that doesn't. I wouldn't expect padding to 8 or 4 bytes to improve performance, but if it were important, I would try it and test the result.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow