No, the output layer size doesn't need to match the number of classes.
The point you may be missing is that the output layer's output is simply an encoded representation of the network's input. That means you can use any output layer you want, as long as you know how to map its output back to your classes. If you want an encoding that mirrors your classes, the easiest way to reduce the number of nodes in the layer is to use binary encoding.
Example: instead of using 8 nodes for 8 classes (1 node per class), you can use 3 nodes:
Class 0 is the output 0-0-0
Class 1 is the output 0-0-1
...
Class 7 is the output 1-1-1
I think you get the idea. Of course, you are not limited to binary: you can use literally any encoding method you can think of (or google).
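To make the 3-node binary encoding above concrete, here is a minimal Python sketch. The function names (num_output_nodes, encode_class, decode_output) are made up for illustration, and the decoding step assumes each output node produces a value between 0 and 1 (for example from a sigmoid) that is thresholded at 0.5. Treat it as one possible way to do it, not the only one.

```python
def num_output_nodes(num_classes):
    """Smallest number of binary output nodes that can tell num_classes apart."""
    # (num_classes - 1).bit_length() is ceil(log2(num_classes)) without float rounding.
    return max(1, (num_classes - 1).bit_length())


def encode_class(class_index, num_bits):
    """Turn a class index into its target bit pattern, most significant bit first."""
    return [(class_index >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


def decode_output(activations, threshold=0.5):
    """Turn output activations back into a class index.

    Assumption: each activation lies in [0, 1] and counts as a 1 bit
    when it is at or above the threshold.
    """
    index = 0
    for activation in activations:
        bit = 1 if activation >= threshold else 0
        index = (index << 1) | bit
    return index


num_bits = num_output_nodes(8)                       # 3 nodes for 8 classes
targets = [encode_class(c, num_bits) for c in range(8)]
predicted = decode_output([0.1, 0.2, 0.9])           # bits 0-0-1, i.e. class 1
```

Here targets[0] is [0, 0, 0] and targets[7] is [1, 1, 1], matching the table above. If the number of classes is not a power of two, some bit patterns are left without a class, so the decoding step has to decide what to do with them.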