As your picture shows, the data portion is 32 bits (as you said, each set contains only 1 word).
The tag is a required portion of each set that lets us know whether the requested address is located in the cache (a "hit"). Your picture says the tag is 27 bits in size.
The "59" bits (actually 60 bits) per set simply track how much actual SRAM is required to build this cache: (1 valid bit + 27 tag bits + 32 data bits) * 8 sets = 480 bits of SRAM.
However, don't let yourself be confused into thinking the tag is part of the data block. It can be (and often is) located elsewhere on the chip, even though conceptually it is coupled with the data portion of the set.
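If it helps to see this concretely, here is a rough Python sketch of the cache in your picture. It is only a software model, not how the hardware is built. I'm assuming 32-bit byte addresses and 4-byte words (so 27 tag + 3 index + 2 offset bits), which the picture implies but I haven't confirmed, and the names `split_address`, `DirectMappedCache`, and `total_storage_bits` are made up for illustration. Note that the valid bits, tags, and data are kept in three separate arrays, which mirrors the point that they are coupled by index but don't have to be stored together.

```python
# Assumptions (not confirmed by the original picture): 32-bit byte
# addresses and 4-byte words, so 27 tag + 3 index + 2 offset bits = 32.
OFFSET_BITS = 2
INDEX_BITS = 3
TAG_BITS = 27
VALID_BITS = 1
DATA_BITS = 32
NUM_SETS = 1 << INDEX_BITS  # 8 sets


def split_address(addr):
    """Return (tag, index, offset) for a 32-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


class DirectMappedCache:
    """Hypothetical model: one word per set, no replacement policy needed."""

    def __init__(self):
        # Three separate arrays, linked only by the set index.
        self.valid = [False] * NUM_SETS
        self.tags = [0] * NUM_SETS
        self.data = [0] * NUM_SETS

    def is_hit(self, addr):
        # A hit needs a valid entry whose stored tag matches the address tag.
        tag, index, _ = split_address(addr)
        return self.valid[index] and self.tags[index] == tag

    def fill(self, addr, word):
        # Overwrite whatever was in the set this address maps to.
        tag, index, _ = split_address(addr)
        self.valid[index] = True
        self.tags[index] = tag
        self.data[index] = word


def total_storage_bits():
    """Bits of storage for the whole cache: (valid + tag + data) per set."""
    return (VALID_BITS + TAG_BITS + DATA_BITS) * NUM_SETS
```

With these numbers `total_storage_bits()` comes out to 480, the same total as above. Two addresses such as `0x1000` and `0x1020` map to the same set but have different tags, so after filling one, a lookup of the other is a miss. That is exactly what the tag comparison is there to catch.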
I'd also like to add (hopefully without further confusing the subject) that while it is possible to build a cache as they have shown, with the tags, valid bits, and data all in SRAM (which makes it very dense), you may not actually want to build it that way! The data would be in SRAM, but I suspect it's more likely for the valid bits and tags to be located elsewhere in flip-flops. Much faster to access! You should talk to your teacher about how caches are normally built and the trade-offs of having the tags and valid bits in SRAM vs. flip-flops.