Does "hand picked aligned" mean they have 4916 positive images of 4916 different faces?
Not necessarily distinct people, but yes, they have 4916 different photos of faces. The faces were located manually by a "human expert".
Does "normalized" mean each of the 4916 images has the same features [file size, file type, picture color (gray scale/colored)]?
They used only grey-scale pixels. "Normalized" means they made sure no picture is nearly all "black" or all "white": if a picture was very dark it was automatically brightened, and if it was too bright it was darkened. This is easy to do with an automatic component.
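To make the brightening/darkening idea concrete, here is a minimal sketch, assuming the normalization amounts to shifting and scaling each grey-scale image to a common mean and spread. The function name `normalize_brightness` and the targets `target_mean` and `target_std` are made up for illustration; the exact procedure the authors used may differ.

```python
import numpy as np

def normalize_brightness(img, target_mean=128.0, target_std=48.0):
    """Shift and scale a grey-scale image to a fixed mean and spread.

    The target values are illustrative assumptions, not taken from the paper.
    """
    img = np.asarray(img, dtype=np.float64)
    std = img.std()
    if std < 1e-8:
        # Flat image: there is no contrast to rescale, only set the brightness.
        return np.full(img.shape, target_mean)
    out = (img - img.mean()) / std * target_std + target_mean
    # Keep the result inside the usual 8-bit grey-scale range.
    return np.clip(out, 0.0, 255.0)

# A dark patch and a bright patch with the same pattern.
dark = np.array([[10, 20], [30, 40]])
bright = np.array([[210, 220], [230, 240]])
```

With this sketch, `normalize_brightness(dark)` and `normalize_brightness(bright)` give the same result, because the two patches differ only in overall brightness.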
Does "scaled to a base resolution of 24x24" mean each of the 4916 images is resized to 24x24 pixels?
Yes, they made sure each "face" is exactly 24x24 pixels by rescaling the picture.
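As a rough illustration of what "rescaling to 24x24" can look like, here is a small sketch using nearest-neighbour sampling in NumPy. The helper `resize_nearest` is hypothetical, and the interpolation method is an assumption; the answer above does not say which method was used, and a real pipeline would more likely use a smoother interpolation.

```python
import numpy as np

def resize_nearest(img, size=24):
    """Resize a 2-D grey-scale image to size x size by nearest-neighbour sampling."""
    img = np.asarray(img)
    h, w = img.shape
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

# A hypothetical 48x72 face crop filled with dummy values.
face = np.arange(48 * 72).reshape(48, 72)
small = resize_nearest(face)
```

Here `small.shape` is `(24, 24)` regardless of the size of the input crop.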