As you wrote, a vertex is the whole vector of position, normal, texture coordinates, etc.
If only one of those attributes differs, it's a completely different vertex.
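To make that concrete, here is a minimal sketch, assuming OBJ-style input where each triangle corner carries a separate position index and normal index. The names `Vertex`, `Mesh`, and `buildSingleIndexMesh` are hypothetical and not part of any API. It merges two corners into one vertex only when every attribute index matches.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One complete vertex: every attribute is part of its identity.
struct Vertex {
    std::array<float, 3> position;
    std::array<float, 3> normal;
};

struct Mesh {
    std::vector<Vertex> vertices;        // one entry per unique (position, normal) pair
    std::vector<std::uint32_t> indices;  // the single index stream the GPU consumes
};

// Hypothetical helper: each corner is a (position index, normal index) pair.
Mesh buildSingleIndexMesh(
    const std::vector<std::array<float, 3>>& positions,
    const std::vector<std::array<float, 3>>& normals,
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& corners)
{
    Mesh mesh;
    // Maps an already-seen attribute combination to its vertex index.
    std::map<std::pair<std::uint32_t, std::uint32_t>, std::uint32_t> seen;
    for (const auto& corner : corners) {
        assert(corner.first < positions.size());
        assert(corner.second < normals.size());
        auto it = seen.find(corner);
        if (it == seen.end()) {
            // New combination: same position with a different normal lands here too.
            const auto newIndex = static_cast<std::uint32_t>(mesh.vertices.size());
            mesh.vertices.push_back(Vertex{positions[corner.first], normals[corner.second]});
            it = seen.emplace(corner, newIndex).first;
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

With this scheme, two triangles that share an edge and a normal produce four vertices, while the same two triangles with different normals produce six.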
> but this approach seems very wasteful.
Different attribute, different vertex. That's not a hard concept to understand. Modern GPUs employ a post-transform vertex cache that stores the output of the vertex transform stage, keyed by the vertex index, which identifies one complete set of attributes. If attributes could be shared through separate indices, the key would have to be a combination of indices, and this cache couldn't be implemented efficiently.
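The effect of such a cache can be illustrated with a small simulation. This is only a sketch under stated assumptions: the FIFO replacement policy, the cache size, and the function name `countVertexShaderRuns` are hypothetical, and real hardware differs. It assumes the cache is looked up by the single vertex index, which stands for the complete attribute tuple.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how often the vertex transform would run for an index stream,
// assuming a hypothetical FIFO cache of transformed vertices keyed by index.
std::size_t countVertexShaderRuns(const std::vector<std::uint32_t>& indices,
                                  std::size_t cacheSize)
{
    std::deque<std::uint32_t> cache;  // oldest entry sits at the front
    std::size_t runs = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            continue;  // hit: the cached transform result is reused
        }
        ++runs;  // miss: the vertex has to be transformed
        cache.push_back(index);
        if (cache.size() > cacheSize) {
            cache.pop_front();  // evict the oldest entry
        }
    }
    return runs;
}
```

For the index stream `{0, 1, 2, 1, 3, 2}` and a cache large enough to hold all four vertices, the transform runs four times instead of six, because one integer comparison is enough to recognize a repeated vertex.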
> Are there any better approaches that would let me in effect share vertex positions without sharing normals?
Why would you want to do that? A vertex with a different normal is a different vertex. Trying to save a little bit of memory would open a huge can of worms, not only for the cache but also for other parts of the program.
Having separate vertices with separate normals is actually a good thing.