hierarchical space partitioning structure would be needed, leaning towards an Octree (as height data would also be used). Is this a good choice?
Space partitioning is definitely a good idea to query the dots.
If id, x, y, z data for 4 million points were to be stored, I'm assuming this would be approximately 400-600MB as a CSV file
If you have 4 million points and 4 components of 4 Bytes each then the size of the data would be around 64 MB. This is manageable by a modern cpu/gpu.
is there any way to make this a reasonable size to send over the internet? Are compression algorithms really that good?
I think bandwidth is the main problem not size.
You can only send and display visible dots. If many dots are visible at once you can filter them by combining nearest points together in a hierarchy. Dots should exhibit high frequency and filtering, + possibly changing the brightness, should give perceptually smooth results.
Such concept is similar to texture clipmaps or geometry clipmaps.