Your code is fine.
The vectors are dominated by several large features. In those features, the two vectors are almost collinear, which is why the similarity measure is close to 1.
I include the six largest features below. Look at the ratio of vec2 over vec1: it's almost identical across those features.
feature     vec1     vec2        vec2/vec1
64806110     2875    1.85E+07    6.43E+03
64806108     5750    3.68E+07    6.40E+03
64806107     8625    5.49E+07    6.37E+03
64806106    11500    7.29E+07    6.34E+03
64806111    14375    9.07E+07    6.31E+03
64806109    17250    1.08E+08    6.28E+03
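
To check this on the numbers in the table, here is a minimal sketch in plain Python. It assumes the similarity measure in question is cosine similarity (the answer above does not name it), and it uses only the six rounded values shown, not your full vectors. The helper name cosine_similarity is my own, not something from your code.

```python
import math

# Values copied from the table above (six largest features only, rounded as shown).
vec1 = [2875, 5750, 8625, 11500, 14375, 17250]
vec2 = [1.85e7, 3.68e7, 5.49e7, 7.29e7, 9.07e7, 1.08e8]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Per-feature ratio vec2/vec1; a near-constant ratio means near-collinear vectors.
ratios = [y / x for x, y in zip(vec1, vec2)]
similarity = cosine_similarity(vec1, vec2)

print("ratios:", [round(r) for r in ratios])
print("cosine similarity on these six features:", similarity)
```

Because the ratios differ by only a few percent, the value comes out very close to 1 on these features alone, and the remaining, much smaller features can do little to pull it away.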