point_i = i * vsegments + j
gives you:
i = point_i / vsegments
j = point_i % vsegments
Of course, your loops actually do segments + 1 iterations each (indices 0 to segments), so you would need to use vsegments + 1 instead of vsegments.
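A minimal sketch of the mapping with the inclusive bounds, assuming C++ and non-negative integer indices. The helper names `flatten` and `unflatten` are hypothetical and not taken from your code; the point is only that the stride is vsegments + 1:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: combine (i, j) into one index when the inner loop
// runs over j = 0..vsegments inclusive, i.e. vsegments + 1 iterations.
int flatten(int i, int j, int vsegments) {
    return i * (vsegments + 1) + j;
}

// Hypothetical helper: recover (i, j) from the combined index.
// Relies on integer division truncating for non-negative operands.
std::pair<int, int> unflatten(int point_i, int vsegments) {
    assert(point_i >= 0 && vsegments >= 0);
    const int stride = vsegments + 1;  // number of inner-loop iterations
    return {point_i / stride, point_i % stride};
}
```

A merged loop would then run point_i from 0 up to (but not including) the product of the two iteration counts and call `unflatten` at the top of each iteration.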
As a side note: do you actually need to merge the loops into one for multithreading? I would expect the outer loop alone to typically have enough iterations to saturate your available cores anyway.