When i becomes large enough, you will be reading the output of another work-group. Nothing in the OpenCL execution model guarantees this other work-group will have finished execution.
In general it will not have finished, so you will read a partial sum and end up with lower values than expected.
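To make the race concrete, here is a minimal host-side sketch in plain C (not OpenCL, and not your kernel). It assumes a hypothetical reduction where each work-group writes one partial sum and those partial sums are then added up. The names `write_partial`, `sum_partials`, `total_single_kernel` and `total_two_kernels`, and the sizes, are made up for illustration. The first variant mimics a work-group that is scheduled first and reads the other groups' slots before they have run. The second mimics the usual workaround of splitting the work into two kernels enqueued on the same in-order command queue, so every group has finished before anything is read.

```c
#define NUM_GROUPS 4   /* hypothetical number of work-groups */
#define GROUP_SIZE 4   /* hypothetical work-items per work-group */

/* Stand-in for one work-group: sums its own chunk into partial[g]. */
static void write_partial(const int *in, int *partial, int g)
{
    int sum = 0;
    for (int k = 0; k < GROUP_SIZE; ++k)
        sum += in[g * GROUP_SIZE + k];
    partial[g] = sum;
}

/* Adds up the partial sums written by all work-groups. */
static int sum_partials(const int *partial)
{
    int total = 0;
    for (int i = 0; i < NUM_GROUPS; ++i)
        total += partial[i];
    return total;
}

/* Unsafe pattern: group 0 happens to run first and reads every slot
   while groups 1..NUM_GROUPS-1 have not written theirs yet.
   `partial` is assumed to be zero-initialised by the caller. */
static int total_single_kernel(const int *in, int *partial)
{
    write_partial(in, partial, 0);
    int seen = sum_partials(partial);   /* other slots are still 0 */
    for (int g = 1; g < NUM_GROUPS; ++g)
        write_partial(in, partial, g);  /* too late for group 0 */
    return seen;
}

/* Safe pattern: all groups finish (first kernel), then the partial
   sums are read (second kernel on the same in-order queue). */
static int total_two_kernels(const int *in, int *partial)
{
    for (int g = 0; g < NUM_GROUPS; ++g)
        write_partial(in, partial, g);
    /* kernel boundary: every partial sum is now complete */
    return sum_partials(partial);
}
```

With 16 input elements all equal to 1 and a zeroed `partial` array, `total_single_kernel` returns 4 (only group 0's own sum), while `total_two_kernels` returns the full 16. On a real device the order in which work-groups run is unspecified, so the single-kernel result can vary from run to run.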