You have multiple options, as you've discovered. The easiest to think about is to use the centroid of the bounding box. You could also sort by the min coordinate and separately by the max coordinate.
You can sort by each axis as a preprocess. Then each iteration is a partitioning that involves choosing which axis to split on and where to split, and storing the start/end points in that axis's sorted list.
See Ingo Wald's paper: http://www.sci.utah.edu/~wald/Publications/2007/FastBuild/download/fastbuild.pdf
The faster, lower quality approach is a linearized BVH - map each centroid to a coordinate on a space filling curve, such as a Morton code, then sort all the triangles by their Morton code, then split spans into boxes.