Question

While studying some basic sorting algorithms, I noticed that the literature consistently says that any sorting algorithm of order n² is more or less impractical, or even unsuitable, for big data sets.

Well, that seems legit, since every time the input size doubles, the runtime of such an algorithm roughly quadruples. But... where is the threshold beyond which a data set is too big? Is there any general-purpose answer to that question?


Solution

It's commonly accepted that, for anything more than about 10-20 elements, you'd use a more efficient sorting algorithm such as quick-sort or merge-sort.

There is no exact point where one becomes faster than the other, as it depends on the data and the specifications of the machine you're using.

And, it may take a while, but nothing's stopping you from using insertion sort to sort a few petabytes of data.
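
For reference, here is a minimal sketch of insertion sort in Python (the function name and structure are my own illustration, not something from the answer); doubling the input roughly quadruples the work done by the nested loops, which is exactly the n² behavior the question describes.

```python
def insertion_sort(a):
    """Sort the list a in place; O(n^2) comparisons and moves in the worst case."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for key.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a
```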

Other tips

Insertion sort is more effective than faster linearithmic algorithms on small data sets. I've seen suggested cutoffs as low as 10 to 20 elements in textbooks, but Timsort, which is considered very effective, is a hybrid of merge sort and insertion sort; if I remember correctly, it uses insertion sort for sub-ranges of up to 63 elements.

Edit: You can find more information on Timsort at Wikipedia.

Timsort uses insertion sort alone on data sets of 63 elements or fewer. For larger data sets, it insertion-sorts runs of 32-64 elements and then merges them.
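
Real Timsort also detects pre-sorted runs and uses galloping merges, so the following is only a simplified sketch of the hybrid idea: a merge sort that falls back to insertion sort below a cutoff (the cutoff value of 64 and the function name are illustrative assumptions, reusing insertion_sort from the sketch above).

```python
def hybrid_merge_sort(a, cutoff=64):
    """Merge sort that falls back to insertion sort below `cutoff` elements.
    A simplified illustration of the hybrid idea, not real Timsort."""
    if len(a) <= cutoff:
        return insertion_sort(a)
    mid = len(a) // 2
    left = hybrid_merge_sort(a[:mid], cutoff)
    right = hybrid_merge_sort(a[mid:], cutoff)
    # Standard stable two-way merge of the sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```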

Sometimes, though, you just have to try different techniques and measure the results, or you may be constrained by the data structure you have available. If you measure at millisecond scale, you will already see a difference somewhere around 20-100 items, but it may not be noticeable to users. Very often, fast sorting algorithms are already implemented for you in the standard libraries of high-level languages.
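
If you want to measure this yourself, a rough benchmark might look like the following (a sketch assuming the insertion_sort above; Python's built-in sorted() is itself a Timsort, so it also shows what the standard library gives you for free). The exact crossover point will vary by machine and data.

```python
import random
import timeit

# Compare the quadratic insertion sort against the standard library's
# Timsort (sorted) at a few sizes; exact numbers depend on the machine.
for n in (10, 100, 1000, 5000):
    data = [random.random() for _ in range(n)]
    # Copy inside the lambda so each run sorts unsorted input.
    t_ins = timeit.timeit(lambda: insertion_sort(data[:]), number=3)
    t_std = timeit.timeit(lambda: sorted(data), number=3)
    print(f"n={n:5d}  insertion sort: {t_ins:.4f}s  built-in sort: {t_std:.4f}s")
```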

License: CC-BY-SA with attribution
Not affiliated with StackOverflow