如何排序几乎排列在尽可能最快的时间?(Java)

https://stackoverflow.com/questions/1390832

21-09-2019
|

题

我有一系列值这几乎是，但不完全排序，有几个值流离失所者(说，50 100000).怎么排序最有效？(性是绝对至关重要，并应更快的方式比O(N))。

我知道smoothsort，但是我不能找到Java执行情况。任何人都不会知道它是否已经实施？或者我可以用于这项任务，而不是smoothsort?

解决方案

其实，维基百科包含的Java实现smoothsort的。你可以在这里找到它：

http://en.wikipedia.org/wiki/Smoothsort 。

其他提示

鸡尾酒排序

如果你想有一个简单的算法，很容易实现，你可以做一个鸡尾酒排序。这将很好地工作在近排序的输入。

作为Botz3000指出，可以不执行这样的一个运行速度快于O(N)。最基本的元素的任何算法，将是找到这些条目列出的顺序。这需要O(N)，甚至在你出什么做它们。

如果确实数量的"出的"要素是数量级以下的总人数的元素，你可以用下面的算法(假设的链接表):

找到所有的项目和提取，从原来的单独列表，O(N)
结果是两名单：排序的清单和简短的提取的列表
对于每个提取的元素，它们插入排序的清单。这将是O(日志(N))的每一个，总是O(Xlog(N))，其中X是的数量提取的元素。如果X是非常小，相对于N,你结束了一个总的O(N)。

有许多很好的算法此。

Smoothsort是我个人最喜欢的......我的实际工作所有的数学出这里如果你好奇为什么这么好。

一个相当好的算法已排序的数据是天然归并，这是归并的自底向上的版本作品通过处理输入作为排序的子范围的序列，然后进行多次经过合并相邻的有序区间的范围内。它运行在O（n）的时间，如果数据已经排序（因为它可以检测到只有一个排序的范围），以及为O（n LG N）在最坏的情况下。该算法工作，如果该数据是“块排序”相当好;即，它由放置旁边彼此很多排序的块组成。

直插入排序绝对很适合大部分排序的数据，但可以很严重退化了很多的投入。一些很好的排序（类似的内省排序的）实际使用插入这个属性排序做就输入“清理的一步。”

[太阳] JDK7具有（或将具有）添的实现排序（从蟒蛇）。这是一个合并排序，它利用顺序阵列中已经存在的优势。

Smoothsort或Timsort是巨大的算法，并且将是合理的东西使用。

我想补充一点，你可能不知道的是，不起眼的插入排序是什么自适应。事实上，真的差点排序的列表，你似乎有，我的理解（我不能与参考备份）是它比更复杂的算法更快。的问题是，如果输入几乎不排序，它快速降解为O（n ^ 2）。不过，这是非常简单的正确实施，因此，如果您肯定知道你的输入总是几乎排序，这将是一个不错的选择。

只要把它放在桌子上，一个很好的实现冒泡排序肯定是最简单的算法在这里。用O（n * m个）的最坏情况下，m是移位的数目。 m个部分在很大程度上取决于位移的模式，通常总复杂将是O（n）中。

您是对大约达到O（N），但假设的多芯机（我有）是不可能的，我们可以通过使用并行排序算法欺骗一点。

执行我们在学校里的壳牌的排序的调用。这是bubblesorting子阵列。一个子阵列与步骤k是元件的阵列是序号为0，K，2K，3K ...

如果您选择K = 3I + 1，和执行多个气泡排序，从更高的I-S DOWNTO 0时，时间将在几乎-排序后的数组较小开始。

这是原始Java实现Smoothsort的曾经是通过维基百科文章可用。

// by keeping these constants, we can avoid the tiresome business
// of keeping track of Dijkstra's b and c. Instead of keeping
// b and c, I will keep an index into this array.

static final int LP[] = { 1, 1, 3, 5, 9, 15, 25, 41, 67, 109,
    177, 287, 465, 753, 1219, 1973, 3193, 5167, 8361, 13529, 21891,
    35421, 57313, 92735, 150049, 242785, 392835, 635621, 1028457,
    1664079, 2692537, 4356617, 7049155, 11405773, 18454929, 29860703,
    48315633, 78176337, 126491971, 204668309, 331160281, 535828591,
    866988873 // the next number is > 31 bits.
};

public static <C extends Comparable<? super C>> void sort(C[] m,
    int lo, int hi) {
  int head = lo; // the offset of the first element of the prefix into m

  // These variables need a little explaining. If our string of heaps
  // is of length 38, then the heaps will be of size 25+9+3+1, which are
  // Leonardo numbers 6, 4, 2, 1. 
  // Turning this into a binary number, we get b01010110 = 0x56. We represent
  // this number as a pair of numbers by right-shifting all the zeros and 
  // storing the mantissa and exponent as "p" and "pshift".
  // This is handy, because the exponent is the index into L[] giving the
  // size of the rightmost heap, and because we can instantly find out if
  // the rightmost two heaps are consecutive Leonardo numbers by checking
  // (p&3)==3

  int p = 1; // the bitmap of the current standard concatenation >> pshift
  int pshift = 1;

  while (head < hi) {
    if ((p & 3) == 3) {
      // Add 1 by merging the first two blocks into a larger one.
      // The next Leonardo number is one bigger.
      sift(m, pshift, head);
      p >>>= 2;
      pshift += 2;
    } else {
      // adding a new block of length 1
      if (LP[pshift - 1] >= hi - head) {
        // this block is its final size.
        trinkle(m, p, pshift, head, false);
      } else {
        // this block will get merged. Just make it trusty.
        sift(m, pshift, head);
      }

      if (pshift == 1) {
        // LP[1] is being used, so we add use LP[0]
        p <<= 1;
        pshift--;
      } else {
        // shift out to position 1, add LP[1]
        p <<= (pshift - 1);
        pshift = 1;
      }
    }
    p |= 1;
    head++;
  }

  trinkle(m, p, pshift, head, false);

  while (pshift != 1 || p != 1) {
    if (pshift <= 1) {
      // block of length 1. No fiddling needed
      int trail = Integer.numberOfTrailingZeros(p & ~1);
      p >>>= trail;
      pshift += trail;
    } else {
      p <<= 2;
      p ^= 7;
      pshift -= 2;

      // This block gets broken into three bits. The rightmost
      // bit is a block of length 1. The left hand part is split into
      // two, a block of length LP[pshift+1] and one of LP[pshift].
      // Both these two are appropriately heapified, but the root
      // nodes are not necessarily in order. We therefore semitrinkle
      // both of them

      trinkle(m, p >>> 1, pshift + 1, head - LP[pshift] - 1, true);
      trinkle(m, p, pshift, head - 1, true);
    }

    head--;
  }
}

private static <C extends Comparable<? super C>> void sift(C[] m, int pshift,
    int head) {
  // we do not use Floyd's improvements to the heapsort sift, because we
  // are not doing what heapsort does - always moving nodes from near
  // the bottom of the tree to the root.

  C val = m[head];

  while (pshift > 1) {
    int rt = head - 1;
    int lf = head - 1 - LP[pshift - 2];

    if (val.compareTo(m[lf]) >= 0 && val.compareTo(m[rt]) >= 0)
      break;
    if (m[lf].compareTo(m[rt]) >= 0) {
      m[head] = m[lf];
      head = lf;
      pshift -= 1;
    } else {
      m[head] = m[rt];
      head = rt;
      pshift -= 2;
    }
  }

  m[head] = val;
}

private static <C extends Comparable<? super C>> void trinkle(C[] m, int p,
    int pshift, int head, boolean isTrusty) {

  C val = m[head];

  while (p != 1) {
    int stepson = head - LP[pshift];

    if (m[stepson].compareTo(val) <= 0)
      break; // current node is greater than head. Sift.

    // no need to check this if we know the current node is trusty,
    // because we just checked the head (which is val, in the first
    // iteration)
    if (!isTrusty && pshift > 1) {
      int rt = head - 1;
      int lf = head - 1 - LP[pshift - 2];
      if (m[rt].compareTo(m[stepson]) >= 0
          || m[lf].compareTo(m[stepson]) >= 0)
        break;
    }

    m[head] = m[stepson];

    head = stepson;
    int trail = Integer.numberOfTrailingZeros(p & ~1);
    p >>>= trail;
    pshift += trail;
    isTrusty = false;
  }

  if (!isTrusty) {
    m[head] = val;
    sift(m, pshift, head);
  }
}

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow