ほぼソートされた配列を可能な限り最速でソートするにはどうすればよいですか?(ジャワ)

https://stackoverflow.com/questions/1390832

21-09-2019
|

質問

ほぼソートされているものの、完全にはソートされていない値の配列があり、いくつかの値が置き換えられています（たとえば、100000 分の 50）。最も効率的に並べ替えるにはどうすればよいでしょうか?(ここではパフォーマンスが非常に重要であり、O(N) よりもはるかに高速である必要があります)。

スムーズソートについては知っていますが、Java 実装が見つかりません。すでに実装されているかどうか知っている人はいますか?または、スムーズソートの代わりにこのタスクに何を使用できますか?

解決

実際に、ウィキペディアはsmoothsortのJava実装が含まれています。あなたはここでそれを見つけることができます：

http://en.wikipedia.org/wiki/Smoothsortするます。

他のヒント

カクテルソート

あなたが実装するのは簡単です単純なアルゴリズムを使用する場合は、

は、あなたがシェーカーソートを行うことができます。これは、ほぼ、ソートされた入力の上、合理的にうまくいくでしょう。

Botz3000 が指摘したように、このような操作を O(N) より速く実行することはできません。アルゴリズムの最も基本的な要素は、配列内で順序が乱れているエントリを見つけることです。これには、どうすればよいかを理解する前であっても、O(N) が必要です。

実際に「順序が正しくない」要素の数が要素の総数より桁違いに少ない場合は、次のアルゴリズムを使用できます (リンクリストを想定)。

順序が乱れている項目をすべて検索し、元のリストから別のリストに抽出します。O(N)
結果は 2 つのリストになります。ソートされたリストと短い抽出されたリスト
抽出された各要素を並べ替えたリストに挿入します。それぞれ O(log(N)) となり、合計は O(Xlog(N)) になります。ここで、X は抽出された要素の数です。X が N に比べて非常に小さい場合、合計は O(N) になります。

多くの良いアルゴリズムが、このためにあります。

Smoothsortは、あなたがそれはとても好奇心に動作しているなぜならば、私は実際ににここからすべての数学を働いていた...私の個人的な好みです十分ます。

すでにソートされたデータのためのAかなり良いアルゴリズムがソートされたサブレンジのシーケンスとして入力を処理することにより作品、そして作る複数の上を通過することをマージソートのボトムアップ版であるの自然のマージの、あります隣接するソートされた範囲をマージする範囲。それは（それが唯一のソート範囲があることを検出することができますので）データがすでにソートされている場合はO（n）時間で実行され、最悪の場合はO（n LG N）。データは「ブロックがソート」である場合には、このアルゴリズムは非常によく動作します。それはそれはお互いに隣に置かれ、ソートブロックの多くで構成されている。

ストレート挿入ソート間違いなく、主に、ソートされたデータに適していますが、入力の多くに非常にひどく退化することができます。（のイントロソートのような）いくつかの本当に良いの種類は、実際には入力の上、「クリーンアップ手順」を行うには挿入ソートのこのプロパティを使用します。

は、[日] JDK7から（ソートするの実装を有する（又は有するであろう）パイソン）。それはすでに、アレイ内の既存のオーダーを利用してマージソートです。

SmoothsortまたはTimsortは偉大なアルゴリズムであり、使用するための賢明なものになります。

私は何を実現しない可能性がありますが、その謙虚な挿入ソートするであることを追加したいです適応。あなたが持っているように見えるとして実際に、実際にはほとんどソートされたリストのために、（私は参照してバックアップすることはできません）私の理解では、それはより速く、より洗練されたアルゴリズムよりもあるということです。問題は、入力がほとんどソートされていない場合、それは急速に劣化O（N ^ 2）ということです。それでも、それはあなたがあなたの入力は常にほとんどソートされていることを確認するために知っていればそう、それは良い選択になり、正しく実装するのは非常に簡単です。

だけがテーブルの上に置くために、うまく実装バブルソートは確かにここでは、もっとも簡単なアルゴリズムになります。 O（N * M）のワーストケースで、mは変位の数です。 M部分は、通常、総複雑性はO（N）になり、変位のパターンに大きく依存します。

あなたは右のO（N）に達するが、（私は）マルチコアマシンを想定しての不可能について、我々はアルゴリズムを並列ソートを使用して少しカンニングすることができますされます。

私たちはのシェルの並べ替えの学校に呼び出されるものを実装します。それはサブアレイをbubblesortingています。ステップkのサブアレイはなインデックス0、K、2Kと素子のアレイである、3K ...

あなたはK = 3I + 1を選択し、複数のバブルソートを実行した場合、0とdownto I-S高から始め、時間はほぼ、ソートされた配列に小さくなります。

これはウィキペディアの記事を介して利用可能にするために使用Smoothsortの元のJava実装です。

// by keeping these constants, we can avoid the tiresome business
// of keeping track of Dijkstra's b and c. Instead of keeping
// b and c, I will keep an index into this array.

static final int LP[] = { 1, 1, 3, 5, 9, 15, 25, 41, 67, 109,
    177, 287, 465, 753, 1219, 1973, 3193, 5167, 8361, 13529, 21891,
    35421, 57313, 92735, 150049, 242785, 392835, 635621, 1028457,
    1664079, 2692537, 4356617, 7049155, 11405773, 18454929, 29860703,
    48315633, 78176337, 126491971, 204668309, 331160281, 535828591,
    866988873 // the next number is > 31 bits.
};

public static <C extends Comparable<? super C>> void sort(C[] m,
    int lo, int hi) {
  int head = lo; // the offset of the first element of the prefix into m

  // These variables need a little explaining. If our string of heaps
  // is of length 38, then the heaps will be of size 25+9+3+1, which are
  // Leonardo numbers 6, 4, 2, 1. 
  // Turning this into a binary number, we get b01010110 = 0x56. We represent
  // this number as a pair of numbers by right-shifting all the zeros and 
  // storing the mantissa and exponent as "p" and "pshift".
  // This is handy, because the exponent is the index into L[] giving the
  // size of the rightmost heap, and because we can instantly find out if
  // the rightmost two heaps are consecutive Leonardo numbers by checking
  // (p&3)==3

  int p = 1; // the bitmap of the current standard concatenation >> pshift
  int pshift = 1;

  while (head < hi) {
    if ((p & 3) == 3) {
      // Add 1 by merging the first two blocks into a larger one.
      // The next Leonardo number is one bigger.
      sift(m, pshift, head);
      p >>>= 2;
      pshift += 2;
    } else {
      // adding a new block of length 1
      if (LP[pshift - 1] >= hi - head) {
        // this block is its final size.
        trinkle(m, p, pshift, head, false);
      } else {
        // this block will get merged. Just make it trusty.
        sift(m, pshift, head);
      }

      if (pshift == 1) {
        // LP[1] is being used, so we add use LP[0]
        p <<= 1;
        pshift--;
      } else {
        // shift out to position 1, add LP[1]
        p <<= (pshift - 1);
        pshift = 1;
      }
    }
    p |= 1;
    head++;
  }

  trinkle(m, p, pshift, head, false);

  while (pshift != 1 || p != 1) {
    if (pshift <= 1) {
      // block of length 1. No fiddling needed
      int trail = Integer.numberOfTrailingZeros(p & ~1);
      p >>>= trail;
      pshift += trail;
    } else {
      p <<= 2;
      p ^= 7;
      pshift -= 2;

      // This block gets broken into three bits. The rightmost
      // bit is a block of length 1. The left hand part is split into
      // two, a block of length LP[pshift+1] and one of LP[pshift].
      // Both these two are appropriately heapified, but the root
      // nodes are not necessarily in order. We therefore semitrinkle
      // both of them

      trinkle(m, p >>> 1, pshift + 1, head - LP[pshift] - 1, true);
      trinkle(m, p, pshift, head - 1, true);
    }

    head--;
  }
}

private static <C extends Comparable<? super C>> void sift(C[] m, int pshift,
    int head) {
  // we do not use Floyd's improvements to the heapsort sift, because we
  // are not doing what heapsort does - always moving nodes from near
  // the bottom of the tree to the root.

  C val = m[head];

  while (pshift > 1) {
    int rt = head - 1;
    int lf = head - 1 - LP[pshift - 2];

    if (val.compareTo(m[lf]) >= 0 && val.compareTo(m[rt]) >= 0)
      break;
    if (m[lf].compareTo(m[rt]) >= 0) {
      m[head] = m[lf];
      head = lf;
      pshift -= 1;
    } else {
      m[head] = m[rt];
      head = rt;
      pshift -= 2;
    }
  }

  m[head] = val;
}

private static <C extends Comparable<? super C>> void trinkle(C[] m, int p,
    int pshift, int head, boolean isTrusty) {

  C val = m[head];

  while (p != 1) {
    int stepson = head - LP[pshift];

    if (m[stepson].compareTo(val) <= 0)
      break; // current node is greater than head. Sift.

    // no need to check this if we know the current node is trusty,
    // because we just checked the head (which is val, in the first
    // iteration)
    if (!isTrusty && pshift > 1) {
      int rt = head - 1;
      int lf = head - 1 - LP[pshift - 2];
      if (m[rt].compareTo(m[stepson]) >= 0
          || m[lf].compareTo(m[stepson]) >= 0)
        break;
    }

    m[head] = m[stepson];

    head = stepson;
    int trail = Integer.numberOfTrailingZeros(p & ~1);
    p >>>= trail;
    pshift += trail;
    isTrusty = false;
  }

  if (!isTrusty) {
    m[head] = val;
    sift(m, pshift, head);
  }
}

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow