Find the frequency of numbers in a given group of numbers

https://stackoverflow.com/questions/145563

02-07-2019

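Question

Say I have a vector/array in C++ and I want to find which of its N elements has the most repeated occurrences, and output that highest count. Which algorithm is best suited for this job?

Example: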

int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };

The output would be 4, because 2 occurs 4 times, more often than any other element.

Solution

Sort the array, then make a quick pass to count each number. The algorithm has O(N log N) complexity.

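Alternatively, build a hash table using the numbers as keys, and store a counter in it for each element you have keyed. You can count all elements in a single pass; however, the complexity of the algorithm now depends on the complexity of your hash function.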

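Other tips

Optimized for space:

Quicksort (for example) the items, then iterate over them, keeping track of the largest count only. O(N log N) at best.

Optimized for speed:

Iterate over all elements, keeping track of the separate counts. This algorithm is always O(n).

If you have the RAM and your values are not too large, use counting sort.

Here is a possible C++ implementation that makes use of the STL: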

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <map>
#include <utility>

// functor that remembers the (value, count) pair with the highest count seen so far
struct maxoccur
{
    int val;   // value with the highest count so far
    int rep;   // that highest count

    maxoccur()
    : val(0),
      rep(0)
    {}

    // called once per map entry; e.first is the value, e.second its count
    void operator()(const std::pair<const int, int> &e)
    {
        std::cout << "pair: " << e.first << " " << e.second << std::endl;
        if ( rep < e.second ) {
            val = e.first;
            rep = e.second;
        }
    }
};

int
main()
{
    int a[] = {2,456,34,3456,2,435,2,456,2};
    std::map<int,int> m;

    // load the map: m[x] counts occurrences of x (new keys start at 0)
    for (std::size_t i = 0; i < sizeof(a)/sizeof(a[0]); i++)
        m[a[i]]++;

    // find the max occurrence; for_each returns the functor with its final state
    maxoccur ret = std::for_each(m.begin(), m.end(), maxoccur());
    std::cout << "value:" << ret.val << " max repetition:" << ret.rep << std::endl;

    return 0;
}

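A bit of code:

#include <algorithm>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

int main()
{
    // split the input string into its numbers first
    std::string numbers = "2 456 34 3456 2 435 2 456 2";
    std::istringstream in(numbers);

    // count each number: the number is the key, its count is the value
    std::map<int, int> list;
    int n;
    while (in >> n)
        ++list[n];   // new keys start at 0

    // sort the (number, count) pairs by count, highest first
    std::vector<std::pair<int, int>> sorted(list.begin(), list.end());
    std::sort(sorted.begin(), sorted.end(),
              [](const std::pair<int, int>& x, const std::pair<int, int>& y)
              { return x.second > y.second; });

    // the first entry holds the most used number
    if (!sorted.empty())
        std::cout << sorted.front().first << " occurs "
                  << sorted.front().second << " times\n";
    return 0;
}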

The hash algorithm (building count[i] = #occurrences(i) in basically linear time) is very practical, but it is not strictly O(n) in theory, because hash collisions can occur during the process.

An interesting special case of this question is the majority problem: find the element that occupies more than half (n/2) of the array entries, if such an element exists.

There are both short and detailed explanations of how to do this in linear time, without any hashing tricks.

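If the range of the elements is large compared with the number of elements, I would, as others have said, just sort and scan. This takes n log n time and no additional space (perhaps log n additional, for the recursion).

The problem with counting sort is that, if the range of values is large, initializing the count array can take longer than sorting.

Here is a complete, tested version using std::tr1::unordered_map (standardized as std::unordered_map in C++11).

By my count this is close to O(n): it first iterates over the n input values to insert/update the counts in the unordered_map, then runs partial_sort_copy, which is O(n log k) for the top k entries and therefore O(n) when only the top one is needed. 2 * O(n) ≈ O(n).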

#include <unordered_map>
#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <utility>
#include <cassert>

namespace {
// Only used in most_frequent but can't be a local class because of the member template
struct second_greater {
    // Need to compare two (slightly) different types of pairs:
    // the map's pair<const K, unsigned> and the result's pair<K, unsigned>
    template <typename PairA, typename PairB>
    bool operator() (const PairA& a, const PairB& b) const
        { return a.second > b.second; }
};
}

template <typename Iter>
std::pair<typename std::iterator_traits<Iter>::value_type, unsigned int>
most_frequent(Iter begin, Iter end)
{
    typedef typename std::iterator_traits<Iter>::value_type value_type;
    typedef std::pair<value_type, unsigned int> result_type;

    std::unordered_map<value_type, unsigned int> counts;

    for(; begin != end; ++begin)
        // This is safe because new entries in the map are defined to be initialized to 0 for
        // built-in numeric types - no need to initialize them first
        ++ counts[*begin];

    // Only need the top one at this point (could easily expand to top-n)
    std::vector<result_type> top(1);

    std::partial_sort_copy(counts.begin(), counts.end(),
                           top.begin(), top.end(), second_greater());

    return top.front();
}

int main()
{
    int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };

    std::pair<int, unsigned int> m = most_frequent(a, a + (sizeof(a) / sizeof(a[0])));

    std::cout << "most common = " << m.first << " (" << m.second << " instances)" << std::endl;
    assert(m.first == 2);
    assert(m.second == 4);

    return 0;
}

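This is O(n), but the extra array can be large: you take another array (the count array) sized to the range of the values.

for (i = 0; i

max = count[0]; index = 0;

for (i = 0; i

Then the output is: element index occurs max times in this array.

Here a[] is the data array in which we need to find the maximum occurrence of a particular number.

count[] holds the count of each element. Note: we must already know the range of the data in the array. For example, if the data ranges from 1 to 100, keep a count array of 100 elements, and whenever a value occurs, increment the count at that index by one.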

License: CC-BY-SA with attribution

Not affiliated with StackOverflow