Removing from a HashSet fails after iterating over it

https://stackoverflow.com/questions/754235

09-09-2019

Question

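I am writing an agglomerative clustering algorithm in Java and am having trouble with a remove operation. It seems to always fail once the number of clusters reaches half the initial number.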

In the sample code below, clusters is a Collection<Collection<Integer>>.

      while(clusters.size() > K){
           // determine smallest distance between clusters
           Collection<Integer> minclust1 = null;
           Collection<Integer> minclust2 = null;
           double mindist = Double.POSITIVE_INFINITY;

           for(Collection<Integer> cluster1 : clusters){
                for(Collection<Integer> cluster2 : clusters){
                     if( cluster1 != cluster2 && getDistance(cluster1, cluster2) < mindist){
                          minclust1 = cluster1;
                          minclust2 = cluster2;
                          mindist = getDistance(cluster1, cluster2);
                     }
                }
           }

           // merge the two clusters
           minclust1.addAll(minclust2);
           clusters.remove(minclust2);
      }

After a few passes through the loop, clusters.remove(minclust2) eventually returns false, but I don't understand why.

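I tested this code by first creating 10 clusters, each containing one integer from 1 to 10. The distances are random numbers between 0 and 1. Here is the output after adding a few println statements. After the number of clusters, I print the actual clusters, the merge operation, and the result of clusters.remove(minclust2).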

Clustering: 10 clusters
[[3], [1], [10], [5], [9], [7], [2], [4], [6], [8]]
[5] <- [6]
true
Clustering: 9 clusters
[[3], [1], [10], [5, 6], [9], [7], [2], [4], [8]]
[7] <- [8]
true
Clustering: 8 clusters
[[3], [1], [10], [5, 6], [9], [7, 8], [2], [4]]
[10] <- [9]
true
Clustering: 7 clusters
[[3], [1], [10, 9], [5, 6], [7, 8], [2], [4]]
[5, 6] <- [4]
true
Clustering: 6 clusters
[[3], [1], [10, 9], [5, 6, 4], [7, 8], [2]]
[3] <- [2]
true
Clustering: 5 clusters
[[3, 2], [1], [10, 9], [5, 6, 4], [7, 8]]
[10, 9] <- [5, 6, 4]
false
Clustering: 5 clusters
[[3, 2], [1], [10, 9, 5, 6, 4], [5, 6, 4], [7, 8]]
[10, 9, 5, 6, 4] <- [5, 6, 4]
false
Clustering: 5 clusters
[[3, 2], [1], [10, 9, 5, 6, 4, 5, 6, 4], [5, 6, 4], [7, 8]]
[10, 9, 5, 6, 4, 5, 6, 4] <- [5, 6, 4]
false

The [10, 9, 5, 6, 4, 5, 6, 4, ...] set just grows infinitely from there.

Edit: to clarify, I'm using a HashSet<Integer> for each cluster in clusters (which is a HashSet<HashSet<Integer>>).

Solution

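Ah. When you alter a value that is already in a Set (or used as a Map key), it is no longer necessarily in the right position, and hash codes will be cached. You need to remove it, alter it, and then re-insert it.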
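To make the remove, alter, re-insert advice concrete, here is a minimal sketch. The class name StaleHashDemo and both method names are made up for illustration, and the two-cluster setup is an assumption chosen to mirror the first merge in the question's output. The behaviour of a HashSet whose elements are mutated in place is not specified, so the results noted below are what a typical OpenJDK HashSet would be expected to produce.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of the original question's code.
class StaleHashDemo {

    // Merges 'from' into 'into' while 'into' is still stored in 'clusters',
    // as the question's loop does, then tries to remove 'into' afterwards.
    static boolean removeAfterInPlaceMerge() {
        Set<Set<Integer>> clusters = new HashSet<>();
        Set<Integer> into = new HashSet<>(Arrays.asList(5));
        Set<Integer> from = new HashSet<>(Arrays.asList(6));
        clusters.add(into);
        clusters.add(from);

        into.addAll(from);            // hash code of 'into' changes, its slot in 'clusters' does not
        clusters.remove(from);        // works: 'from' was never modified
        return clusters.remove(into); // looked up under the new hash code
    }

    // Same merge, but 'into' is taken out of 'clusters' before it is altered
    // and re-inserted afterwards.
    static boolean removeAfterSafeMerge() {
        Set<Set<Integer>> clusters = new HashSet<>();
        Set<Integer> into = new HashSet<>(Arrays.asList(5));
        Set<Integer> from = new HashSet<>(Arrays.asList(6));
        clusters.add(into);
        clusters.add(from);

        clusters.remove(into);        // remove it
        clusters.remove(from);
        into.addAll(from);            // alter it
        clusters.add(into);           // re-insert it under its new hash code
        return clusters.remove(into);
    }

    public static void main(String[] args) {
        System.out.println(removeAfterInPlaceMerge()); // expected on OpenJDK: false
        System.out.println(removeAfterSafeMerge());    // expected on OpenJDK: true
    }
}
```

Applied to the loop in the question, this would mean calling clusters.remove(minclust1) before minclust1.addAll(minclust2) and clusters.add(minclust1) afterwards. It would also explain the output: [4] was removed successfully because it had never been a merge target, while [5, 6, 4] had already been modified twice while sitting in clusters.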

Other tips

In the test shown, remove failed the first time you tried to remove a Collection containing more than one Integer. Is this always the case?

What is the concrete type of the Collection being used?

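The obvious problem is that clusters.remove is probably using equals to find the element to remove. Unfortunately, a collection's equals generally compares whether the elements are the same, rather than whether it is the same collection (I believe C# makes a better choice in this regard).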
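A small sketch of what content-based equality means for the sets in the question. The class and method names are hypothetical. It relies only on the documented Set contract: two sets are equal when they contain the same elements, and a set's hash code is the sum of its elements' hash codes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class illustrating content-based equals/hashCode on sets.
class ContentEqualityDemo {

    // Two distinct HashSet objects with the same elements are equal
    // and share a hash code.
    static boolean sameContentsAreEqual() {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2));
        Set<Integer> b = new HashSet<>(Arrays.asList(2, 1));
        return a != b && a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Adding an element changes the hash code of the very same object.
    static boolean hashChangesOnAdd() {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2));
        int before = a.hashCode(); // 1 + 2
        a.add(3);
        return before != a.hashCode(); // now 1 + 2 + 3
    }

    public static void main(String[] args) {
        System.out.println(sameContentsAreEqual()); // true
        System.out.println(hashChangesOnAdd());     // true
    }
}
```

Because the hash code follows the contents, a cluster that is modified while stored in a hash-based outer collection ends up filed under a hash code it no longer has.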

A simple fix is to create clusters as Collections.newSetFromMap(new IdentityHashMap<Collection<Integer>, Boolean>()) (I think).
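The identity-based set suggested above can be sketched as follows. The class name IdentityClustersDemo and the two-cluster setup are assumptions for illustration. Collections.newSetFromMap and IdentityHashMap are standard java.util classes, and the resulting set compares elements by reference (==) instead of by equals.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo class; the setup mirrors one merge step from the question.
class IdentityClustersDemo {

    // Merges in place, as the question's loop does, but stores the clusters
    // in a set that uses reference identity for lookups.
    static boolean removeAfterInPlaceMerge() {
        Set<Collection<Integer>> clusters =
                Collections.newSetFromMap(new IdentityHashMap<Collection<Integer>, Boolean>());
        Collection<Integer> into = new HashSet<>(Arrays.asList(5));
        Collection<Integer> from = new HashSet<>(Arrays.asList(6));
        clusters.add(into);
        clusters.add(from);

        into.addAll(from);            // contents change, object identity does not
        clusters.remove(from);
        return clusters.remove(into); // found by reference, regardless of contents
    }

    public static void main(String[] args) {
        System.out.println(removeAfterInPlaceMerge()); // true
    }
}
```

One trade-off to keep in mind: with an identity-based set, two different cluster objects holding the same integers count as distinct elements, which seems acceptable here because each cluster is a separate object.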

License: CC-BY-SA with attribution

Not affiliated with StackOverflow