Question

I have a ResultSet with 1605 records. When I add its values to an ArrayList, the size I get is 1605, but when I add them to a HashSet, the size it prints is 1598. I have no idea why there is this discrepancy.

    Set<String> list_of_genes_strain_1 = new HashSet<>();
    ArrayList<String> list_of_genes = new ArrayList<>();
    // Loop through result sets
    while(gene_strain_1.next()){
      String gene_name = gene_strain_1.getString(1);
      list_of_genes_strain_1.add(gene_name); // add to set
      list_of_genes.add(gene_name); // add to arrayList
    }
    System.out.println("list_of_genes for strain 1: " + list_of_genes.size());
    System.out.println("SET genes for strain 1 :" + list_of_genes_strain_1.size());

The output I get is this:

    list_of_genes for strain 1: 1605
    SET genes for strain 1 :1598

Solution

HashSet is a Set, which means it does not store duplicates; that is part of the Set contract in Java. Your list most likely contains duplicate values, which are dropped when they are added to the HashSet, hence the difference.

Below is the definition of a Set from the Java docs. Have a look at it for more information.

A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
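
As a minimal sketch of that definition (the class name `SetContractDemo` and the value `"geneA"` are made up for illustration), the following shows that `add` reports whether the set actually changed, that equality is decided by `equals` rather than object identity, and that a HashSet holds at most one null:

```java
import java.util.HashSet;
import java.util.Set;

public class SetContractDemo {
    // Adds two distinct String objects with equal content and returns the set size.
    // The set compares elements with equals(), so only one of them is kept.
    static int sizeAfterAddingEqualStrings() {
        Set<String> genes = new HashSet<>();
        genes.add(new String("geneA"));
        genes.add(new String("geneA"));
        return genes.size();
    }

    public static void main(String[] args) {
        Set<String> genes = new HashSet<>();
        System.out.println(genes.add("geneA")); // true: element was not present
        System.out.println(genes.add("geneA")); // false: equal element already present
        System.out.println(genes.add(null));    // true: first null is accepted
        System.out.println(genes.add(null));    // false: at most one null element
        System.out.println(genes.size());       // 2
        System.out.println(sizeAfterAddingEqualStrings()); // 1
    }
}
```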

Other tips

A Set excludes duplicates, so it contains only unique elements.

That's probably because you have a few repeated words in your ResultSet.

Let's say you have

One
Two
Three
One

which is 4 items. In the ArrayList you'll have 4 items, but in the HashSet you will have 3. A HashSet stores its elements as the keys of an internal HashMap, so when "One" is already present, adding it again leaves the set unchanged.
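
The four-word example above can be sketched as follows. The class name `ListVersusSetDemo` and its helper methods are hypothetical, introduced only for this illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class ListVersusSetDemo {
    // The four sample words, including the repeated "One".
    static final List<String> WORDS = Arrays.asList("One", "Two", "Three", "One");

    // A list keeps every element, duplicates included.
    static int listSize() {
        return new ArrayList<>(WORDS).size();
    }

    // A set keeps only one copy of each equal element.
    static int setSize() {
        return new HashSet<>(WORDS).size();
    }

    public static void main(String[] args) {
        System.out.println("list size: " + listSize()); // list size: 4
        System.out.println("set size: " + setSize());   // set size: 3
    }
}
```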

In a HashSet you cannot have a duplicate item (in this case, Strings with the same content): all the items it contains are unique, while in a List you can have more than one String with the same value.

The result set gene_strain_1 almost certainly contains duplicate items. They are added to the Set only at their first occurrence (and not at the second, third, etc.), while they are always added to the List. That is why the List contains the same number of items as the original (1605), while the HashSet contains fewer (only 1598).
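
To see which values account for the 7 missing entries, one option is to count occurrences and keep only the values seen more than once. This is a sketch under stated assumptions: `DuplicateFinder`, `findDuplicates`, and the sample gene names are hypothetical, and in the original code you would pass `list_of_genes` instead of the sample list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {
    // Returns each value that occurs more than once, mapped to its occurrence count.
    // A LinkedHashMap is used so results appear in first-seen order.
    static Map<String, Integer> findDuplicates(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum); // increment the count for this value
        }
        counts.values().removeIf(count -> count == 1); // drop values that are unique
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the gene names read from the ResultSet.
        List<String> geneNames = Arrays.asList("abcA", "xyzB", "abcA", "lmnC", "xyzB", "abcA");
        System.out.println(findDuplicates(geneNames)); // {abcA=3, xyzB=2}
    }
}
```

Summing `count - 1` over the returned map gives the number of surplus entries, which should equal the difference between the list size and the set size.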

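One point that can cause confusion is the name: if gene_strain_1 is a java.sql.ResultSet, it is not a java.util.Set. It is a cursor over the rows returned by a query, so it can contain duplicate values unless the query itself removes them (for example with SELECT DISTINCT).

Is gene_strain_1 a java.sql.ResultSet, or some other kind of Set?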

Licensed under: CC-BY-SA with attribution