Question

Is there a use case for the size() method on the java.util.BitSet class?

I mean, the JavaDoc clearly says it's implementation dependent: it returns the size of the internal long[] storage in bits. From what it says, one could conclude that you won't be able to set a bit with a higher index than size(), but that's not true; the BitSet can grow automatically:

BitSet myBitSet = new BitSet();
System.out.println(myBitSet.size());    // prints "64"
myBitSet.set(768);
System.out.println(myBitSet.size());    // prints "832"

In every single encounter with BitSet I have had in my life, I always wanted to use length() since that one returns the logical size of the BitSet:

BitSet myBitSet = new BitSet();
System.out.println(myBitSet.length());    // prints "0"
myBitSet.set(768);
System.out.println(myBitSet.length());    // prints "769"

Even though I have been programming in Java for the last 6 years, the two methods are always highly confusing for me. I often mix them up and use the wrong one by accident, because in my head I think of BitSet as a clever Set<Boolean> where I'd use size().

It's like if ArrayList had length() returning the number of elements and size() returning the size of the underlying array.

Now, is there any use case for the size() method I am missing? Is it useful in any way? Has anyone ever used it for anything? Might it be important for some manual bit twiddling or something similar?


EDIT (after some more research)

I realized BitSet was introduced in Java 1.0 while the Collections framework with most of the classes we use was introduced in Java 1.2. So basically it seems to me that size() is kept because of legacy reasons and there's no real use for it. The new Collection classes don't have such methods, while some of the old ones (Vector, for example) do.


Solution

I realized BitSet was introduced in Java 1.0 while the Collections framework with most of the classes we use was introduced in Java 1.2.

Correct.

So basically it seems to me that size() is kept because of legacy reasons and there's no real use for it.

Yes, pretty much.

The other "size" method is length(), which gives you the index of the highest set bit plus one (or 0 if no bit is set). From a logical perspective, length() is more useful than size() ... but length() was only introduced in Java 1.2.

The only (hypothetical) use-case I can think of where size() might be better than length() is when:

  • you are trying to establish a "fence post" for an iteration of the bits in the set, and
  • it is highly likely that you will stop iterating well before the end, and
  • it doesn't matter if you go a little bit beyond the last bit that is set.

In that case, size() is arguably better than length() because it is a cheaper call. (Look at the source code ...) But that's pretty marginal.
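
To make the fence-post idea concrete, here is a minimal sketch. The class and method names (FencePostDemo, firstSetBitFrom) are made up for illustration, and it assumes a non-negative start index. Reading past the last set bit is harmless because get() simply returns false there. In real code nextSetBit() is usually the better tool; this only illustrates size() as a loop bound.

```java
import java.util.BitSet;

final class FencePostDemo {
    // Hypothetical helper: returns the index of the first set bit at or
    // after 'from', or -1 if there is none. Assumes from >= 0.
    static int firstSetBitFrom(BitSet bits, int from) {
        // size() is only an upper bound: it may lie past the last set bit,
        // but it never lies before it.
        int fence = bits.size();
        for (int i = from; i < fence; i++) {
            if (bits.get(i)) {
                return i; // stop early, typically well before the fence
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(5);
        bits.set(70);
        System.out.println(firstSetBitFrom(bits, 0));  // prints "5"
        System.out.println(firstSetBitFrom(bits, 6));  // prints "70"
        System.out.println(firstSetBitFrom(bits, 71)); // prints "-1"
    }
}
```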

(I guess, another use-case along similar lines is when you are creating a new BitSet and preallocating it based on the size() of an existing BitSet. Again, the difference is marginal.)

But you are right about compatibility. It is clear that they could neither get rid of size() nor change its semantics without creating compatibility problems. So they presumably decided to leave it alone. (Indeed, they didn't even see the need to deprecate it. The "harm" in having a not-particularly-useful method in the API is minimal.)

Other tips

If the size method had not been made public by Java's designers, it would still undoubtedly exist as a private method or field. So what we are really discussing is its accessibility, and maybe its naming.

Java 1.0 took a lot of inspiration, not just the procedural syntax, from C/C++. The C++ standard library also has counterparts to BitSet's length and size; there they are called size and capacity, respectively. There is rarely any hard reason to use capacity in C++, and even less so in a garbage collected language such as Java, but having the method accessible is still arguably useful. I will explain in Java terms.

Tell me, what is the maximum number of machine instructions ever needed for executing a BitSet operation such as set? One would like to answer "just a handful", but this is only true if that particular operation does not result in reallocation of the whole underlying array. Theoretically, the reallocations turn a constant time algorithm into a linear time one.

Does this theoretical difference have much practical impact? Rarely. The array usually doesn't grow too often. However, whenever you have an algorithm operating over a gradually growing BitSet with an approximately known final size, you will save on reallocations if you pass the final size to the BitSet's constructor up front. In some very special circumstances this may even have a noticeable effect; in most circumstances it at least does not hurt:

  • set then has constant time complexity - calling it cannot ever block the application for too long.
  • if just one extremely large BitSet instance is using up all your available memory (by design), swapping may start noticeably later depending on how your JVM implements the growth operation (with or without an extra copy).
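
A rough way to see the reallocation point is to count how often size() changes while bits are set in ascending order. This is only a sketch: PresizeDemo and countGrowths are names invented here, and since size() is implementation dependent, the exact number of growth steps for the default-constructed set will vary between JVMs.

```java
import java.util.BitSet;

final class PresizeDemo {
    // Hypothetical helper: sets bits 0..n-1 in order and counts how many
    // times size() changes, i.e. how often the backing storage grew.
    static int countGrowths(BitSet bits, int n) {
        int growths = 0;
        int lastSize = bits.size();
        for (int i = 0; i < n; i++) {
            bits.set(i);
            int currentSize = bits.size();
            if (currentSize != lastSize) {
                growths++;
                lastSize = currentSize;
            }
        }
        return growths;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Default constructor: the storage has to grow several times.
        System.out.println("default:  " + countGrowths(new BitSet(), n));
        // Presized: the constructor already reserved room for bits 0..n-1.
        System.out.println("presized: " + countGrowths(new BitSet(n), n));
    }
}
```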

Now imagine that you operate on many BitSets, all of which have been allocated with a target size. You are constructing one BitSet instance from another and you want the new one to share the old one's target size, as you know you will be using them side by side. Having the size method public makes this easier to implement cleanly.
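
A minimal sketch of that scenario, with made-up names (SameCapacityDemo, emptyLike): the new set is created empty but with room for as many bits as the existing one currently has allocated.

```java
import java.util.BitSet;

final class SameCapacityDemo {
    // Hypothetical helper: an empty BitSet with at least the same
    // allocated capacity as 'template'.
    static BitSet emptyLike(BitSet template) {
        return new BitSet(template.size());
    }

    public static void main(String[] args) {
        BitSet a = new BitSet(10_000); // allocated with a target size
        a.set(42);

        BitSet b = emptyLike(a);
        System.out.println(b.isEmpty());          // prints "true"
        System.out.println(b.size() >= a.size()); // prints "true"
    }
}
```

Note that new BitSet(a.length()) would not do the same job here: a.length() is only 43 in this example, so the target size of 10,000 bits would be lost.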

size() is the total number of bits (0s and 1s) that the set currently has storage for, which in practice is a multiple of 64. For the number of 1s, use cardinality().
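
For comparison, here are the three counts side by side. The names BitSetCountsDemo and describe are invented for this sketch, and the value of size() is what OpenJDK reports; the JavaDoc leaves it implementation dependent.

```java
import java.util.BitSet;

final class BitSetCountsDemo {
    // Hypothetical helper that formats the three "sizes" of a BitSet.
    static String describe(BitSet bits) {
        return "cardinality=" + bits.cardinality() // number of 1 bits
             + " length=" + bits.length()          // highest set index + 1
             + " size=" + bits.size();             // allocated bits
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(10);
        // On OpenJDK this prints "cardinality=2 length=11 size=64".
        System.out.println(describe(bits));
    }
}
```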

One of the main reasons I think it may be useful is when we need to extend the BitSet class and override the length method. In that case, size is useful. Below is how length can compute its return value with a dependency on a size method.

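// Requires: import java.util.Collections; import java.util.Set;

// Indices of the bits that are set.
// Note: this is a java.util.Set, so size() below is Set.size().
protected Set<Integer> bitset;

@Override
public int length() {
  int returnValue = 0;
  // Only a non-empty set has a maximum element
  if (bitset.size() > 0) {
    // Logical length is the highest set index plus one
    returnValue = Collections.max(bitset) + 1;
  }
  return returnValue;
}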
Licensed under: CC-BY-SA with attribution