Java String substring() and StringBuilder delete() methods

https://softwareengineering.stackexchange.com/questions/378417

07-02-2021

Question

I've noticed that some methods, such as String's substring(int beginIndex, int endIndex) and StringBuilder's delete(int beginIndex, int endIndex), treat the second parameter as exclusive: the substring or deletion runs up to endIndex-1, not endIndex. Is there a reason for this? It doesn't seem logical, at least to me, for these methods to stop before the parameter instead of at it, unlike some other methods in other classes. Some example snippets:

Example 1:

StringBuilder sb = new StringBuilder("abcdef");
sb.delete(2, 4);
System.out.println(sb);
// This prints "abef" (indices 2 and 3 removed) instead of "abf"

Example 2:

String str = "abcdef";
String newStr = str.substring(2, 4);
System.out.println(newStr);
// This prints "cd" (indices 2 and 3) instead of "cde"

Does this seem strange to anyone else considering how some other methods with index parameters work? Is there a reason for this? If so, please explain it to me.

Edit to differentiate and clarify this question: it is different from the discussion solely about substring(), as it concerns all methods that use indices to indicate that the method should stop before the provided endIndex (exclusion instead of inclusion).

Solution

A practical reason is that it makes things easier in some common situations:

If you want the operation to include everything up to the end of the string, you can use the length directly as endIndex.
If you have "separation characters", like the dot between a base name and a file type suffix, you typically don't want to include them, so you can use the index of the separation character as endIndex, e.g. String basename = filename.substring(0, filename.indexOf('.'));
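Both conveniences can be sketched in a few lines. The class and method names below (EndExclusiveDemo, baseName, tail) are made up for illustration and are not part of the answer. Note that indexOf returns -1 when the character is absent, in which case substring(0, -1) throws a StringIndexOutOfBoundsException, so this sketch guards against that case.

```java
class EndExclusiveDemo {
    // Hypothetical helper: the part of filename before the first dot,
    // or the whole name when there is no dot.
    static String baseName(String filename) {
        int dot = filename.indexOf('.');
        // The index of the separator is used directly as the exclusive end,
        // so the dot itself is left out without any "- 1" arithmetic.
        return dot < 0 ? filename : filename.substring(0, dot);
    }

    // Hypothetical helper: everything from begin to the end of the string.
    // length() works directly as endIndex because the end is exclusive.
    static String tail(String s, int begin) {
        return s.substring(begin, s.length());
    }

    public static void main(String[] args) {
        System.out.println(baseName("report.txt")); // report
        System.out.println(tail("abcdef", 2));      // cdef
    }
}
```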

Other tips

I agree with Michael Borgwardt's answer, but there is room for elaboration. The approach you see here is not only useful but also consistent with most of the APIs you will find in Java and many other languages. The association with 0-based indexing is clearer if you think about the standard old-style for loop: for (int i = 0; i < end; i++). These days we tend to favor for-each style constructs, but this is how a lot of code is written. Note the loop condition: it is based on an exclusive end, using < instead of <=. When you are writing a lot of loops like this, it helps to use the same construct each time.
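To make the connection between the loop idiom and substring concrete, here is a minimal sketch that re-implements a slice with the same exclusive-end loop. HalfOpenLoop and slice are hypothetical names used only for this illustration; the real String.substring is not implemented this way.

```java
class HalfOpenLoop {
    // Copies the characters in the half-open range [begin, end),
    // mirroring what substring(begin, end) returns.
    static String slice(String s, int begin, int end) {
        // With an exclusive end, the number of characters is simply end - begin.
        StringBuilder out = new StringBuilder(end - begin);
        for (int i = begin; i < end; i++) { // "<", not "<="
            out.append(s.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "abcdef";
        System.out.println(slice(s, 2, 4));                  // cd
        // Splitting at one index gives two pieces that rejoin exactly.
        System.out.println(slice(s, 0, 3) + slice(s, 3, 6)); // abcdef
    }
}
```

Because the end is exclusive, the same index can serve as the end of one piece and the start of the next, with no overlap and no gap.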

But to really understand the bigger picture, I think it helps to look at a different kind of problem: time intervals. Let's imagine I want to tell whether something happened today, in the morning. If I structure this with an inclusive end, I have the problem of figuring out what the last possible time is. That means I need to know the smallest resolution of the clock. Is it seconds, milliseconds, microseconds, nanoseconds? Let's say I think seconds are good enough, so I say "less than or equal to 11:59". Does that work if it happened at 11:59 and 30 seconds? Well, I could check whether "the minute of the day was less than or equal to 11:59". Or I could just say "less than noon", which is a lot more elegant and tends to be more robust.
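The "less than noon" check can be sketched with java.time. MorningCheck and its two methods are hypothetical names for this illustration; the inclusive variant is included only to show how a bound of 11:59 misses an event at 11:59:30.

```java
import java.time.LocalTime;

class MorningCheck {
    // Exclusive end: anything strictly before noon counts as morning,
    // regardless of the clock's resolution.
    static boolean isMorning(LocalTime t) {
        return t.isBefore(LocalTime.NOON);
    }

    // Inclusive end with an assumed bound of 11:59, shown for contrast.
    // It rejects 11:59:30 because that time is after 11:59:00.
    static boolean isMorningInclusive(LocalTime t) {
        return !t.isAfter(LocalTime.of(11, 59));
    }

    public static void main(String[] args) {
        LocalTime t = LocalTime.of(11, 59, 30);
        System.out.println(isMorning(t));          // true
        System.out.println(isMorningInclusive(t)); // false
    }
}
```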

Or consider date periods. How do I know whether two date periods are adjacent, i.e. that there is no gap in between? If I use an exclusive end, it's trivial: I just check that the (exclusive) end of the first period is equal to or after the (inclusive) start of the second. If I use an inclusive end, I now have to get out a calendar and figure out whether the day after September 30th is October 1st. And does March 1st follow February 28th? What year is it? Divide that by 4, unless we are talking about the last year of the century, but don't forget that the year 2000 is the exception to the exception. There are strategies to solve that kind of thing, but the easiest is to convert the inclusive end to an exclusive one.
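A minimal sketch of both ideas, using java.time.LocalDate: the gap check on half-open periods is a single comparison, and converting an inclusive end to an exclusive one leaves the leap-year rules to the library. DatePeriods, noGap and toExclusive are hypothetical names chosen for this example.

```java
import java.time.LocalDate;

class DatePeriods {
    // Periods are half-open: [start, endExclusive).
    // There is no gap when the first period's exclusive end is equal to
    // or after the second period's inclusive start.
    static boolean noGap(LocalDate firstEndExclusive, LocalDate secondStart) {
        return !firstEndExclusive.isBefore(secondStart);
    }

    // Converts an inclusive end date to an exclusive one.
    // plusDays(1) handles month lengths and leap years.
    static LocalDate toExclusive(LocalDate endInclusive) {
        return endInclusive.plusDays(1);
    }

    public static void main(String[] args) {
        LocalDate septemberEnd = toExclusive(LocalDate.of(2023, 9, 30));
        System.out.println(septemberEnd);                                   // 2023-10-01
        System.out.println(noGap(septemberEnd, LocalDate.of(2023, 10, 1))); // true
        System.out.println(toExclusive(LocalDate.of(2000, 2, 28)));         // 2000-02-29
        System.out.println(toExclusive(LocalDate.of(1900, 2, 28)));         // 1900-03-01
    }
}
```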

The SCJP book describes it as follows:

"The start index of the substring to be removed is defined by the first argument(which is zero-based), and the ending index of the substring to be removed is defined by the second argument (but it is one-based)!".

StringBuilder sb = new StringBuilder("0123456789");
System.out.println(sb.delete(4, 6));

The previous example outputs 01236789 [0123XX6789]: the deletion starts at index four (4), and six (6) is the stop index (not inclusive). In C#, by contrast, the Remove method of StringBuilder takes the start index and the number of characters to delete: Remove(int startIndex, int length). Thus the stop index is the key to understanding the delete method of Java's StringBuilder class.
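The two conventions are easy to convert between, since the exclusive end is just start + length. The sketch below wraps delete in a start-plus-length helper; DeleteDemo and remove are hypothetical names for this illustration, not part of the Java API.

```java
class DeleteDemo {
    // Hypothetical helper in the style of C#'s Remove(startIndex, length):
    // removes `length` characters beginning at `start`.
    static StringBuilder remove(StringBuilder sb, int start, int length) {
        // The exclusive end index is start + length, with no "- 1" adjustment.
        return sb.delete(start, start + length);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("0123456789");
        System.out.println(remove(sb, 4, 2)); // 01236789, same as delete(4, 6)
    }
}
```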

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange