Need to understand the functionality of string.count() function

https://stackoverflow.com/questions/11630104

22-06-2021

Question

I was trying to understand how string.count works. Its documented definition is:

string.count(s, sub[, start[, end]])
Return the number of (non-overlapping) occurrences of substring sub in string s[start:end]. Defaults for start and end and interpretation of negative values are the same as for slices.

What exactly does the last sentence mean? I ran a few experiments with this function:

In [19]: a
Out[19]: 'ab'
In [23]: string.count(a,'ab'[1:3])
Out[23]: 1

In [24]: string.count(a,'ab'[1:1])
Out[24]: 3

In [25]: string.count(a,'ab'[-1:1])
Out[25]: 3

In [26]: string.count(a,'ab'[-1:0])
Out[26]: 3

In [27]: string.count(a,'ab'[1:4])
Out[27]: 1

In [28]: string.count(a,'ab'[1:100])
Out[28]: 1

In [29]: string.count(a,'ab'[100:100])
Out[29]: 3

In [30]: string.count(a,'ab'[:])
Out[30]: 1

In [31]: string.count(a,'a'[:])
Out[31]: 1

Can anybody explain why I sometimes get 1 and sometimes 3 here? Overall, I need to understand exactly how this function works.

Solution

Every time you see 3 as the result, the second argument was the empty string; the unusual slices hide this fact. The empty string is considered to occur at positions 0:0, 1:1 and 2:2 in this example, so there are three occurrences. In general, string.count(s, "") (or equivalently, s.count("")) returns len(s) + 1.

To explain how this result comes about, here is a (not very efficient) example implementation of string.count():

def count(s, sub):
    result = 0
    i = 0
    while i <= len(s) - len(sub):
        if s[i:i + len(sub)] == sub:
            result += 1
            i += max(len(sub), 1)  # jump past the match so matches never overlap
        else:
            i += 1
    return result
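The documentation quoted in the question says "non-overlapping", and that word matters as soon as the substring can overlap with itself. Here is a small sketch of the difference, assuming Python 3, where the method s.count(sub) replaces string.count(s, sub). The helper count_all_starts is a hypothetical name used only for this illustration.

```python
def count_all_starts(s, sub):
    """Count every start position where sub matches, overlaps included."""
    return sum(s.startswith(sub, i) for i in range(len(s) - len(sub) + 1))

s = "aaaa"
non_overlapping = s.count("aa")          # matches at 0 and 2 only
overlapping = count_all_starts(s, "aa")  # matches at 0, 1 and 2

print(non_overlapping, overlapping)  # 2 3
```

For substrings that cannot overlap with themselves, such as 'b' in 'ab', both ways of counting give the same number.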

Other answers

I think your error lies in misunderstanding the notation that the documentation uses.

string.count(s, sub[, start[, end]]) Return the number of (non-overlapping) occurrences of substring sub in string s[start:end]. Defaults for start and end and interpretation of negative values are the same as for slices.

The square brackets [] denote optional arguments. They are shorthand for three different call signatures and are not part of the actual syntax used to invoke the function:

string.count(s, sub)
string.count(s, sub, start)
string.count(s, sub, start, end)
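As a quick sketch of the three call forms, assuming Python 3 (where string.count is gone and the equivalent is the str method s.count), with an example string of my own choosing:

```python
s = "hello world"

whole = s.count("o")             # no start/end: search all of s
from_start = s.count("o", 5)     # start only: search s[5:], i.e. ' world'
bounded = s.count("o", 5, 7)     # start and end: search s[5:7], i.e. ' w'

print(whole, from_start, bounded)  # 2 1 0
```

Note that no square brackets appear in any of the calls.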

Defaults for start and end and interpretation of negative values are the same as for slices.

This is slicing:

>>> "0123_3210"[:3] # from the start up to (but excluding) index 3
'012'
>>> "0123_3210"[-3:] # from the 3rd-to-last item until the end
'210'
>>> "0123_3210"[-3:-1] # from the 3rd-to-last item up to (but excluding) the last
'21'
>>> "0123_3210"[1:3] # from index 1 up to (but excluding) index 3
'12'

As you can see, when you use negative indexes it starts counting from the right end.
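If it helps, the way a slice interprets negative and out-of-range values can be made visible with the built-in slice.indices method, which returns the concrete bounds used for a sequence of a given length. This is only a sketch; the helper name normalized is hypothetical, and the inputs are the slices from the question applied to 'ab'.

```python
text = "ab"

def normalized(start, end, length):
    """Return the concrete (start, stop) a slice uses for the given length."""
    return slice(start, end).indices(length)[:2]

bounds_neg = normalized(-1, 1, len(text))     # -1 becomes position 1
bounds_rev = normalized(-1, 0, len(text))     # start lies after end
bounds_big = normalized(100, 100, len(text))  # both clamped to len(text)
bounds_over = normalized(1, 100, len(text))   # end clamped to len(text)

print(bounds_neg, bounds_rev, bounds_big, bounds_over)
# (1, 1) (1, 0) (2, 2) (1, 2)
```

Whenever the normalized start is not smaller than the normalized end, the slice is the empty string.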

What the documentation says is that the following calls are equivalent, for both positive and negative values of start and end:

string.count(s[:], sub) ≡ string.count(s, sub)
string.count(s[start:], sub) ≡ string.count(s, sub, start)
string.count(s[start:end], sub) ≡ string.count(s, sub, start,end)
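These equivalences can be checked by brute force. The sketch below assumes Python 3 and the str method s.count instead of string.count, and I have deliberately restricted it to a non-empty sub; the string and the range of indexes are arbitrary choices for illustration.

```python
s = "hello world"
sub = "o"

# Collect every (start, end) pair where the two forms disagree.
mismatches = [
    (start, end)
    for start in range(-15, 16)
    for end in range(-15, 16)
    if s.count(sub, start, end) != s[start:end].count(sub)
]

print(mismatches)  # []
```

The range goes well past len(s) in both directions, so negative and out-of-range values are covered too.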

It seems your confusion comes either from not being certain of how to use the slice notation, or not being clear on the behavior of string.count() with respect to the empty string.

I think you would be well served by reading http://docs.python.org/tutorial/introduction.html#strings.

In [19]: a 
Out[19]: 'ab'

Okay.

In [23]: string.count(a,'ab'[1:3]) 
Out[23]: 1

'ab'[1:3] == 'b'. There is one instance of 'b' in 'ab'.

In [24]: string.count(a,'ab'[1:1]) 
Out[24]: 3

'ab'[1:1] == ''.

The count of '' in any Python string is equal to len(string) + 1, unless the string length is greater than INT_MAX (I believe), in which case it returns INT_MAX.

You can see this in the Python source code here: http://svn.python.org/view/checkout/python/trunk/Objects/stringlib/count.h?content-type=text%2Fplain.

In [25]: string.count(a,'ab'[-1:1]) 
Out[25]: 3

'ab'[-1:1] == '' (index -1 refers to position 1 here, so this is the same as 'ab'[1:1]). Once again, you are taking the count of ''.

In [26]: string.count(a,'ab'[-1:0]) 
Out[26]: 3

'ab'[-1:0] == '', again (the start, position 1, lies after the end, position 0).

In [27]: string.count(a,'ab'[1:4]) 
Out[27]: 1

'ab'[1:4] == 'b'.

In [28]: string.count(a,'ab'[1:100]) 
Out[28]: 1

'ab'[1:100] == 'b'.

In [29]: string.count(a,'ab'[100:100]) 
Out[29]: 3

'ab'[100:100] == ''.

In [30]: string.count(a,'ab'[:]) 
Out[30]: 1

'ab'[:] == 'ab'. There is one occurrence of 'ab' in 'ab'.

In [31]: string.count(a,'a'[:]) 
Out[31]: 1

In this case 'a'[:] == 'a'. There is one occurrence of 'a' in 'ab'.

Clearer?

[...] interpretation of negative values are the same as for slices

is for the third and fourth optional arguments of the string.count(...) function.

In [1]: import string

In [2]: s = 'hello world'

In [3]: string.count(s, 'o')
Out[3]: 2

In [4]: string.count(s, 'o', 5, 9) # count from 6th to 9th characters
Out[4]: 1

In [5]: string.count(s, 'o', 5, 9) == string.count(s[5:9], 'o')
Out[5]: True

In [6]: string.count(s, 'o', -9, -5) # count from 3rd to 6th characters
Out[6]: 1

This is very different from searching for the empty string:

In [7]: string.count(s, 'o'[5:9]) == string.count(s, '') == (2 + (len(s) - 1))
Out[7]: True

(There are "empty strings" at the beginning, at the end, and between each pair of adjacent characters of s.)
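One way to see where those "empty strings" sit is to list the positions at which an empty pattern matches. This sketch assumes Python 3 and uses the standard re module purely as an illustration; it is not how count itself is implemented.

```python
import re

s = "hello world"

# An empty pattern matches once at every position from 0 to len(s) inclusive.
positions = [m.start() for m in re.finditer("", s)]

print(positions)                 # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(len(positions), s.count(""))  # 12 12
```

There is one position before each character plus one at the very end, which is where len(s) + 1 comes from.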

Licensed under: CC-BY-SA with attribution
