This happened because of a bug in the implementation: the results array was not being initialised with the same dtype as the input array. I submitted a pull request to fix it, which has been accepted, so I suppose it will be fixed in SciPy 1.9.
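To illustrate the kind of dtype mismatch described above, here is a minimal sketch of general NumPy behaviour. It is not the actual SciPy code; the `S2` width and the variable names are assumptions chosen only to mirror the output shown in the question.

```python
import numpy as np

# Hypothetical stand-in for the input array from the question.
a = np.array([b" State-gov", b" Private", b" Private"])

# A results array whose string dtype is narrower than the input's
# silently truncates whatever is stored in it.
truncated = np.zeros(1, dtype="S2")
truncated[0] = b" Private"
print(truncated)  # the stored value is cut to 2 bytes

# Allocating the results array with the input's dtype keeps the full string.
full = np.zeros(1, dtype=a.dtype)
full[0] = b" Private"
print(full)
```

NumPy fixed-width string arrays do not raise an error on an oversized assignment, which is why this sort of bug tends to show up as a silently shortened result.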
Why does the `stats.mode()` function truncate the result on an array of strings?
Question
I'm trying to use scipy's stats.mode function to get the most common string out of an array of strings. But the function is truncating the strings for some reason.
>>> a
array([' State-gov', ' Self-emp-not-inc', ' Private', ..., ' Private',
' Private', ' Self-emp-inc'],
dtype='|S27')
>>> stats.mode(a)
(array([' P'],
dtype='|S2'), array([ 22696.]))
(The answer should be ' Private'.) Any ideas how I can get the full string? And why is this happening?
Solution 2
Other answers
I'm not sure you can solve this with sp.stats.mode(); I have also encountered this weird behavior before.
For a non-scipy solution, you can use collections.Counter:
collections.Counter(a).most_common(1)
This will return a one-element list containing a tuple with the string and its number of occurrences.
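As a rough sketch of the Counter approach, the example below unpacks the result into the string and its count. The sample data is hypothetical and a plain list stands in for the array from the question; the same call should work when `a` is a NumPy array of strings.

```python
from collections import Counter

# Hypothetical sample data standing in for the array `a` from the question.
a = [" State-gov", " Self-emp-not-inc", " Private", " Private", " Self-emp-inc"]

# most_common(1) returns a one-element list of (value, count) tuples,
# so take the first entry and unpack it.
value, count = Counter(a).most_common(1)[0]
print(repr(value), count)
```

Because Counter works on ordinary Python objects, the full string is returned without any fixed-width dtype getting in the way.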