This happens because of a bug in the implementation: the results array was not initialised with the same dtype as the input array, so the result is stored in a narrower string type and gets cut off. I submitted a pull request to fix it, which has been accepted, so I suppose it will be fixed in scipy 1.9.
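To illustrate the mechanism described above, here is a minimal sketch. It is not SciPy's actual code; the array names `a`, `narrow` and `matching` are made up for the illustration. It only shows that NumPy silently truncates a byte string when it is assigned into an array whose string dtype is too narrow, and that allocating the result with the input's dtype avoids this.

```python
import numpy as np

# Input array of fixed-width byte strings, like the one in the question.
a = np.array([b' State-gov', b' Private', b' Private'], dtype='S27')

# Hypothetical result buffer with a string dtype that is too narrow (2 bytes).
narrow = np.zeros(1, dtype='S2')
narrow[0] = a[1]          # NumPy truncates silently on assignment
print(narrow[0])          # b' P'

# Allocating the buffer with the same dtype as the input keeps the full string.
matching = np.zeros(1, dtype=a.dtype)
matching[0] = a[1]
print(matching[0])        # b' Private'
```
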
Why does the `stats.mode()` function truncate the answer on an array of strings?
Question
I'm trying to use scipy's `stats.mode` function to get the most common string out of an array of strings. But the function is truncating the strings for some reason.
>>> a
array([' State-gov', ' Self-emp-not-inc', ' Private', ..., ' Private',
       ' Private', ' Self-emp-inc'],
      dtype='|S27')
>>> stats.mode(a)
(array([' P'],
      dtype='|S2'), array([ 22696.]))
(The answer should be `' Private'`.) Any ideas how I can get the full string? And why is this happening?
Solution 2
OTHER TIPS
Not sure you can solve this with `sp.stats.mode()`; I have also encountered this weird behavior before.
For a non-scipy solution you can use `collections.Counter`:
collections.Counter(a).most_common(1)
This will return a one-element list containing a tuple with the string and its number of occurrences.
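As a slightly fuller sketch of the same idea, the snippet below wraps the `Counter` approach in a helper and shows a NumPy-only alternative based on `np.unique(..., return_counts=True)`. The helper names `most_common_string` and `most_common_string_np` and the small sample array are made up for illustration; neither function comes from scipy.

```python
from collections import Counter

import numpy as np


def most_common_string(values):
    """Return (string, count) for the most frequent entry, using Counter."""
    # tolist() converts NumPy scalars to plain Python strings before counting.
    counts = Counter(np.asarray(values).tolist())
    return counts.most_common(1)[0]


def most_common_string_np(values):
    """Same result using only NumPy; ties go to the first value in sorted order."""
    uniques, counts = np.unique(values, return_counts=True)
    idx = counts.argmax()
    return uniques[idx], int(counts[idx])


# Small made-up sample standing in for the array in the question.
a = np.array([' State-gov', ' Private', ' Private', ' Self-emp-inc'])

print(most_common_string(a))     # (' Private', 2)
print(most_common_string_np(a))
```

Both helpers return the full, untruncated string, because neither copies the result into a buffer with a different dtype.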
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow