Question

I'm trying to use scipy's stats.mode function to get the most common string out of an array of strings. But the function is truncating the strings for some reason.

>>> a
array([' State-gov', ' Self-emp-not-inc', ' Private', ..., ' Private',
       ' Private', ' Self-emp-inc'],
      dtype='|S27')

>>> stats.mode(a)
(array([' P'],
      dtype='|S2'), array([ 22696.]))

(The answer should be ' Private'.) Any ideas how I can get the full string? And why is this happening?


Solution 2

It happens because of a bug in the implementation: the results array was not being initialised with the same dtype as the input array, so the full-length strings were cut down to fit when they were written into it. I submitted a pull request to fix it, which has been accepted, so I suppose it will be fixed in scipy 1.9.
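To illustrate the mechanism described above, here is a minimal sketch of how NumPy silently truncates a string when it is assigned into a fixed-width string array that is too narrow. This is not scipy's actual code; the array names and the width of 2 are assumptions chosen only to mirror the output shown in the question.

```python
import numpy as np

# Hypothetical stand-in for the input array from the question.
a = np.array([' State-gov', ' Private', ' Private'])

# A results array whose string dtype is narrower than the input's
# (width 2 is an assumption picked to match the ' P' in the question).
narrow = np.zeros(1, dtype='U2')
narrow[0] = ' Private'      # NumPy truncates silently, no error is raised
print(repr(narrow[0]))      # ' P'

# Allocating the results array with the input's dtype keeps the full string.
matched = np.zeros(1, dtype=a.dtype)
matched[0] = ' Private'
print(repr(matched[0]))     # ' Private'
```

The second half shows the general idea of the fix: size the results array from the input's dtype rather than from a default.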

OTHER TIPS

Not sure you can solve this with sp.stats.mode() - I have also run into this odd behavior before.

For a non-scipy solution you can use collections.Counter:

import collections
collections.Counter(a).most_common(1)

This will return a list containing a single tuple: the most common string and its number of occurrences.
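As a rough sketch of the Counter approach, the snippet below uses a short plain list as a hypothetical stand-in for the array `a` from the question (the real data is elided there, so these values are made up for illustration). Counter should accept a NumPy string array in the same way, since it only needs hashable elements.

```python
from collections import Counter

# Hypothetical sample data standing in for the array in the question.
a = [' State-gov', ' Self-emp-not-inc', ' Private',
     ' Private', ' Private', ' Self-emp-inc']

# most_common(1) returns a one-element list of (value, count) tuples,
# so unpack that single tuple to get the mode and its frequency.
[(value, count)] = Counter(a).most_common(1)

print(repr(value))  # ' Private'
print(count)        # 3
```

Because Counter works on Python objects rather than fixed-width buffers, the full string comes back untruncated.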

Licensed under: CC-BY-SA with attribution