Question

Why does Pandas report dtype object for a column in which every item is a string, even after an explicit conversion?

This is my DataFrame:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56992 entries, 0 to 56991
Data columns (total 7 columns):
id            56992  non-null values
attr1         56992  non-null values
attr2         56992  non-null values
attr3         56992  non-null values
attr4         56992  non-null values
attr5         56992  non-null values
attr6         56992  non-null values
dtypes: int64(2), object(5)

Five of them are dtype object. I explicitly convert those objects to strings:

for c in df.columns:
    # Convert every object-dtype column's values to str
    if df[c].dtype == object:
        print("convert", df[c].name, "to string")
        df[c] = df[c].astype(str)

Then df["attr2"] still has dtype object, although type(df["attr2"].iloc[0]) reveals str, which is correct.
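Checking the first element only proves that one value is a str. Here is a minimal sketch for checking the whole column. It uses a small hypothetical frame in place of the real data: the column name attr2 is borrowed from the question, and the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame; dtype=object mirrors the question
df = pd.DataFrame({"attr2": ["a", "bb", "ccc"]}, dtype=object)
col = df["attr2"]

# The column-level dtype only says "Python objects"...
print(col.dtype)  # object

# ...so inspect the element types directly
print(col.map(lambda v: isinstance(v, str)).all())  # True
print(pd.api.types.infer_dtype(col))                 # string
```

Both checks look at the values themselves. They report strings even while the column dtype stays object.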

Pandas distinguishes between int64 and float64 and object. What is the logic behind it when there is no dtype str? Why is a str covered by object?


Solution

The object dtype comes from NumPy and describes the type of the elements in an ndarray. Every element in an ndarray must occupy the same number of bytes. For int64 and float64, that is 8 bytes. Strings, however, do not have a fixed length. So instead of storing the bytes of the strings directly in the ndarray, Pandas uses an object ndarray, which stores pointers to Python objects. That is why the dtype of such an ndarray is object.
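The fixed-size rule can be seen by comparing item sizes. NumPy does also offer a fixed-width Unicode dtype, but it pads every element to the length of the longest string. Here is a minimal sketch, assuming NumPy is installed. The printed sizes assume a typical 64-bit, little-endian build, and the values are made up.

```python
import numpy as np

# Fixed-size numeric elements: every int64 occupies exactly 8 bytes
ints = np.array([10, 20, 30, 40], dtype=np.int64)
print(ints.dtype, ints.itemsize)  # int64 8

# NumPy's fixed-width Unicode dtype pads each element to the longest string
fixed = np.array(["a", "bb", "ccc"])
print(fixed.dtype, fixed.itemsize)  # <U3 12 (3 characters x 4 bytes each)

# An object array stores one pointer per element; the strings live elsewhere
objs = np.array(["a", "bb", "ccc"], dtype=object)
print(objs.dtype, objs.itemsize)  # object 8 (pointer size on a 64-bit build)
```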

For example, compare two four-element arrays:

  • the int64 array contains 4 int64 values
  • the object array contains 4 pointers to 3 string objects, so at least two of the pointers refer to the same string object
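The four-pointers-to-three-strings picture can be reproduced directly in NumPy. Here is a minimal sketch, assuming NumPy is installed. The strings are made up, and the byte count assumes a 64-bit build.

```python
import numpy as np

# Three distinct string objects...
apple, banana, cherry = "apple", "banana", "cherry"

# ...referenced by four slots; slot 0 and slot 3 point at the same object
arr = np.array([apple, banana, cherry, apple], dtype=object)

print(arr[0] is arr[3])            # True: both slots reference one str object
print(arr.nbytes)                  # 32: 4 slots x 8-byte pointers
print(len({id(x) for x in arr}))   # 3 distinct objects behind 4 pointers
```

The array's own buffer holds only the pointers. The characters of each string live in the separate Python str objects.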


OTHER TIPS

The accepted answer is good. Just wanted to provide an answer which referenced the documentation. The documentation says:

Pandas uses the object dtype for storing strings.

As the leading comment says, "Don't worry about it; it's supposed to be like this." (The accepted answer does a great job of explaining the "why": strings are variable-length.)
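If you do want a dtype that says "string", pandas 1.0 added an opt-in StringDtype, spelled "string". Here is a minimal sketch, assuming pandas 1.0 or later and made-up data. The code passes dtype=object explicitly because newer pandas releases may infer a dedicated string dtype by default instead of object.

```python
import pandas as pd

# Explicit object dtype, matching what pandas 1.x/2.x infer for a list of str
s_obj = pd.Series(["x", "yy", "zzz"], dtype=object)
print(s_obj.dtype)  # object

# Convert to the opt-in dedicated string dtype
s_str = s_obj.astype("string")
print(s_str.dtype)  # string
```

The values are unchanged. The difference is that a "string" column only accepts strings (with pd.NA for missing values), whereas an object column can hold any Python object.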


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow