Question

Is there any way to trim a Series of string objects without using a for loop? I can do this element by element. I have a Series a:

print(a)
0    164
1     164
2     164
3     164
4     164
5     164

Now I have to remove the space at the start of each " 164". Calling a.strip() results in AttributeError: 'Series' object has no attribute 'strip'. Any help appreciated.


Solution 2

There is nothing wrong with your data or code as such, but do check the data thoroughly. Even if only one row holds bad data, and even if you are only trying to convert the type for a given range, the entire Series is still considered, and that is the likely cause of your problem.

Reduce the test set and check a couple of rows; it should work fine.
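
One way to act on this advice is to look for the rows that fail numeric conversion instead of scanning the data by eye. This is only a minimal sketch: the sample values and the names `converted` and `bad_rows` are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical sample data: one row is not a clean number.
a = pd.Series([" 164", "164", "abc", " 164 "])

# Strip surrounding whitespace, then convert. errors="coerce" turns
# values that cannot be parsed into NaN instead of raising.
converted = pd.to_numeric(a.str.strip(), errors="coerce")

# Rows that became NaN are the ones that need a closer look.
bad_rows = a[converted.isna()]
print(bad_rows)
```

Once the offending rows are known, they can be fixed or dropped before converting the whole Series.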

Other tips

Use str.strip to remove the spaces:

import pandas as pd

df = pd.DataFrame({'a': ['164', ' 164', '    164']})
# String lengths before stripping
for item in df.a:
    print(len(item))
3
4
7

In [11]:

# Strip spaces from both ends of every element
df.a = df.a.str.strip(' ')
for item in df.a:
    print(len(item))
3
3
3

To convert to ints do this:

In [20]:

df.a = df.a.astype(int)
df.dtypes

Out[20]:
a    int32
dtype: object
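
Since the question is about a Series rather than a DataFrame column, the same `.str` accessor can be sketched directly on a Series. The sample values below are assumed, not taken from the question. `str.lstrip` removes only leading whitespace, while `str.strip` removes it from both ends.

```python
import pandas as pd

# Assumed sample data with leading and trailing spaces.
a = pd.Series([" 164", " 164", "164 "])

leading_trimmed = a.str.lstrip()  # leading whitespace only
both_trimmed = a.str.strip()      # leading and trailing whitespace

print(leading_trimmed.tolist())
print(both_trimmed.tolist())

# Convert the cleaned strings to numbers.
as_int = pd.to_numeric(both_trimmed)
print(as_int.tolist())
```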

I've never used pandas, but if I understand correctly you might want to do something like this:

from pandas import DataFrame
df = DataFrame({'a': ['164', ' 165']})
# int() ignores surrounding whitespace when parsing a string
for index, row in df.iterrows():
    print(int(row['a']))

Sorry if I'm off-topic :-)

If all you need is to convert it to an int, how about just df[0].astype(int)?

In [16]: df = pd.DataFrame([' 164', '164', '164 ', '  164  '])

In [17]: df
Out[17]: 
         0
0      164
1      164
2     164 
3    164  

[4 rows x 1 columns]

In [18]: df.dtypes
Out[18]: 
0    object
dtype: object

In [19]: df[0] = df[0].astype(int)

In [20]: df.dtypes
Out[20]: 
0    int64
dtype: object

In [21]: df
Out[21]: 
     0
0  164
1  164
2  164
3  164

[4 rows x 1 columns]

You can use a regular expression:

import re

# Raw string, so the backslashes reach the regex engine unchanged
trim_function = lambda x: re.findall(r"^\s*(.*?)\s*$", str(x))[0]

To explain a bit:

  • The character ^ represents the beginning of the string and $ the end, so the expression will find exactly one match.

  • \s represents any whitespace character, so \s* is any sequence (possibly empty) of whitespace characters.

  • .*? is any sequence of any characters (except a newline). The ? makes the quantifier lazy (non-greedy): it matches as few characters as possible, so the trailing whitespace is left to \s* and ends up outside the parentheses.

  • Finally, the parentheses (...) mark the substring you are interested in: the trimmed expression.

As re.findall returns a list of matching substrings, we have to select the first element.
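
The difference between the lazy and the greedy group can be checked with the standard `re` module alone. A small sketch, with an assumed sample string:

```python
import re

text = "  164  "  # assumed sample with spaces on both sides

# Lazy group: (.*?) takes as few characters as possible, so the
# trailing \s* absorbs the whitespace at the end.
lazy = re.findall(r"^\s*(.*?)\s*$", text)[0]

# Greedy group: (.*) takes as many characters as possible, so the
# trailing whitespace ends up inside the captured group.
greedy = re.findall(r"^\s*(.*)\s*$", text)[0]

print(repr(lazy))
print(repr(greedy))
```

Only the lazy version trims the end of the string; the greedy one keeps the two trailing spaces in the captured group.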

Now, for a DataFrame:

df = pd.DataFrame([' 164', '164', '164 ', '  164  '])
# applymap is deprecated since pandas 2.1; use df.map(trim_function) there
df.applymap(trim_function)

For a Series:

df = pd.Series([' 164', '164', '164 ', '  164  '])
df.apply(trim_function)

For an Index:

df = pd.Index([' 164', '164', '164 ', '  164  '])
df.map(trim_function)

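Edit: I forgot to mention that if you don't want to remove the spaces at the end of each string, you can use the pattern r"^\s*(.*)$". The group has to be greedy here: a lazy (.*?) with nothing after it would capture an empty string.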
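
A leading-only pattern is easy to check against the built-in `str.lstrip`. The sketch below uses an assumed sample string and only the standard library. It also shows that a lazy group with nothing after it is free to stop immediately, so it captures an empty string.

```python
import re

text = "  164  "  # assumed sample with spaces on both sides

# Reference result: the built-in removes leading whitespace only.
expected = text.lstrip()

# Greedy group running to the end of the string keeps the trailing spaces.
greedy = re.findall(r"^\s*(.*)$", text)[0]

# Lazy group with nothing after it matches as little as possible.
lazy_unanchored = re.findall(r"^\s*(.*?)", text)[0]

print(repr(expected))
print(repr(greedy))
print(repr(lazy_unanchored))
```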

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow