How to delete a leading "1" followed by zeros from DataFrame row values?

StackOverflow https://stackoverflow.com/questions/23623216

  •  21-07-2023

Question

In my "Id" column I want to remove the leading one and the zeros that follow it. That is, 1000003 becomes 3, 1000005 becomes 5, 1000011 becomes 11, and so on.

Ignore -1, 10 and 1000000; they will be handled as special cases. From the remaining rows I want to remove the "1" followed by zeros.

Solution

You can use the modulus operator to get the trailing digits of each number (they are the remainder after division). Exclude the rows whose Id is in [-1, 10, 1000000], then take the remaining values modulo 1000000:

print(df)

        Id
0       -1
1       10
2  1000000
3  1000003
4  1000005
5  1000007
6  1000009
7  1000011

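# rows whose Id must stay untouched (the special cases)
keep = df.Id.isin([-1, 10, 1000000])
# .loc avoids chained assignment, which may silently fail to update df
df.loc[~keep, 'Id'] = df.loc[~keep, 'Id'] % 1000000
print(df)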

        Id
0       -1
1       10
2  1000000
3        3
4        5
5        7
6        9
7       11

Edit: Here is a fully vectorized string-slice version as an alternative (like Alex's method, but it takes advantage of pandas' vectorized string methods):

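# rows whose Id must stay untouched (the special cases)
keep = df.Id.isin([-1, 10, 1000000])
# drop the first character ("1"); converting back to int discards the leading zeros
df.loc[~keep, 'Id'] = df.loc[~keep, 'Id'].astype(str).str[1:].astype(int)
print(df)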

        Id
0       -1
1       10
2  1000000
3        3
4        5
5        7
6        9
7       11
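
For anyone who wants to try both versions without rebuilding the frame by hand, here is a minimal self-contained sketch. The DataFrame is reconstructed from the printed output above, and the names special, by_modulus and by_slice are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the printed frame; -1, 10 and 1000000 are the special cases.
df = pd.DataFrame({"Id": [-1, 10, 1000000, 1000003, 1000005, 1000007, 1000009, 1000011]})
special = df["Id"].isin([-1, 10, 1000000])

# Modulus version: keep only the remainder after dividing by 1000000.
by_modulus = df["Id"].copy()
by_modulus.loc[~special] = by_modulus.loc[~special] % 1000000

# String-slice version: drop the leading "1"; int conversion discards the zeros.
by_slice = df["Id"].copy()
by_slice.loc[~special] = by_slice.loc[~special].astype(str).str[1:].astype(int)

print(by_modulus.tolist())  # [-1, 10, 1000000, 3, 5, 7, 9, 11]
print(by_slice.tolist())    # [-1, 10, 1000000, 3, 5, 7, 9, 11]
```

Both columns come out identical for this data, so the choice between them is mostly a matter of taste as long as every Id has the same width.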

Other tips

Here is another way you could try to do it:

def f(x):
    """Convert the value to a string, then keep only the characters
       after the first one, which is the leading 1. For example,
       1000005 becomes 000005. I believe the dataframe hands the value
       back as a float (000005.0), which is why the float() is there.
       Converting that to an int gives 5, and so on.
    """
    return int(float(str(x)[1:]))

# apply the function "f" to every row of the dataframe, passing in the 'Id' value
df.apply(lambda row: f(row['Id']), axis=1)

I get that this question has been satisfactorily answered. But for future visitors: what I like about Alex's answer is that it does not depend on a fixed number of zeros. The modulus approach in the accepted answer will fail if the width varies, for example if you sometimes have 10005 and sometimes 1000005.
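
To make the difference concrete, here is a small sketch with made-up ids of varying width (the values in ids are hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical ids with a varying number of zeros after the leading 1.
ids = pd.Series([10005, 1000005, 100011])

# A fixed modulus only strips the prefix when the width happens to match.
fixed_modulus = ids % 1000000
print(fixed_modulus.tolist())  # [10005, 5, 100011]

# Slicing off the first character works regardless of the width.
stripped = ids.astype(str).str[1:].astype(int)
print(stripped.tolist())  # [5, 5, 11]
```

Only the middle value survives the fixed modulus as intended, while the string slice handles all three.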

However, here is one more way to think about it. If you know the prefix is always going to be 1000000, you can do

# back up all values
foo = df.Id.copy()
# now the special cases are negative or zero
df.Id = df.Id - 1000000
# put back the values that became negative or zero (here, the first three rows)
df.loc[df.Id <= 0, 'Id'] = foo[df.Id <= 0]

It gives the same result as Karl's answer, but I typically prefer this kind of method for its readability.
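
The same subtract-and-restore idea can probably be written in one step with pandas' Series.where, which keeps a value where a condition holds and takes a fallback elsewhere. This is a sketch of a swapped-in variant, not the code from the answer above; it assumes the column is named Id and the prefix is always 1000000, and shifted is just an illustrative name.

```python
import pandas as pd

# Sample data in the shape of the question; the first three rows are the special cases.
df = pd.DataFrame({"Id": [-1, 10, 1000000, 1000003, 1000005, 1000011]})

# Subtract the assumed prefix from every row.
shifted = df["Id"] - 1000000

# Keep the shifted value where it is positive; otherwise fall back to the original Id.
df["Id"] = shifted.where(shifted > 0, df["Id"])

print(df["Id"].tolist())  # [-1, 10, 1000000, 3, 5, 11]
```

No backup variable is needed here because the original column is only overwritten after the fallback has been picked.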

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow