Replace values based on the number of duplicate rows that occurred

https://datascience.stackexchange.com/questions/86722

17-12-2020

Question

I have a dataframe that looks like this:

       site  Active
0     deals  Active
1     deals  Active
2     deals  Active
3  discount  Active
4  discount  Active

I don't want to drop the duplicate items; instead, I want to change the values in the Active column based on the site column. Within each group of duplicate items in the site column, the last duplicate has to stay Active and all the others have to become InActive.

Expected

       site    Active
0     deals  InActive
1     deals  InActive
2     deals    Active
3  discount  InActive
4  discount    Active

Solution

I would do this manually. First, let us create the set of indices of the entries whose state must remain active. To do this, I iterate over all rows and record, for each site, the index of a row where it is active. Note that a later occurrence overrides earlier ones, so we keep only the last active occurrence of each site.

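# Map each site to the index of its last row marked 'Active'.
last_active = dict()
for i, row in df.iterrows():
    if row['Active'] == 'Active':
        last_active[row['site']] = i  # later rows overwrite earlier ones
# Store the indices in a set for fast membership tests.
keep_active = set(last_active.values())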

Now I assign the state 'Active' to those entries whose index is in keep_active and 'InActive' otherwise.

df['refined_active'] = df.apply(lambda x: 'Active' if x.name in keep_active else 'InActive', axis=1)
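For larger dataframes, the same result can probably be obtained without iterating row by row. The following is only a sketch of a vectorized alternative, not part of the original answer: it rebuilds the example dataframe from the question, uses drop_duplicates with keep='last' to find the last active row of each site, and fills the new column with numpy.where. The variable names active_rows and last_idx are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "site": ["deals", "deals", "deals", "discount", "discount"],
    "Active": ["Active"] * 5,
})

# Consider only rows that are currently 'Active' (mirrors the loop above).
active_rows = df[df["Active"] == "Active"]

# Keep the last active row per site and collect its index labels.
last_idx = active_rows.drop_duplicates(subset="site", keep="last").index

# 'Active' for those rows, 'InActive' for every other row.
df["refined_active"] = np.where(df.index.isin(last_idx), "Active", "InActive")

print(df)
```

If every row is known to be 'Active' beforehand, as in the example, the filtering step could be skipped and df['site'].duplicated(keep='last') used directly as the mask for the 'InActive' rows.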

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange