Replace values based on the number of duplicate rows
17-12-2020
Question
I have a dataframe that looks like this:
       site  Active
0     deals  Active
1     deals  Active
2     deals  Active
3  discount  Active
4  discount  Active
I don't want to drop the duplicate rows. Instead, I want to change the value in the Active column based on the site column: for each site that appears more than once, only the last occurrence should stay Active, and all earlier occurrences should become InActive.
Expected
       site    Active
0     deals  InActive
1     deals  InActive
2     deals    Active
3  discount  InActive
4  discount    Active
Solution
I would do this manually. First, build the set of row indices whose state must remain active. To do this, iterate over all rows and record, for each site, the index of a row where it is active. A later occurrence overrides earlier ones, so only the last active occurrence of each site is kept.
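last_active = dict()
for i, row in df.iterrows():
    # A later row for the same site overwrites the earlier index
    if row['Active'] == 'Active':
        last_active[row['site']] = i
# Indices of the rows that must stay 'Active'
keep_active = set(last_active.values())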
Now I assign the state 'Active' to the entries whose index is in keep_active, and 'InActive' to all others. The result is written to a new column, refined_active.
df['refined_active'] = df.apply(lambda x: 'Active' if x.name in keep_active else 'InActive', axis=1)
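The same result can probably be reached without an explicit loop. The sketch below is an alternative, not the approach used in the answer: it relies on pandas' `DataFrame.duplicated` with `keep="last"`, which flags every row that has a later row with the same site. It assumes every row starts out as Active, as in the sample data. Unlike the loop above, it does not check the existing Active value. The sample dataframe is rebuilt here only to make the snippet self-contained.

```python
import pandas as pd

# Sample data from the question (all rows start as 'Active')
df = pd.DataFrame({
    "site": ["deals", "deals", "deals", "discount", "discount"],
    "Active": ["Active"] * 5,
})

# True for each row that is followed by another row with the same site;
# the last occurrence of each site is marked False
has_later_duplicate = df.duplicated(subset="site", keep="last")

# Rows with a later duplicate become 'InActive'; the last one stays 'Active'
df["refined_active"] = has_later_duplicate.map({True: "InActive", False: "Active"})
```

Writing to refined_active mirrors the answer above; assigning to df["Active"] instead would overwrite the original column in place.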
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange