Split a string and create new columns with parents
Solution
I think you can achieve the expected result with the following method. I've just created a fake dataframe to test it.
import pandas as pd

df = pd.DataFrame(columns=["section"], data=[["level1.level2.level3.level4.level5"],["level1.level2.level3.level4.level5.level6"]])
columns = ['section', 'level', 'parent']

def split_parent(row):
    levels = row["section"].split(".")
    # build one record per level, from the full section down to the top level
    records = []
    for num_level in range(len(levels), 0, -1):
        sec = ".".join(levels[:num_level])
        # the top level has no parent, so this join yields an empty string
        par = ".".join(levels[:num_level-1])
        records.append({'section': sec, 'level': num_level, 'parent': par})
    return pd.DataFrame(records, columns=columns)

# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
result = pd.concat([split_parent(row) for index, row in df.iterrows()], ignore_index=True)
print(result)
As a result, you'll get:
                                      section  level                              parent
0          level1.level2.level3.level4.level5      5         level1.level2.level3.level4
1                 level1.level2.level3.level4      4                level1.level2.level3
2                        level1.level2.level3      3                       level1.level2
3                               level1.level2      2                              level1
4                                      level1      1
5   level1.level2.level3.level4.level5.level6      6  level1.level2.level3.level4.level5
6          level1.level2.level3.level4.level5      5         level1.level2.level3.level4
7                 level1.level2.level3.level4      4                level1.level2.level3
8                        level1.level2.level3      3                       level1.level2
9                               level1.level2      2                              level1
10                                     level1      1
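Note that the result repeats the shared ancestors (level1, level1.level2, and so on) once per input row. If you'd rather keep each section only once, here is a minimal sketch in plain Python. The helper names expand_section and expand_all are my own for illustration, not part of pandas, and I'm assuming the separator is always a dot unless you pass another one.

```python
def expand_section(section, sep="."):
    """Return (section, level, parent) tuples for a section and all its ancestors."""
    parts = section.split(sep)
    rows = []
    # Walk from the full path down to the top-level name.
    for level in range(len(parts), 0, -1):
        current = sep.join(parts[:level])
        parent = sep.join(parts[:level - 1])  # empty string for the top level
        rows.append((current, level, parent))
    return rows


def expand_all(sections, sep="."):
    """Expand every section, keeping only the first occurrence of each path."""
    seen = set()
    rows = []
    for section in sections:
        for row in expand_section(section, sep):
            if row[0] not in seen:
                seen.add(row[0])
                rows.append(row)
    return rows


rows = expand_all(["a.b.c", "a.b.c.d"])
for row in rows:
    print(row)
```

You can then build the dataframe from the tuples with pd.DataFrame(rows, columns=["section", "level", "parent"]).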
If you need any further explanation, just ask.
I hope this helps.
Licensed under: CC-BY-SA with attribution