Flatten a column with value of type list while duplicating the other column's value accordingly in Pandas

Question 1

I guess easies way to flatten list of lists would be a pure python code, as this object type is not well suited for pandas or numpy. So you can do it with for example

>>> b_flat = pd.DataFrame([[i, x] 
...               for i, y in input['B'].apply(list).iteritems() 
...                    for x in y], columns=list('IB'))
>>> b_flat = b_flat.set_index('I')

Having B column flattened, you can merge it back:

>>> input[['A']].merge(b_flat, left_index=True, right_index=True)
   A  B
0  1  a
0  1  b
1  2  c

[3 rows x 2 columns]

If you want the index to be recreated, as in your expected result, you can add .reset_index(drop=True) to last command.

Question 2

You can use df.explode(). Check out this method here

Question 3

It is surprising that there isn't a more "native" solution. Putting the answer from @alko into a function is easy enough:

def unnest(df, col, reset_index=False):
    import pandas as pd
    col_flat = pd.DataFrame([[i, x] 
                       for i, y in df[col].apply(list).iteritems() 
                           for x in y], columns=['I', col])
    col_flat = col_flat.set_index('I')
    df = df.drop(col, 1)
    df = df.merge(col_flat, left_index=True, right_index=True)
    if reset_index:
        df = df.reset_index(drop=True)
    return df

Then simply

input = pd.DataFrame({'A': [1, 2], 'B': [['a', 'b'], 'c']})
expected = unnest(input, 'B')

I guess it would be nice to allow unnesting of multiple columns at once and to handle the possibility of a nested column named I, which would break this code.

Question 4

A slightly simpler / more readable solution than the ones above which worked for me.

 out = []
 for n, row in df.iterrows():
    for item in row['B']:
        row['flat_B'] = item
        out += [row.copy()]


flattened_df = pd.DataFrame(out)

Question 5

How about

input = pd.DataFrame({'A': [1, 2], 'B': [['a', 'b'], 'c']})

input[['A', 'B']].set_index(['A'])['B'].apply(pd.Series).stack().reset_index(level=1, drop=True).reset_index().rename(columns={0:'B'})

Out[1]: 
   A  B
0  1  a
1  1  b
2  2  c

Question 6

One liner - applying the pd.DataFrame constructor, concatenating and joining to original.

my_df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [(1, 2), (1, 2), (2, 3)]})
my_df.join(pd.concat(map(lambda x: pd.DataFrame(list(x)), my_df['c']), axis=0))

Question 7

You can also manipulate the list first, then create a new dataframe: for example:

input = DataFrame({'A': [1, 2], 'B': [['a', 'b'], 'c']})
listA=input.A.tolist()
listB=input.B.tolist()
count_sublist_len=[len(ele) for ele in listB if type(ele)==list else 1]
# create similar list for A
new_listA=[count_sublist_len[i]*[listA[i]] for i in range(len(listA)]
# flatten them
f_A=[item for sublist in new_listA for item in sublist]
f_B=[item for sublist in listB for item in sublist]
df_new=pd.DataFrame({'A':f_A,'B':f_b})

Question 8

Basically the same as what yaiir did but then using list comprehension in a nice function:

def flatten_col(df: pd.DataFrame, col_from: str, col_to: str) -> pd.DataFrame:
    return pd.DataFrame([row.copy().set_value(col_to, x)
                         for i, row in df.iterrows()
                         for x in row[col_from]]) \
        .reset_index(drop=True)

where col_from is the column containing the lists and col_to is the name of the new column with the split list values.

Use as flatten_col(input, 'B', 'B') in your example. The benefit of this method is that copies along all other columns as well (unlike some other solutions). However it does use the deprecated set_value method..