Flatten (an irregular) list of lists in Python respecting Pandas Dataframes

https://stackoverflow.com/questions/21461140

05-10-2022

Question

This is a recurring question here on Stack Overflow, yet the solution given here is still not perfect. Yielding is still (for me) one of the most complex things to use in Python, so I don't know how to fix it myself.

When an item within any of the lists given to the function is a Pandas DataFrame, the flatten function yields its column labels instead of the DataFrame itself. You can test this by running the following code:

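import collections
import numpy as np
import pandas
df = pandas.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Python 2 code: basestring does not exist in Python 3, and the
# collections.Iterable alias was removed in Python 3.10.
def flatten(l):
    for el in l:
        # Recurse into anything iterable except strings.
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el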

Then, if you call the function given on the referenced post:

list(flatten([df]))   #['A', 'B', 'C', 'D']

This returns the column labels instead of a list with the DataFrame inside. How can I make the flatten function respect DataFrames?

Solution

That flatten function will recurse down if the element is an instance of collections.Iterable and it's not a string (which is iterable, but we usually want to treat it as a scalar, something we're not going to look inside).
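The check described above can be tried directly. This is a minimal sketch, assuming Python 3 (where the abstract base class lives in `collections.abc`) and a small made-up DataFrame; the names `checks` and `column_labels` are only for illustration.

```python
from collections.abc import Iterable

import pandas as pd

# Small illustrative frame (not the 100-row one from the question).
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Which kinds of object count as Iterable?
checks = {
    "list": isinstance([1, 2], Iterable),
    "str": isinstance("ab", Iterable),
    "int": isinstance(1, Iterable),
    "DataFrame": isinstance(df, Iterable),
}

# Iterating over a DataFrame walks its column labels, not its rows.
column_labels = list(df)
```

Because a DataFrame passes the Iterable test and iterates over its column labels, a flatten that only excludes strings will descend into the frame and yield those labels.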

Even though DataFrames are instances of collections.Iterable, it sounds like you want them to be terminal too. In that case:

    if (isinstance(el, collections.Iterable) and 
        not isinstance(el, (basestring, pandas.DataFrame))):

After which:

>>> list(flatten([[1,2], "2", df]))
[1, 2, '2', <class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 4 columns):
A    100  non-null values
B    100  non-null values
C    100  non-null values
D    100  non-null values
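For readers on Python 3, the same idea might look like the sketch below. It is not the code from the referenced post: `basestring` is replaced by `(str, bytes)`, `collections.Iterable` by `collections.abc.Iterable`, and the `terminal` parameter is a hypothetical addition that lets the caller choose which types are yielded whole.

```python
from collections.abc import Iterable

import pandas as pd


def flatten(items, terminal=(str, bytes, pd.DataFrame)):
    """Yield the leaves of an arbitrarily nested iterable.

    Instances of the types in `terminal` are yielded whole instead of
    being iterated over (`terminal` is an assumed, illustrative parameter).
    """
    for el in items:
        if isinstance(el, Iterable) and not isinstance(el, terminal):
            # Nested container: recurse and pass its leaves through.
            yield from flatten(el, terminal)
        else:
            yield el


# Small illustrative frame (not the 100-row one from the question).
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
result = list(flatten([[1, 2], "2", [df, [3.0]]]))

# Without DataFrame in `terminal`, the frame is iterated and its
# column labels come out, as in the question.
labels_only = list(flatten([df], terminal=(str, bytes)))
```

Here `result` holds five items, with the DataFrame object itself in the fourth position. Other containers that should stay intact, such as `pd.Series`, could be added to `terminal` in the same way.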

Licensed under: CC-BY-SA with attribution
