Pandas: matching a string in series with string from another series

https://stackoverflow.com/questions/23544639

18-07-2023

Question

I have a DataFrame that looks like this:

Full                          Partial
ABCDEFGHIJKLMNOPQRSTUVWXYZ    FGHIJKL
ANLHDFKNADHFBAKHFGBAKJFB      FKNADH
JABFKADFNADKHFBADHBFJDHFBADF  ABFKA

What I want to do is to put everything from Full that does NOT match Partial in lowercase, yielding the following:

Coverage
abcde_FGHIJKL_mnopqrstuvwxyz
anlhd_FKNADH_fbakhfgbakjfb
j_ABFKA_dfnadkhfbadhbfjdhfbadf

How would I do this? I looked around and it seems that series.str.extract() could be a solution, but I'm not certain as when I try to do this:

df['Full'].str.extract(df['Partial'])

... it only says that a Series can't be hashed. I assume that extract only takes a single pattern, rather than a Series of patterns? Is there any way to bypass this? Is extract even the correct way to achieve what I'm looking for, or is there another way? I'm thinking I could perhaps find some way to extract the string indexes and do the following pseudocode:

df['Coverage'] = df['Full'][:start].lower() + '_' + df['Partial'] + \
     '_' + df['Full'][end:].lower()

... where start and end are the indexes at which df['Partial'] starts and ends within df['Full'], respectively. Thoughts?
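
The index-based idea in the pseudocode can be sketched row by row with plain Python string methods. This is only a minimal sketch: the helper name mark_coverage is hypothetical, it marks only the first occurrence of the partial string, and it assumes that a row without a match should simply be lowercased.

```python
import pandas as pd

df = pd.DataFrame({
    "Full": [
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        "ANLHDFKNADHFBAKHFGBAKJFB",
        "JABFKADFNADKHFBADHBFJDHFBADF",
    ],
    "Partial": ["FGHIJKL", "FKNADH", "ABFKA"],
})


def mark_coverage(full: str, partial: str) -> str:
    """Lowercase `full` except the first occurrence of `partial`,
    which is kept as-is and wrapped in underscores (hypothetical helper)."""
    start = full.find(partial)   # -1 when `partial` does not occur
    if start == -1:
        return full.lower()      # assumption: no match means all lowercase
    end = start + len(partial)
    return full[:start].lower() + "_" + partial + "_" + full[end:].lower()


# Pair the two columns row by row instead of passing a Series as a pattern.
df["Coverage"] = [mark_coverage(f, p) for f, p in zip(df["Full"], df["Partial"])]
print(df["Coverage"].tolist())
```

Iterating over zip(df["Full"], df["Partial"]) avoids handing a whole Series to a method that expects a single pattern, which is what triggers the error described above.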

Solution

Not the most elegant perhaps, but here is one solution:

For df:

                           Full  Partial
0    ABCDEFGHIJKLMNOPQRSTUVWXYZ  FGHIJKL
1      ANLHDFKNADHFBAKHFGBAKJFB   FKNADH
2  JABFKADFNADKHFBADHBFJDHFBADF    ABFKA

This:

df.apply(lambda r: r.Full.lower().replace(r.Partial.lower(), '_' + r.Partial + '_'), axis=1)

Returns:

0      abcde_FGHIJKL_mnopqrstuvwxyz
1        anlhd_FKNADH_fbakhfgbakjfb
2    j_ABFKA_dfnadkhfbadhbfjdhfbadf

For each row, this converts the full string to lowercase, then replaces the lowercased partial string with the original partial string wrapped in underscores, one on each side.
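
For completeness, here is a self-contained version of the same approach that builds the example frame and stores the result in a Coverage column. The column name Coverage comes from the question; the rest is a sketch that assumes both columns contain only strings.

```python
import pandas as pd

df = pd.DataFrame({
    "Full": [
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        "ANLHDFKNADHFBAKHFGBAKJFB",
        "JABFKADFNADKHFBADHBFJDHFBADF",
    ],
    "Partial": ["FGHIJKL", "FKNADH", "ABFKA"],
})

# Lowercase the whole string, then put the original-case partial string
# back in place of its lowercased form, with an underscore on each side.
df["Coverage"] = df.apply(
    lambda r: r["Full"].lower().replace(r["Partial"].lower(), "_" + r["Partial"] + "_"),
    axis=1,
)
print(df["Coverage"].tolist())
```

Note that str.replace substitutes every occurrence, so a partial string that appears more than once in Full is marked each time. Passing 1 as a third argument to replace limits it to the first occurrence.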

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow