Data manipulation via regex in Python for removal/editing of certain data in parentheses

https://stackoverflow.com/questions/19143729

30-06-2022
Question

I am having a bit of trouble with the data manipulation below. This is example code; normally each line in datas arrives in the variable "data".

import re

datas = """Class (EN)
    Class (NA)
    CLASS (AA)
    CLASS-TWO (AA)
    Class3-A-H (NO)"""

datas = datas.split("\n")

for data in datas:
    data = data.strip()
    # Remove a trailing parenthesised code such as " (EN)"
    data = re.sub(r'\s*\(\w+\)\s*$', '', data)
    print(data)

If you run the above code, the school classes are returned without the class code (the bracketed part).

However, a few variations require different handling:

Example 1: CLASS (NA) (N/A) should be returned as CLASS (N/A).

Example 2: CLASS (NA) (BB) should be returned as CLASS (B/B). (BB) is the only code that should never be removed; instead it should be changed to (B/B).

For example the following data:

CLASS (EN)
CLASS (NA) (BB)
CLASS (AA) (N/A)
CLASS (N/A)
CLASS (BB)

Should return:

CLASS
CLASS (B/B)
CLASS (N/A)
CLASS (N/A)
CLASS (B/B)
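
To make the gap concrete, here is a minimal sketch that runs the original pattern over this sample data. The wrapper name strip_last_code is hypothetical; the pattern itself is the one from the loop above.

```python
import re

# The pattern from the question: one trailing group of word characters.
ORIGINAL = r'\s*\(\w+\)\s*$'

def strip_last_code(data):
    """Hypothetical wrapper around the substitution used in the question."""
    return re.sub(ORIGINAL, '', data.strip())

samples = [
    "CLASS (EN)",
    "CLASS (NA) (BB)",
    "CLASS (AA) (N/A)",
    "CLASS (N/A)",
    "CLASS (BB)",
]

for line in samples:
    # repr() makes any leftover whitespace visible
    print(repr(line), '->', repr(strip_last_code(line)))
```

Only the last group on a line is ever considered, and \w does not match "/", so CLASS (NA) (BB) comes back as CLASS (NA), CLASS (AA) (N/A) is left unchanged, and CLASS (BB) loses its code entirely.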

I think this is fairly complicated. I've tried quite a few things, but I honestly struggle with the regex parts.

Thanks in advance - Hyflex

Solution

The easy way to do this is in two steps.

First, replace each (BB) with (B/B) (you can even do this with str.replace instead of re.sub if you want).

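Then, since (B/B) contains a slash and \w does not match a slash, the removal pattern no longer matches it, so the substitution leaves it alone. One adjustment to your existing pattern is needed: the $ anchor only lets it match the last group on a line, so in CLASS (NA) (B/B) the (NA) would survive. Dropping the anchor (and the trailing \s*) removes every code made up only of word characters.

So:

data = re.sub(r'\(BB\)', '(B/B)', data)
data = re.sub(r'\s*\(\w+\)', '', data)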
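
To check the two-step idea against the sample data from the question, here is a minimal sketch. The function name clean_class is hypothetical. Note that the removal pattern here has no $ anchor, so that a code followed by another group, as in CLASS (NA) (BB), is removed too.

```python
import re

def clean_class(data):
    """Hypothetical helper: rewrite (BB) as (B/B), then drop word-only codes."""
    data = data.strip()
    # Step 1: protect (BB) by rewriting it; plain str.replace is enough here.
    data = data.replace('(BB)', '(B/B)')
    # Step 2: remove every parenthesised group made only of word characters.
    # \w does not match "/", so (B/B) and (N/A) are left alone.
    data = re.sub(r'\s*\(\w+\)', '', data)
    return data

samples = [
    "CLASS (EN)",
    "CLASS (NA) (BB)",
    "CLASS (AA) (N/A)",
    "CLASS (N/A)",
    "CLASS (BB)",
]

for line in samples:
    print(clean_class(line))
```

For these five lines the sketch prints CLASS, CLASS (B/B), CLASS (N/A), CLASS (N/A) and CLASS (B/B), which is the output requested in the question. The trade-off is that a word-only group in the middle of a name would also be removed.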

Other tips

How about this one?

import re

datas = """Class (EN)(EL)
    Class (NA)
    CLASS (AA)
    CLASS-TWO (AA)
    Class3-A-H (NO)"""

datas = datas.split("\n")

for data in datas:
    data = data.strip()
    # Keep the first word and rewrite the last two-character code as X/Y
    data = re.sub(r'^([^ ]+?) +.*\((.)/?(.)\) *$', r'\1 (\2/\3)', data)
    print(data)

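For the input above this prints the following. Note that it rewrites every trailing code as X/Y rather than removing plain codes such as (EN), so it differs from the output requested in the question: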

Class (E/L)
Class (N/A)
CLASS (A/A)
CLASS-TWO (A/A)
Class3-A-H (N/O)
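
The pattern in this answer is dense, so here is a sketch of the same expression written with re.VERBOSE so that each piece can carry a comment. The names PATTERN and slash_last_code are hypothetical; literal spaces are written as [ ] because verbose mode ignores bare whitespace outside character classes.

```python
import re

PATTERN = re.compile(r"""
    ^([^ ]+?)      # group 1: the leading run of non-space characters (the name)
    [ ]+           # one or more spaces after the name
    .*             # anything in between, such as earlier codes (greedy)
    \((.)/?(.)\)   # the last code: two characters, optionally separated by "/"
    [ ]*$          # optional trailing spaces up to the end of the line
""", re.VERBOSE)

def slash_last_code(data):
    """Hypothetical helper: keep the name, rewrite the last code as X/Y."""
    return PATTERN.sub(r'\1 (\2/\3)', data.strip())

for line in ["Class (EN)(EL)", "CLASS-TWO (AA)", "CLASS (NA) (BB)", "CLASS (N/A)"]:
    print(slash_last_code(line))
```

Because .* is greedy, only the final group on the line is captured, and any earlier groups are discarded along with everything else between the name and that final code.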

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow