Question

I have a tab-delimited txt file.

1    (hi7 there)    my
2    (hi7)there    he3

I want to remove the brackets only when they enclose the whole entry (I am not sure "entry" is the right word; I mean one whole tab-separated field).

So the output should be

1    hi7 there    my
2    (hi7)there    he3

I know I can easily find how to remove all brackets. But I couldn't find how to remove brackets only when they embrace the whole entry.

Can I do this simply with Notepad++ or Python, whichever is faster?


Solution

This expression seems to handle all the cases correctly:

(?m)     # multiline mode
(^|\t)   # start of line or field
\(       # (
   ([^\t]+?) # anything but a tab
\)       # )
(?=      # followed by...
   $|\t  # end of line or field
)

Replace with \1\2, that is, the captured delimiter followed by the text inside the brackets.

Example:

import re

rx = r'(?m)(^|\t)\(([^\t]+?)\)(?=$|\t)'

# The fields must be separated by real tab characters (written as \t here).
txt = (
    "1\t(hi7 (the)re)\t(my)\n"
    "2\t(hi7)there\the3\n"
    "(22)\t(hi7)there\the3\n"
    "(22)\t(hi7there)\t(he3)\n"
)

print(re.sub(rx, r'\1\2', txt))

Result:

1   hi7 (the)re my
2   (hi7)there  he3
22  (hi7)there  he3
22  hi7there    he3
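
For reuse, the same pattern can be compiled once with re.VERBOSE so that the comments from the breakdown above stay next to the pattern. This is only a sketch: the names FIELD_BRACKETS and strip_wrapping_brackets are made up for illustration, and it assumes the fields are separated by real tab characters.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so the
# comments can live inside the pattern itself.
FIELD_BRACKETS = re.compile(r"""
    (^|\t)        # start of line or a tab, kept as group 1
    \(            # opening bracket
    ([^\t]+?)     # field contents without tabs, kept as group 2
    \)            # closing bracket
    (?=$|\t)      # must be followed by end of line or a tab
""", re.MULTILINE | re.VERBOSE)


def strip_wrapping_brackets(text):
    """Remove brackets that enclose an entire tab-separated field."""
    return FIELD_BRACKETS.sub(r"\1\2", text)


# The two sample lines from the question.
sample = "1\t(hi7 there)\tmy\n2\t(hi7)there\the3\n"
print(strip_wrapping_brackets(sample))
```

On the sample from the question, only the brackets around "hi7 there" are removed; "(hi7)there" is left alone because its closing bracket is not followed by a tab or the end of the line.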

OTHER TIPS

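I think this should work:

with open("file.txt") as f:
    for line in f:
        # Split on real tab characters; keep empty fields intact.
        fields = line.rstrip("\n").split("\t")
        cleaned = []
        for field in fields:
            # Strip the brackets only when they wrap the whole field.
            if field.startswith("(") and field.endswith(")"):
                cleaned.append(field[1:-1])
            else:
                cleaned.append(field)
        print("\t".join(cleaned))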

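To overwrite the file in place:

import fileinput

# With inplace=True, whatever is printed is written back into file.txt.
for line in fileinput.input("file.txt", inplace=True):
    fields = line.rstrip("\n").split("\t")
    cleaned = []
    for field in fields:
        if field.startswith("(") and field.endswith(")"):
            cleaned.append(field[1:-1])
        else:
            cleaned.append(field)
    print("\t".join(cleaned))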

You can use the tab character \t in a Python regular expression, so you can match like this:

>>> import re
>>> re.match(r'^\([^\t]+\)\t.*$', '(hi7 there)\tmy')
<_sre.SRE_Match object at 0x02573950>
>>> re.match(r'^\([^\t]+\)\t.*$', '(hi7)there\tmy')
>>>

Once you know how to match your string, it is easy to remove the brackets only when the line matches.
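
One way to go from matching to removing is to add capturing groups to the same pattern. This is a rough sketch, not the only way to do it: unwrap_first_field is a made-up name, it only looks at the first field (as the pattern above does), and it assumes each line is passed in without its trailing newline.

```python
import re

# The first-field pattern from above, with capturing groups added:
# group 1 is the text inside the brackets, group 2 is the rest of the line.
FIRST_FIELD = re.compile(r'^\(([^\t]+)\)(\t.*)$')


def unwrap_first_field(line):
    """Drop the brackets around the first field if they wrap all of it."""
    match = FIRST_FIELD.match(line)
    if match is None:
        return line  # no match: leave the line untouched
    return match.group(1) + match.group(2)


print(repr(unwrap_first_field('(hi7 there)\tmy')))
print(repr(unwrap_first_field('(hi7)there\tmy')))
```

The first call drops the brackets; the second returns the line unchanged because the closing bracket is not directly followed by a tab.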

If they are really tab delimited, you can replace

\t\(([^\t]*)\)\t

\t           # a tab
\(           # an opening parenthesis
(            # open the capturing group
    [^\t]*   # anything but a tab
)            # close the capturing group
\)           # a closing parenthesis
\t           # a tab

with

\t\1\t

The idea is to capture the text inside the relevant brackets, and to use it in the replacement with the backreference \1.
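
In Python the same replacement can be sketched with re.sub. The function name unwrap_middle_fields is made up for illustration. Note that, as written, the pattern requires a tab on both sides of the brackets, so a bracketed field at the very start or end of a line would not be touched.

```python
import re

# Pattern and replacement exactly as described above.
BETWEEN_TABS = re.compile(r'\t\(([^\t]*)\)\t')


def unwrap_middle_fields(line):
    """Remove brackets around a field that has a tab on both sides."""
    return BETWEEN_TABS.sub(r'\t\1\t', line)


# The two sample lines from the question.
print(repr(unwrap_middle_fields('1\t(hi7 there)\tmy')))
print(repr(unwrap_middle_fields('2\t(hi7)there\the3')))
```

For the sample in the question this is enough, since the bracketed field is in the middle column.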


Licensed under: CC-BY-SA with attribution