operating on embedded tuples/strings, python
Question
Say I have a tagged text as (word, tag) tuples. I want to convert it to a string in order to make some changes to the tags. My function below only sees the last sentence in the text. I guess there is some obvious mistake that I can't spot, so please help me make it work on the entire text.
>>> import nltk
>>> tpl = [[('This', 'V'), ('is', 'V'), ('one', 'NUM'), ('sentence', 'NN'), ('.', '.')], [('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'NUM')]]
def translate(tuple2string):
    for sent in tpl:
        t = ' '.join([nltk.tag.tuple2str(item) for item in sent])
>>> print t
'And/CNJ This/V is/V another/DET one/NUM'
P.S. For those who are interested, the tuple2str function is described here
EDIT: Now I need to convert it back to tuples in the same format. How do I do it?
>>> [nltk.tag.str2tuple(item) for item in t.split()]
The one above converts it into a single flat list of tuples, but I need the nested one (the same format as the input, tpl).
EDIT2: Well, it's probably worth posting the entire code:
def translate(tpl):
    t0 = [' '.join([nltk.tag.tuple2str(item) for item in sent]) for sent in tpl]
    for t in t0:
        t = re.sub(r'/NUM', '/N', t)
        t = [nltk.tag.str2tuple(item) for item in t.split()]
        print t
Solution
>>> ' '.join(' '.join(nltk.tag.tuple2str(item) for item in sent) for sent in tpl)
'This/V is/V one/NUM sentence/NN ./. And/CNJ This/V is/V another/DET one/NUM'
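As for why the loop in the question only showed the last sentence: `t` is rebound on every pass, so once the loop ends it holds only the final sentence. The sketch below reproduces that and shows one way to keep every sentence. It assumes a small stand-in for `nltk.tag.tuple2str` (default `/` separator) so that it runs without NLTK installed; the names `last_only` and `sentences` are illustrative only.

```python
# Stand-in for nltk.tag.tuple2str (assumption: default '/' separator),
# so this sketch does not need NLTK to run.
def tuple2str(tagged_token, sep='/'):
    word, tag = tagged_token
    return word if tag is None else word + sep + tag

tpl = [[('This', 'V'), ('is', 'V'), ('one', 'NUM'), ('sentence', 'NN'), ('.', '.')],
       [('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'NUM')]]

# Pattern from the question: t is overwritten on each iteration,
# so only the last sentence survives the loop.
for sent in tpl:
    t = ' '.join(tuple2str(item) for item in sent)
last_only = t

# Appending to a list keeps one string per sentence.
sentences = []
for sent in tpl:
    sentences.append(' '.join(tuple2str(item) for item in sent))
```

Here `last_only` ends up as the second sentence alone, while `sentences` holds both.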
EDIT:
If you want this to be reversible then just don't do the outer join.
>>> [' '.join([nltk.tag.tuple2str(item) for item in sent]) for sent in tpl]
['This/V is/V one/NUM sentence/NN ./.', 'And/CNJ This/V is/V another/DET one/NUM']
EDIT 2:
I thought we went over this already...
>>> [[nltk.tag.str2tuple(re.sub('/NUM', '/N', w)) for w in s.split()] for s in t0]
[[('This', 'V'), ('is', 'V'), ('one', 'N'), ('sentence', 'NN'), ('.', '.')],
[('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'N')]]
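If renaming a tag is the only change needed, the round trip through strings may not be necessary at all. A minimal sketch, assuming every item is a plain `(word, tag)` pair; `retag` is a hypothetical helper name, not part of NLTK:

```python
def retag(tagged_sents, old='NUM', new='N'):
    """Return a new nested list with tag `old` replaced by `new`."""
    return [[(word, new if tag == old else tag) for word, tag in sent]
            for sent in tagged_sents]

tpl = [[('This', 'V'), ('is', 'V'), ('one', 'NUM'), ('sentence', 'NN'), ('.', '.')],
       [('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'NUM')]]

result = retag(tpl)
```

This keeps the nested sentence structure by construction and leaves `tpl` itself unmodified.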
Splitting it out into the non-list-comprehension form:
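import re
import nltk

def translate(tpl):
    result = []
    # One 'word/TAG word/TAG ...' string per sentence.
    t0 = [' '.join([nltk.tag.tuple2str(item) for item in sent]) for sent in tpl]
    for t in t0:
        t = re.sub(r'/NUM', '/N', t)
        # Back to a list of (word, tag) tuples for this sentence.
        t = [nltk.tag.str2tuple(item) for item in t.split()]
        result.append(t)
    return result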
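One thing to watch with `re.sub(r'/NUM', '/N', t)` on a whole sentence string: the pattern matches `/NUM` anywhere, including at the start of a longer tag. The sketch below compares it with a stricter pattern that only matches when the tag ends the token. The tag `NUMX` is made up purely for illustration, and the lookahead is just one possible way to anchor the match.

```python
import re

# 'NUMX' is a hypothetical tag, used only to show the difference.
tagged = 'one/NUM code/NUMX two/NUM'

# Unanchored: also rewrites the prefix of '/NUMX'.
loose = re.sub(r'/NUM', '/N', tagged)

# Anchored: '/NUM' must be followed by whitespace or the end of the string.
strict = re.sub(r'/NUM(?=\s|$)', '/N', tagged)
```

With this input, `loose` turns `code/NUMX` into `code/NX`, while `strict` leaves it alone.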