I wouldn't recommend using regular expressions here. A regex is much less intuitive than splitting each line on whitespace, rearranging the resulting list where needed, and joining it back together. You can try something like this:
with open('corpus.txt', 'r') as corpus, \
        open('reordered_corpus.txt', 'w') as reordered_corpus:
    for phrase in corpus:
        phrase = phrase.split()  # split on whitespace
        vb_index = rp_index = -1  # -1 means "not found yet"
        for i, word_pos in enumerate(phrase):
            pos = word_pos.rsplit('_', 1)[-1]  # POS tag follows the last _
            if pos in ('VB', 'VBZ'):  # can add more verb POS tags
                vb_index = i
            elif vb_index >= 0 and pos == 'RP':  # or more particle POS tags
                rp_index = i
                break  # found both so can stop
        if vb_index >= 0 and rp_index >= 0:  # move the particle next to the verb
            phrase = phrase[:vb_index+1] + [phrase[rp_index]] + \
                     phrase[vb_index+1:rp_index] + phrase[rp_index+1:]
        reordered_corpus.write(' '.join(phrase) + '\n')
Using this code, if corpus.txt reads:
you_PRP mean_VBP we_PRP should_MD kick_VB them_PRP out_RP ._.
don_VB 't_NNP take_VB it_PRP off_RP until_IN I_PRP say_VBP so_RB ._.
please_VB help_VB the_DT man_NN out_RP ._.
shut_VBZ it_PRP down_RP !_.
then after running, reordered_corpus.txt will be:
you_PRP mean_VBP we_PRP should_MD kick_VB out_RP them_PRP ._.
don_VB 't_NNP take_VB off_RP it_PRP until_IN I_PRP say_VBP so_RB ._.
please_VB help_VB out_RP the_DT man_NN ._.
shut_VBZ down_RP it_PRP !_.
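
If you need more tags than VB, VBZ and RP, one option is to wrap the same logic in a function that takes the tag sets as parameters. This is only a sketch: `move_particle` and its `verb_tags` / `particle_tags` parameters are names I made up for illustration, and the defaults simply mirror the tags used above.

```python
def move_particle(line, verb_tags=('VB', 'VBZ'), particle_tags=('RP',)):
    """Move the first particle that follows a verb to right after that verb.

    Hypothetical helper: the name and parameters are illustrative only.
    `line` is one whitespace-separated sentence of word_POS tokens.
    """
    tokens = line.split()
    vb_index = rp_index = -1  # -1 means "not found yet"
    for i, token in enumerate(tokens):
        pos = token.rsplit('_', 1)[-1]  # text after the last underscore
        if pos in verb_tags:
            vb_index = i  # remember the most recent verb
        elif vb_index >= 0 and pos in particle_tags:
            rp_index = i
            break  # found a verb and a later particle
    if rp_index >= 0:
        # rp_index > vb_index here, so popping the particle does not
        # shift the verb's position.
        tokens.insert(vb_index + 1, tokens.pop(rp_index))
    return ' '.join(tokens)


print(move_particle('shut_VBZ it_PRP down_RP !_.'))
# shut_VBZ down_RP it_PRP !_.
```

To cover other verb forms, pass a larger set, for example `move_particle(line, verb_tags={'VB', 'VBD', 'VBZ'})`. Lines with no verb followed by a particle come back unchanged apart from whitespace normalization.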