Question

The aim of my task is to add spaces before and after punctuation. Currently I've been using an iterative str.replace() to replace each punctuation character p with " "+p+" ". How do I achieve the same output with str.translate(), where I can just pass in two lists or a dictionary:

inlist = string.punctuation
outlist = [" "+p+" " for p in string.punctuation]
inoutdict = {p:" "+p+" " for p in string.punctuation}

Let's assume that all the punctuation I have is in string.punctuation. Currently, I'm doing it like this:

from string import punctuation as punct
def punct_tokenize(text):
  # Pad each punctuation character with spaces, then collapse repeated whitespace.
  for ch in text:
    if ch in punct:
      text = text.replace(ch, " "+ch+" ")
  return " ".join(text.split())

sent = "This's a foo-bar sentences with many, many punctuation."
print punct_tokenize(sent)

Also, this iterative str.replace() is taking too long. Will str.translate() be any faster?
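
Whether one approach is faster is easiest to settle by measuring on representative input. The following is a minimal sketch of such a measurement, assuming Python 3 (the snippets above are Python 2) and the standard timeit module. The names punct_tokenize_replace and time_tokenizer are hypothetical helpers introduced only for this illustration; any alternative tokenizer can be passed to time_tokenizer in the same way.

```python
import string
import timeit

def punct_tokenize_replace(text):
    # Baseline: the replace loop from the question, ported to Python 3.
    for ch in text:
        if ch in string.punctuation:
            text = text.replace(ch, " " + ch + " ")
    return " ".join(text.split())

def time_tokenizer(func, text, number=1000):
    """Return the total seconds taken by `number` calls of func(text)."""
    return timeit.timeit(lambda: func(text), number=number)

sent = "This's a foo-bar sentences with many, many punctuation."
elapsed = time_tokenizer(punct_tokenize_replace, sent)
print("replace loop: %.4f s for 1000 calls" % elapsed)
```

Timings depend heavily on input length and on how many distinct punctuation characters occur, so it is worth running this on text that resembles the real data.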


Solution

In Python 2, the dict form of translate() only works on unicode objects, not on byte strings:

>>> import string
>>> inoutdict = {ord(p):unicode(" "+p+" ") for p in string.punctuation}
>>> unicode("foo,,,bar!!1").translate(inoutdict)
u'foo ,  ,  , bar !  ! 1'
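
For readers on Python 3, where every str is Unicode, the same idea can be sketched with str.maketrans, which accepts a dict mapping single characters to replacement strings of any length. This is an illustrative port under that assumption, not part of the original answer; PUNCT_TABLE and punct_tokenize_translate are names chosen here. The final split/join mirrors the whitespace cleanup in the question's function.

```python
import string

# Build the translation table once; each punctuation character maps to
# itself padded with one space on either side.
PUNCT_TABLE = str.maketrans({p: " " + p + " " for p in string.punctuation})

def punct_tokenize_translate(text):
    """Pad punctuation with spaces in one pass, then collapse whitespace runs."""
    return " ".join(text.translate(PUNCT_TABLE).split())

print(punct_tokenize_translate(
    "This's a foo-bar sentences with many, many punctuation."))
# This ' s a foo - bar sentences with many , many punctuation .
```

Building the table once at module level avoids recreating the dict on every call, which matters if the function runs over many sentences.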

Another option is with regular expressions:

>>> import re
>>> rx = '[%s]' % re.escape(string.punctuation)
>>> re.sub(rx, r" \g<0> ", "foo,,,bar!!1")
'foo ,  ,  , bar !  ! 1'
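
If the substitution runs many times, the pattern can be compiled once and wrapped in a function. The sketch below assumes Python 3 and adds the same whitespace cleanup as the question's function; PUNCT_RE and punct_tokenize_re are names introduced here for illustration.

```python
import re
import string

# re.escape makes characters such as ], \, ^ and - safe inside the
# character class; compiling once avoids re-parsing the pattern per call.
PUNCT_RE = re.compile("[%s]" % re.escape(string.punctuation))

def punct_tokenize_re(text):
    """Surround each punctuation character with spaces, then collapse whitespace."""
    padded = PUNCT_RE.sub(r" \g<0> ", text)
    return " ".join(padded.split())

print(punct_tokenize_re("foo,,,bar!!1"))
# foo , , , bar ! ! 1
```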

As usual, show us the bigger picture to get better answers: why are you doing this, where does the input come from, and so on.

Licensed under: CC-BY-SA with attribution