How to Tokenize A Text By Regular Expression In Python [closed]
21-12-2019
Question
Is there a way to split a text on whitespace, dots, and commas without NLTK, using only regular expressions?
Solution
If I have understood your question correctly, you can try this code:
import re
text = "Split.this,text in seven.separate,words"
# Split on any single whitespace character, dot, or comma
myexp = re.compile(r'[\s.,]')
print(myexp.split(text))
which gives you this output:
['Split', 'this', 'text', 'in', 'seven', 'separate', 'words']
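One caveat: the pattern above splits on every single delimiter, so when two delimiters sit next to each other (a comma followed by a space, or a trailing dot) the result contains empty strings. If your real text looks like that, a possible variant is to collect the runs of non-delimiter characters with `re.findall` instead of splitting. This is only a sketch; the function name `tokenize` and the sample sentence are made up for illustration.

```python
import re

def tokenize(text):
    # Hypothetical helper: return every maximal run of characters
    # that are not whitespace, dots, or commas.
    return re.findall(r"[^\s.,]+", text)

sample = "Hello, world. This is  a test."

# Splitting on single delimiters leaves empty strings behind
print(re.split(r"[\s.,]", "Hello, world."))  # ['Hello', '', 'world', '']

# Collecting the tokens directly avoids them
print(tokenize(sample))  # ['Hello', 'world', 'This', 'is', 'a', 'test']
```

For the original example string both approaches give the same seven words, because it never has two delimiters in a row.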
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow