How to split a line on a non-printing ASCII character in Python

https://stackoverflow.com/questions/2936174

05-10-2019

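Question

How can I split a line in Python at a non-printing ASCII character (such as the long minus sign, hex 0x97, octal 227)? I don't need the character itself. The information after it will be saved as a variable.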

Solution

You can use re.split.

>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

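Adjust the pattern so that it matches the separator characters and leaves only the characters you want to keep.

See also: Stripping non-printable characters from a string in Python

Example (with a long minus):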
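As a minimal sketch of such an adjusted pattern, assuming Python 3 and assuming that "printable" means the ASCII range 0x20 to 0x7E (the helper name split_on_nonprintable is made up for illustration, it is not from the answer):

```python
import re

# Assumption: anything outside printable ASCII (0x20-0x7E) acts as a separator.
NON_PRINTABLE = re.compile(r'[^\x20-\x7e]+')

def split_on_nonprintable(text):
    """Split text on runs of characters outside printable ASCII."""
    return NON_PRINTABLE.split(text)

parts = split_on_nonprintable('first part\x97second part')
print(parts)  # ['first part', 'second part']
```

Because the character class is negated, everything you list inside it is kept and everything else is treated as a delimiter.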

>>> # \xe2\x80\x93 represents a long dash (or long minus)
>>> s = 'hello – world'
>>> s
'hello \xe2\x80\x93 world'
>>> import re
>>> re.split("\xe2\x80\x93", s)
['hello ', ' world']

Or, the same with Unicode:

>>> # \u2013 represents a long dash, long minus or so called en-dash
>>> s = u'hello – world'
>>> s
u'hello \u2013 world'
>>> import re
>>> re.split(u"\u2013", s)
[u'hello ', u' world']
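Both sessions above are Python 2. A rough Python 3 equivalent might look like the sketch below. In Python 3 every str is Unicode, so the u prefix is unnecessary, but raw bytes have to be decoded first. The choice of Windows-1252 (where byte 0x97 is the em dash, U+2014) is an assumption about the input, not something the question states.

```python
import re

# Assumption: the raw data is Windows-1252 encoded, where byte 0x97 is U+2014.
raw = b'hello \x97 world'
text = raw.decode('cp1252')

print(ascii(text))  # 'hello \u2014 world'

# Split on the decoded character; the separator itself is discarded.
parts = re.split('\u2014', text)
print(parts)        # ['hello ', ' world']
```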

Other tips

_, _, your_result= your_input_string.partition('\x97')

or

your_result= your_input_string.partition('\x97')[2]

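If your_input_string does not contain a '\x97', then your_result will be empty. If your_input_string contains multiple '\x97' characters, your_result will contain everything after the first '\x97' character, including the other '\x97' characters.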
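The three cases described above can be checked with a short sketch (Python 3; the helper name after_first is hypothetical and only wraps str.partition):

```python
SEP = '\x97'

def after_first(text, sep=SEP):
    """Return everything after the first sep, or '' if sep is absent."""
    return text.partition(sep)[2]

print(repr(after_first('key\x97value')))       # 'value'
print(repr(after_first('no separator here')))  # ''
print(repr(after_first('a\x97b\x97c')))        # 'b\x97c'
```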

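Just use the string/unicode split method (it does not really care what string you split on, as long as it is a constant).

To get the separator string, either escape it as the others have shown, '\x97'

or

use chr(0x97) for byte strings (0-255) or unichr(0x97) for unicode (both as in Python 2).

So an example would be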

'will not be split'.split(chr(0x97))

'will be split here:\x97 and this is the second string'.split(chr(0x97))
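For readers on Python 3, here is a hedged sketch of the same idea. There unichr() no longer exists and chr() covers all of Unicode. The variable names and the use of maxsplit=1 (to keep only the text after the first separator) are illustrative additions, not part of the answer.

```python
# In Python 3, chr(0x97) returns the one-character str '\x97'.
sep = chr(0x97)

untouched = 'will not be split'.split(sep)
print(untouched)   # ['will not be split']

# maxsplit=1 stops after the first separator, so tail keeps any later ones.
head, tail = 'will be split here:\x97 and this is the second string'.split(sep, 1)
print(repr(head))  # 'will be split here:'
print(repr(tail))  # ' and this is the second string'

# For bytes input, build a one-byte separator instead of a str.
print(b'left\x97right'.split(bytes([0x97])))  # [b'left', b'right']
```

Note that the two-name unpacking raises ValueError when the separator is missing, so str.partition is the safer choice if that can happen.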

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow