How to split a line on a non-printable ASCII character in Python

https://stackoverflow.com/questions/2936174

05-10-2019

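Question

How do I split a line in Python on a non-printable ASCII character (such as the long minus sign, hex 0x97, octal 227)? I don't need the character itself. The information will be stored in a variable.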

Solution

You can use re.split.

>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

Adjust the pattern so that it matches only the separators, leaving out the characters you want to keep.
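As a minimal sketch of adjusting the pattern, the snippet below treats every character outside the printable ASCII range 0x20-0x7E as a separator. The helper name split_on_non_printable and the sample string are hypothetical, chosen only for illustration. With this pattern, tabs and newlines also count as separators.

```python
import re

# Assumption: "printable" means the ASCII range 0x20-0x7E (space through tilde).
# A run of one or more characters outside that range acts as a single separator.
NON_PRINTABLE = re.compile(r'[^\x20-\x7e]+')

def split_on_non_printable(line):
    """Split line on runs of non-printable characters (hypothetical helper)."""
    return NON_PRINTABLE.split(line)

parts = split_on_non_printable('first\x97second\x00\x01third')
print(parts)  # ['first', 'second', 'third']
```

If only one specific character should split the line, a plain pattern such as '\x97' is enough and the character class is not needed.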

See also: stripping non printable characters from a string in python

Example (with a long dash; the output shown is from Python 2, where a plain str holds the UTF-8 bytes):

>>> # \xe2\x80\x93 represents a long dash (or long minus)
>>> s = 'hello – world'
>>> s
'hello \xe2\x80\x93 world'
>>> import re
>>> re.split("\xe2\x80\x93", s)
['hello ', ' world']

Or the same with Unicode:

>>> # \u2013 represents a long dash, long minus or so called en-dash
>>> s = u'hello – world'
>>> s
u'hello \u2013 world'
>>> import re
>>> re.split(u"\u2013", s)
[u'hello ', u' world']
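The sessions above are Python 2. A rough Python 3 equivalent might look like the sketch below. It assumes the data arrives as bytes encoded in Windows-1252 (cp1252), where byte 0x97 decodes to the em dash U+2014; that encoding is an assumption, not something stated in the question, and the variable names are illustrative.

```python
import re

# Assumption: the line was read as raw bytes from a cp1252-encoded source.
raw = b'hello \x97 world'

# Option 1: split the bytes directly on the 0x97 byte.
byte_parts = raw.split(b'\x97')
print(byte_parts)  # [b'hello ', b' world']

# Option 2: decode first, then split on the character that 0x97 maps to.
text = raw.decode('cp1252')  # 0x97 becomes '\u2014' under cp1252
text_parts = re.split('\u2014', text)
print(text_parts)  # ['hello ', ' world']
```

Splitting the bytes avoids having to know the encoding, while decoding first yields ordinary str values for later processing.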

Other tips

_, _, your_result= your_input_string.partition('\x97')

or

your_result= your_input_string.partition('\x97')[2]

If your_input_string does not contain a '\x97', your_result will be empty. If your_input_string contains several '\x97' characters, your_result will contain everything after the first '\x97', including the other '\x97' characters.
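The behaviour described above can be checked with a short sketch. The function name after_first_separator and the sample strings are hypothetical, used only to illustrate str.partition.

```python
SEP = '\x97'

def after_first_separator(text, sep=SEP):
    """Return everything after the first sep, or '' if sep is absent."""
    # partition returns (head, sep, tail); when sep is not found it
    # returns (text, '', ''), so the tail is the empty string.
    return text.partition(sep)[2]

missing = after_first_separator('no separator here')
several = after_first_separator('a\x97b\x97c')

print(repr(missing))  # ''
print(repr(several))  # 'b\x97c'
```

To tell a missing separator apart from a separator at the very end of the string, check the middle element of the tuple that partition returns: it is empty only when the separator was not found.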

Just use the string/unicode split method. It does not care which string you split on, as long as it is a constant string. If you want to split on a regular expression, use re.split.

To get the separator string, either escape it as others have shown ('\x97')

or

use chr(0x97) for byte strings (values 0-255) or unichr(0x97) for unicode strings.

So the examples would be:

'will not be split'.split(chr(0x97))

'will be split here:\x97 and this is the second string'.split(chr(0x97))
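For reference, here is a small sketch of what those two calls return under Python 3, where chr(0x97) gives the one-character str '\x97' and unichr no longer exists. The variable names are illustrative.

```python
SEP = chr(0x97)  # same as '\x97' in Python 3

# No separator present: split returns a one-element list with the whole string.
unsplit = 'will not be split'.split(SEP)
print(unsplit)  # ['will not be split']

# One separator present: the string is cut in two and the separator is dropped.
pieces = 'will be split here:\x97 and this is the second string'.split(SEP)
print(pieces)  # ['will be split here:', ' and this is the second string']
```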

License: CC-BY-SA with attribution
