I suggest using the shlex module to split a string that contains quoted substrings:
>>> import shlex
>>> s = 'hello "quoted string" 123 \'More quoted string\' end'
>>> s
'hello "quoted string" 123 \'More quoted string\' end'
>>> shlex.split(s)
['hello', 'quoted string', '123', 'More quoted string', 'end']
After that, you can classify the tokens (string, number, ...) however you want. The only thing you lose is the whitespace: shlex discards the whitespace between tokens and keeps only the whitespace inside quotes.
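As a rough sketch of that classification step, the snippet below labels each token as a number or a string. The helper name classify and the 'number'/'string' labels are my own illustration, not part of shlex, and it assumes that anything int() or float() accepts should count as a number.

```python
import shlex

def classify(token):
    # Hypothetical helper: 'number' if the token parses as int or float, else 'string'.
    for convert in (int, float):
        try:
            convert(token)
            return 'number'
        except ValueError:
            pass
    return 'string'

line = 'hello "quoted string" 123 \'More quoted string\' end'
classified = [(token, classify(token)) for token in shlex.split(line)]
for token, kind in classified:
    print('{}: {}'.format(kind, token))
```

Keep in mind that float() also accepts words such as nan and inf, and that a quoted "123" would be labelled a number here because the quotes are already gone by the time classify() sees the token.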
Here is a simple demo:
import shlex

if __name__ == '__main__':
    line = 'abcd xvc 23432 "exampe" 366'
    tokens = shlex.split(line)
    for token in tokens:
        # print() with a single argument works on both Python 2 and 3
        print('>{}<'.format(token))
Output:
>abcd<
>xvc<
>23432<
>exampe<
>366<
Update
If you want to keep the quote marks, call shlex.split() with posix=False:
tokens = shlex.split(line, posix=False)
Output:
>abcd<
>xvc<
>23432<
>"exampe"<
>366<
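Keeping the quotes is handy if you need to tell quoted tokens apart from bare ones. Here is a minimal sketch of that idea; is_quoted is a hypothetical helper of mine, and the extra "42" token is added to the sample line only to show a quoted value that looks like a number.

```python
import shlex

def is_quoted(token):
    # Hypothetical helper: True when the token is wrapped in a matching pair of quotes.
    return len(token) >= 2 and token[0] == token[-1] and token[0] in '"\''

line = 'abcd xvc 23432 "exampe" 366 "42"'
tokens = shlex.split(line, posix=False)

quoted = [t for t in tokens if is_quoted(t)]
bare = [t for t in tokens if not is_quoted(t)]

print(quoted)
print(bare)
```

Once a token is known to be quoted, you can strip the quotes yourself with token[1:-1].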