From your confusion, it seems that what you need is to learn RegEx (here).
If you want the token to match everything, you can set the token_pattern
attribute in CountVectorizer
as:
.*
Meaning it will match every token coming from the tokenizer.
If you just want to match non-alphanumeric tokens, you can use:
[^A-Za-z0-9]*