Question

I am using this to split strings:

 let split = Str.split (Str.regexp_string " ") in
   let tokens = split instr in
 ....

The problem shows up with input like this line, which I want to parse:

pop     esi

After the split, it turns out to be this (I use a helper function to print each item in the tokens list):

item: popitem: item: item: item: esi

As you can see, there are three empty items in the token list.

I am wondering whether there is something like Python's string.split that can parse instr this way:

item: popitem: esi

Is it possible?


Solution

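Don't use Str.regexp_string. It builds a regexp that matches its argument as a fixed, literal string, so every single space counts as a separate delimiter and consecutive spaces produce empty tokens.

Use Str.split (Str.regexp " +") instead, so that a run of one or more spaces is treated as one delimiter.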
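For reference, here is a minimal sketch of that fix as a complete program. It assumes the str library is linked in (for example via ocamlfind with -package str). The helper name tokenize and the tab in the character class are illustrative additions, not part of the answer above.

```ocaml
(* Needs the str library, e.g.:
   ocamlfind ocamlopt -package str -linkpkg split.ml -o split *)

(* A run of one or more spaces or tabs counts as a single delimiter.
   Including the tab is an assumption; use " +" for spaces only. *)
let whitespace = Str.regexp "[ \t]+"

(* tokenize is a hypothetical helper name, not something Str provides. *)
let tokenize (line : string) : string list =
  Str.split whitespace line

let () =
  tokenize "pop     esi"
  |> List.iter (fun tok -> Printf.printf "item: %s\n" tok)
(* prints:
   item: pop
   item: esi *)
```

Str.split also ignores delimiters at the very beginning and end of the string, so leading or trailing spaces do not produce empty tokens either.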

OTHER TIPS

Since OCaml 4.04.0 there is also String.split_on_char, which you can combine with List.filter to remove empty strings:

# "pop     esi"
  |> String.split_on_char ' '
  |> List.filter (fun s -> s <> "");;
- : string list = ["pop"; "esi"]

No external libraries required.
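String.split_on_char only accepts a single delimiter character. If you also want tabs and newlines to count as separators, closer to what Python's split does, one possible sketch using only the standard library is to map every whitespace character to a space first. The names is_blank and split_whitespace are made up for this example, and the set of characters treated as whitespace is an assumption.

```ocaml
(* Characters treated as whitespace here; adjust to taste. *)
let is_blank = function
  | ' ' | '\t' | '\n' | '\r' -> true
  | _ -> false

(* split_whitespace is a hypothetical helper, not a Stdlib function. *)
let split_whitespace (s : string) : string list =
  s
  |> String.map (fun c -> if is_blank c then ' ' else c) (* normalise to spaces *)
  |> String.split_on_char ' '                            (* split on every space *)
  |> List.filter (fun tok -> tok <> "")                  (* drop empty tokens *)

let () =
  split_whitespace "pop \t  esi\n"
  |> List.iter (fun tok -> Printf.printf "item: %s\n" tok)
(* prints:
   item: pop
   item: esi *)
```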

Using Jane Street's Core library, you can do:

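open Core

(* With Core opened, <> is specialised to int, so test for emptiness
   with String.is_empty instead of comparing against "". *)
let python_split x =
  String.split_on_chars ~on:[ ' ' ; '\t' ; '\n' ; '\r' ] x
  |> List.filter ~f:(fun s -> not (String.is_empty s))
;;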

This is how I split my lines into words, dropping the empty strings that consecutive spaces produce:

open Core
let tokenize line =
  String.split line ~on:' '
  |> List.filter ~f:(fun s -> not (String.is_empty s))

Mind the single quotes around the space character.

Here's the documentation for String.split: link

Licensed under: CC-BY-SA with attribution