I'm currently researching my own question, and I'll try to answer myself here.
The Java `Pattern` documentation specifies that `.` matches any character (by default, any character except line terminators). Therefore, the regex which accepts any string would be:

    def anyString = ".*".r

To accept any non-empty string, we can use `".+".r`.

To see how this behaves inside a parser, consider the following toy example:
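
    import scala.util.parsing.combinator.JavaTokenParsers

    object MyParser1 extends JavaTokenParsers {
      override def skipWhitespace = false // don't skip whitespace between parsers
      def expr = "<" ~ anyString ~ ">"
      def anyString = ".*".r // greedy: matches everything up to the end of the line
    }
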
Here, the string `<>` is rejected. To test this, use:

    println(MyParser1.parseAll(MyParser1.expr, "<>"))

This indicates that the `.*` parser consumes input up to the end of the string, so the `>` is no longer available for the final parser. Therefore, it seems necessary to forbid `<` and `>` from appearing in `anyString`.

As in the previous point, the `.*` parser consumes the whole string, and therefore consumes all `>` symbols.

In the same documentation, a negation operator is given. To exclude `<` and `>`, we can write:

    def almostAnyString = "[^<>]*".r

In general, the construct `[^abc]` will match any character except `a`, `b`, and `c`.
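
The difference between the two regexes can be checked without the parser library at all. As far as I can tell, a regex parser matches a prefix of the remaining input, so `findPrefixOf` from the standard `scala.util.matching.Regex` should be a fair stand-in. The object name `PrefixDemo` and the sample inputs are made up for this sketch:

```scala
import scala.util.matching.Regex

// Hypothetical demo object: compares the regexes discussed above on small inputs.
object PrefixDemo {
  val anyString: Regex = ".*".r            // any string, greedy
  val nonEmptyString: Regex = ".+".r       // any non-empty string
  val almostAnyString: Regex = "[^<>]*".r  // any string without '<' or '>'

  def main(args: Array[String]): Unit = {
    // Greedy `.*` swallows the closing bracket, leaving nothing for a later ">" parser.
    println(anyString.findPrefixOf("abc>"))        // Some(abc>)
    // The negated character class stops right before '>'.
    println(almostAnyString.findPrefixOf("abc>"))  // Some(abc)
    // `*` also accepts the empty prefix, which is what makes "<>" parseable.
    println(almostAnyString.findPrefixOf(">"))     // Some()
    // `.+` needs at least one character.
    println(nonEmptyString.findPrefixOf(""))       // None
  }
}
```

The second and third results are the reason `[^<>]*` works between the brackets: it stops before `>` and still accepts an empty body.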
To conclude, the best implementation I've found so far is the following:

    import scala.util.parsing.combinator.JavaTokenParsers

    object MyParser extends JavaTokenParsers {
      override def skipWhitespace = false // don't allow whitespace between parsers by default
      def expr: Parser[Any] = "<" ~ almostAnyString ~ ">" ~
        whiteSpace ~ // whitespace regex inherited from RegexParsers via JavaTokenParsers
        "<" ~ almostAnyString ~ ">"
      def almostAnyString = "[^<>]*".r
    }
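
To try the parser out, here is a small sketch that assumes the scala-parser-combinators library is on the classpath. The parser is repeated so that the snippet compiles on its own; `MyParserDemo`, the helper `accepts`, and the sample inputs are hypothetical names and data introduced only for illustration:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Same parser as above, repeated so this snippet is self-contained.
object MyParser extends JavaTokenParsers {
  override def skipWhitespace = false
  def expr: Parser[Any] = "<" ~ almostAnyString ~ ">" ~
    whiteSpace ~
    "<" ~ almostAnyString ~ ">"
  def almostAnyString = "[^<>]*".r
}

// Hypothetical driver: reports whether the whole input is accepted by `expr`.
object MyParserDemo {
  def accepts(input: String): Boolean =
    MyParser.parseAll(MyParser.expr, input).successful

  def main(args: Array[String]): Unit = {
    println(accepts("<abc> <def>"))  // two bracketed groups separated by whitespace
    println(accepts("<> <>"))        // empty bodies are allowed by `*`
    println(accepts("<abc><def>"))   // no whitespace between the groups
    println(accepts("<a>b> <c>"))    // stray text after the first '>'
  }
}
```

I would expect the first two inputs to be accepted and the last two to be rejected, since `whiteSpace` requires at least one whitespace character directly after the first `>`.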