Question

Consider a set of strings like the following:

Memory size = 4,194,304 KB
Cache size=   32,768 K
Number of cores = 8
Note   =4,000,000 KB is less than 4 GB

Is there a generic and not too complex Java regular expression that matches each string entirely and produces the following groups?

"Memory size", "4,194,304", "KB"
"Cache size", "32,768", "K"
"Number of cores", "8"
"Note", "4,000,000 KB is less than 4 GB"

These groups are key, value and (optional) suffix.

Additional requirements:

  • The value (i.e., the part after '=') is not necessarily a number
  • Any spaces on either side of '=' should be removed in one pass, without backtracking
  • The "KB" and "K" string matching is not case sensitive
  • The captured groups should always have the same index (ideally, 3 groups for key/value/suffix, with the same group index for all matches)

Clearly, a simple expression like

  • ([^=]+) *= *([^=]+)

does not fully cover the specification above.

Solution

A regex that covers the key and value parts of the spec above is:

^([^=]+?) *= *(.+?) *$

EDIT

Turns out I missed the K/KB part. Here is the amended version:

^([^=]+?) *= *(.+?) *(KB?)? *$
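
As a rough illustration, the amended expression could be used from Java as sketched below. The class name KeyValueSuffixDemo and the helper parse are made up for this example, and compiling with Pattern.CASE_INSENSITIVE is an assumption added here to satisfy the case-insensitivity requirement from the question; the regex itself is unchanged.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original answer.
final class KeyValueSuffixDemo {
    // The amended expression from the answer. CASE_INSENSITIVE is an assumption
    // added here so that "kb" and "k" are also recognised as suffixes.
    static final Pattern LINE = Pattern.compile(
            "^([^=]+?) *= *(.+?) *(KB?)? *$", Pattern.CASE_INSENSITIVE);

    /** Returns {key, value, suffix} (suffix is null when absent), or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Groups 1, 2 and 3 always mean key, value and suffix.
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Memory size = 4,194,304 KB",
            "Cache size=   32,768 K",
            "Number of cores = 8",
            "Note   =4,000,000 KB is less than 4 GB"
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

One caveat with this approach: the suffix group is not tied to a numeric value, so any value that happens to end in K or KB is split as well (for example, "Status = OK" yields the value "O" and the suffix "K").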

OTHER TIPS

Something like this should work:

^(.*?)\s*=\s*(?:([\d,]+)\s*(K|KB)$|(.*))

^ matches the beginning of the line.

(.*?) captures the left term by matching anything (the ? makes it non-greedy; otherwise it would eat up the whitespace before the equals sign).

\s*=\s* matches and discards the equals sign and any whitespace around it.

(?:([\d,]+)\s*(K|KB)$|(.*)) This long group matches either one thing or the other. The (?: makes it a non-capturing group, because you don't want to capture the entire thing.

([\d,]+)\s*(K|KB)$ If there is a number followed by just K or KB and then the end of the string, match that in two groups.

(.*) Otherwise, match everything that remains in one group. Note that the value therefore ends up in group 2 or in group 4, depending on which alternative matched.
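
A minimal Java sketch of how this expression might be applied is shown below. Because the value is captured by group 2 in the first alternative and by group 4 in the second, the hypothetical helper parse (the class and method names are invented for this example) folds the two into one slot so callers always see key, value, suffix. The Pattern.CASE_INSENSITIVE flag is an assumption added here for the case-insensitivity requirement; it is not part of the expression as posted.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original answer.
final class AlternationDemo {
    // Expression from the answer above. CASE_INSENSITIVE is an addition
    // (an assumption) so that "kb" and "k" are accepted as suffixes too.
    static final Pattern LINE = Pattern.compile(
            "^(.*?)\\s*=\\s*(?:([\\d,]+)\\s*(K|KB)$|(.*))",
            Pattern.CASE_INSENSITIVE);

    /** Returns {key, value, suffix} (suffix is null when absent), or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Group 2 is set only when the "number + K/KB" alternative matched;
        // otherwise the whole remainder is in group 4.
        String value = m.group(2) != null ? m.group(2) : m.group(4);
        return new String[] { m.group(1), value, m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Memory size = 4,194,304 KB",
            "Cache size=   32,768 K",
            "Number of cores = 8",
            "Note   =4,000,000 KB is less than 4 GB"
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

Unlike the expression in the accepted solution, this one only splits off K or KB when the value consists of digits and commas, so a value such as "OK" stays intact.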

Try this and tell me if it works:

(.*) *= *(.*) (.*)
Licensed under: CC-BY-SA with attribution