Question
Consider a set of strings like the following:
Memory size = 4,194,304 KB
Cache size= 32,768 K
Number of cores = 8
Note =4,000,000 KB is less than 4 GB
Is there a generic and not too complex Java regular expression that matches each string entirely and produces the following groups?
"Memory size", "4,194,304", "KB"
"Cache size", "32,768", "K"
"Number of cores", "8"
"Note", "4,000,000 KB is less than 4 GB"
These groups are key, value and (optional) suffix.
Additional requirements:
Clearly, a simple expression like
([^=]+) *: *([^=]+)
does not fully cover the specification above.
Solution
A regex that fully covers the spec above is:
^([^=]+?) *= *(.+?) *$
EDIT
Turns out I missed the K/KB part. Here is the amended version:
^([^=]+?) *= *(.+?) *(KB?)? *$
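For anyone who wants to try the amended pattern against the sample strings from the question, here is a minimal sketch. The class name `KeyValueRegexDemo` and the `parse` helper are made up for illustration and are not part of the original answer. It assumes the suffix group is simply `null` when a line has no K/KB suffix.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueRegexDemo {
    // The amended pattern from the answer; it contains no backslashes,
    // so it needs no extra escaping inside a Java string literal.
    static final Pattern LINE = Pattern.compile("^([^=]+?) *= *(.+?) *(KB?)? *$");

    // Hypothetical helper: returns {key, value, suffix}, where suffix is null
    // if the optional third group did not take part in the match.
    // Returns null when the whole line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Memory size = 4,194,304 KB",
            "Cache size= 32,768 K",
            "Number of cores = 8",
            "Note =4,000,000 KB is less than 4 GB"
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

The lazy quantifiers do the work here: `(.+?)` grows one character at a time, so the value stops as soon as the rest of the line can be consumed by an optional K/KB suffix and trailing spaces.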
OTHER TIPS
Something like this should work:
^(.*?)\s*=\s*(?:([\d,]+)\s*(K|KB)$|(.*))
^ - match the beginning of the line.
(.*?) - capture the left term by matching anything (the ? makes it non-greedy; otherwise it would eat up all the whitespace).
\s*=\s* - match and discard the equals sign and any space around it.
(?:([\d,]+)\s*(K|KB)$|(.*)) - this long group matches either one thing or the other. (?: makes it a non-capturing group, because you don't want to capture the entire thing.
([\d,]+)\s*(K|KB)$ - if there is a number followed by just K or KB and the end of the string, match that in two groups.
(.*) - otherwise, match everything that remains in one group.
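One thing to watch with this pattern is that the value ends up in a different group depending on which branch of the alternation matched: groups 2 and 3 for the number-plus-suffix case, group 4 otherwise. A rough sketch of reading the groups in Java could look like this. The class name `AlternationRegexDemo` and the `parse` helper are hypothetical and only serve the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationRegexDemo {
    // Same pattern as above; backslashes are doubled for the Java string literal.
    static final Pattern LINE =
        Pattern.compile("^(.*?)\\s*=\\s*(?:([\\d,]+)\\s*(K|KB)$|(.*))");

    // Hypothetical helper: normalizes both branches to {key, value, suffix}.
    // Returns null when the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        if (m.group(2) != null) {
            // First branch matched: a number followed by K or KB at the end.
            return new String[] { m.group(1), m.group(2), m.group(3) };
        }
        // Second branch matched: everything after '=' is the value, no suffix.
        return new String[] { m.group(1), m.group(4), null };
    }

    public static void main(String[] args) {
        String[] r = parse("Cache size= 32,768 K");
        System.out.println(r[0] + " | " + r[1] + " | " + r[2]);
    }
}
```

Checking `group(2)` for `null` is one simple way to tell the branches apart, since a capturing group that did not take part in the match returns `null` in `java.util.regex`.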
Try this and tell me if it works:
(.*) *= *(.*) (.*)