With regex, I would split it into 3 parts:
1.) (10) Patent Number
The regex could look like this:
\(10\)\s*Patent Number:\s*([\w,]+)
As a Java string:
"\\(10\\)\\s*Patent Number:\\s*([\\w,]+)"
The match for the first parenthesized group will be in [1].
- \s is a shorthand for [ \t\n\x0B\f\r], i.e. any kind of white-space.
- \w is a shorthand for [A-Za-z0-9_], i.e. word characters; here it is combined with , in a character class.
- Some characters, such as ( and ), have special meanings in regex. They have to be escaped with a backslash. In a Java string literal each backslash is doubled.
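To show how the first pattern could be used from Java, here is a minimal sketch. The class name `PatentNumberDemo`, the helper `extractPatentNumber` and the sample text are made up for illustration; only the pattern comes from the lines above. "[1]" corresponds to `Matcher.group(1)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatentNumberDemo {
    // Same pattern as above, written as a Java string literal.
    private static final Pattern PATENT_NUMBER =
            Pattern.compile("\\(10\\)\\s*Patent Number:\\s*([\\w,]+)");

    /** Returns group 1 (the number), or null if the label is not found. */
    static String extractPatentNumber(String text) {
        Matcher m = PATENT_NUMBER.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample text; the real input layout may differ.
        String sample = "(10) Patent Number:    6,088,800\n(45) Date of Patent: Jul. 11, 2000";
        System.out.println(extractPatentNumber(sample));
    }
}
```

Using `find()` rather than `matches()` matters here, because the label sits somewhere inside a larger text and the pattern is not meant to cover the whole input.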
2.) (54) ENCRYPT...
A pattern could look like:
(?s)\(54\)\s*(.*?)\s*(?=\(\d|$)
As a Java string:
"(?s)\\(54\\)\\s*(.*?)\\s*(?=\\(\\d|$)"
- (?s): the s modifier equals Pattern.DOTALL, where the dot matches new-lines too.
- (.*?) lazily matches any amount of any characters.
- (?=\(\d|$): a lookahead is used to stop that lazy match as soon as another ( followed by a digit, or (|) the string end $ (anchor for end), is seen.
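A rough sketch of the second step in Java, assuming the lookahead is written as `(?=\(\d|$)` so that `$` is an alternative on its own. `TitleDemo`, `extractTitle` and the sample text are hypothetical, and collapsing the white-space afterwards is an extra step that is not part of the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TitleDemo {
    // (?s) lets the dot cross line breaks; the lookahead ends the lazy match
    // at the next "(digit" field marker or at the end of the input.
    private static final Pattern TITLE =
            Pattern.compile("(?s)\\(54\\)\\s*(.*?)\\s*(?=\\(\\d|$)");

    /** Returns the title with white-space runs collapsed, or null if (54) is absent. */
    static String extractTitle(String text) {
        Matcher m = TITLE.matcher(text);
        if (!m.find()) {
            return null;
        }
        // Optional convenience: join a multi-line title into one line.
        return m.group(1).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Made-up sample text; the real input layout may differ.
        String sample = "(54) EXAMPLE TITLE SPANNING\n     TWO LINES\n(75) Inventors: A. Person";
        System.out.println(extractTitle(sample));
    }
}
```

Because the match stops at the first ( followed by a digit, a title that itself contains such a sequence would be cut short; whether that can happen depends on the data.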
3.) For the other 3 desired parts I would try to mirror the formatting of the input in the pattern. This requires that all the data is formatted consistently. A pattern could look like this:
(?s)\(64\).*?Filed:\s*([\d,]+)\s*(\w+\.\s*\d+,\s*\d+)\s*\n[\d+][^\n]+\n\s*(\w+\.\s*\d+,\s*\d+)
As a Java string:
"(?s)\\(64\\).*?Filed:\\s*([\\d,]+)\\s*(\\w+\\.\\s*\\d+,\\s*\\d+)\\s*\\n[\\d+][^\\n]+\\n\\s*(\\w+\\.\\s*\\d+,\\s*\\d+)"
- \n matches a newline.
Matches will be in [1] (e.g. 6,088,800), [2] (e.g. Jul. 11, 2000) and [3] (e.g. Feb. 27, 1998).
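As a sketch of how the three groups could be read in Java: the input layout below (labels first, then one value per line, with an invented placeholder line between the two dates) is only an assumption about how the extracted text looks, and `FiledBlockDemo` and `extractFields` are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FiledBlockDemo {
    // The third pattern, split across two literals only for readability.
    private static final Pattern FILED_BLOCK = Pattern.compile(
            "(?s)\\(64\\).*?Filed:\\s*([\\d,]+)\\s*(\\w+\\.\\s*\\d+,\\s*\\d+)"
            + "\\s*\\n[\\d+][^\\n]+\\n\\s*(\\w+\\.\\s*\\d+,\\s*\\d+)");

    /** Returns groups 1 to 3 as an array, or null if the block does not match. */
    static String[] extractFields(String text) {
        Matcher m = FILED_BLOCK.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Assumed layout; "09/000,000" is a placeholder for the skipped line.
        String sample = "(64) Patent No.:\n     Issued:\n     Appl. No.:\n     Filed:\n"
                + "6,088,800\nJul. 11, 2000\n09/000,000\nFeb. 27, 1998\n";
        String[] fields = extractFields(sample);
        System.out.println(fields == null ? "no match" : String.join(" | ", fields));
    }
}
```

If the real data deviates from this layout, for example by missing the line between the two dates, `find()` fails and the method returns null, which is the price of tying the pattern so closely to the formatting.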
For getting started with regex, this is too much information at once :)