I'm new to regex with Java, and I'm trying to extract every "lesson#" id from this text:

<a id="lesson1" href="lesson1.html">Lesson 1</a>
<a id="lesson2" href="lesson2.html">Lesson 2</a>
<a id="lesson3" href="lesson3.html">Lesson 3</a>
<a id="lesson4" href="lesson4.html">Lesson 4</a>
<a id="lesson5" href="lesson5.html">Lesson 5</a>
<a id="lesson6" href="lesson6.html">Lesson 6</a>
<a id="lesson7" href="lesson7.html">Lesson 7</a>
<a id="lesson8" href="lesson8.html">Lesson 8</a>
<a id="lesson9" href="lesson9.html">Lesson 9</a>

I'm using this code to extract that part from my string:

String s = ""
        + "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>\n"
        + "<a id=\"lesson2\" href=\"lesson2.html\">Lesson 2</a>\n"
        + "<a id=\"lesson3\" href=\"lesson3.html\">Lesson 3</a>\n"
        + "<a id=\"lesson4\" href=\"lesson4.html\">Lesson 4</a>\n"
        + "<a id=\"lesson5\" href=\"lesson5.html\">Lesson 5</a>\n"
        + "<a id=\"lesson6\" href=\"lesson6.html\">Lesson 6</a>\n"
        + "<a id=\"lesson7\" href=\"lesson7.html\">Lesson 7</a>\n"
        + "<a id=\"lesson8\" href=\"lesson8.html\">Lesson 8</a>\n"
        + "<a id=\"lesson9\" href=\"lesson9.html\">Lesson 9</a>\n"
        + "";

Pattern pattern = Pattern.compile("id=\"(lesson[0-9])");
Matcher m = pattern.matcher(s);

System.out.println("Find: " + m.find());
System.out.println("Matches: " + m.matches());

if (m.matches()) {
   System.out.println("Group 0: " + m.group(0));
}

The output I get with this code is:

Find: true
Matches: false

I read in the Javadoc that if m.matches() returns false, I can't access the groups.

Why does m.matches() return false when m.find() returns true? I can't access the groups with this code, so what am I missing?


Solution

The matches method attempts to match the entire input sequence against the pattern. Your pattern covers only a small part of the input, so matches() returns false. Instead, call Matcher.find() in a loop until it returns false. After each successful call to Matcher.find(), you can access the groups of the occurrence that was just found.

while (m.find()) {
   String someGroup = m.group(1);
}
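
For completeness, here is a minimal self-contained sketch of that loop applied to the input from the question. The class name LessonIdExtractor and the method extractLessonIds are made up for illustration; the pattern is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LessonIdExtractor {
    // Pattern from the question: group 1 captures "lesson" followed by one digit.
    private static final Pattern LESSON_ID = Pattern.compile("id=\"(lesson[0-9])");

    // Hypothetical helper: collects group 1 of every occurrence found in the input.
    static List<String> extractLessonIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = LESSON_ID.matcher(html);
        // Each successful find() moves on to the next occurrence.
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String s = "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>\n"
                 + "<a id=\"lesson2\" href=\"lesson2.html\">Lesson 2</a>\n";
        System.out.println(extractLessonIds(s)); // prints [lesson1, lesson2]
    }
}
```

Note that [0-9] matches exactly one digit, so an id such as lesson10 would be captured as lesson1. If ids with several digits are possible, [0-9]+ is probably what you want.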

Other tips

This is an answer to the first half of your question.

From the Javadoc

"The find method scans the input sequence looking for the next subsequence that matches the pattern."

"The matches method attempts to match the entire input sequence against the pattern."

The difference is that the find method looks for a match of the regex anywhere in your string, while the matches method only returns true if the entire input matches. In particular, your regex starts with id=" while your string starts with <a, so matches() returns false.
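
To make the difference concrete, here is a small sketch that runs both methods on one line of the input. The class and method names are invented for this example, and the second pattern (with .* on both sides) is just one assumed way to make the regex span the whole line.

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // True if the regex matches some subsequence of the input.
    static boolean foundAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True only if the regex matches the input from start to end.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String line = "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>";
        String partial = "id=\"(lesson[0-9])";     // pattern from the question
        String whole = ".*id=\"(lesson[0-9]).*";   // padded to cover the whole line

        System.out.println(foundAnywhere(partial, line)); // prints true
        System.out.println(matchesWhole(partial, line));  // prints false
        System.out.println(matchesWhole(whole, line));    // prints true
    }
}
```

This only works as shown for a single line: by default . does not match line terminators, so the padded pattern would need Pattern.DOTALL to span the full multi-line string, and even then matches() would give you just one match, not all nine.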

Try the following code, which prints the link text (Lesson 1 to Lesson 9) rather than the id attribute:

    String data = "" + "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>\n"
            + "<a id=\"lesson2\" href=\"lesson2.html\">Lesson 2</a>\n"
            + "<a id=\"lesson3\" href=\"lesson3.html\">Lesson 3</a>\n"
            + "<a id=\"lesson4\" href=\"lesson4.html\">Lesson 4</a>\n"
            + "<a id=\"lesson5\" href=\"lesson5.html\">Lesson 5</a>\n"
            + "<a id=\"lesson6\" href=\"lesson6.html\">Lesson 6</a>\n"
            + "<a id=\"lesson7\" href=\"lesson7.html\">Lesson 7</a>\n"
            + "<a id=\"lesson8\" href=\"lesson8.html\">Lesson 8</a>\n"
            + "<a id=\"lesson9\" href=\"lesson9.html\">Lesson 9</a>\n" + "";

    Pattern pattern = Pattern.compile("\\>([Ll]esson\\s+\\d+)");
    Matcher matcher = pattern.matcher(data);

    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }

Hope this helps.

You just need to do:

if (m.find()) {
   System.out.println(m.group(1));
}
  • Use group(1) instead of group(0): group(0) returns the whole match, whereas group(1) returns the text captured by the first pair of parentheses.
  • Call either m.find() or m.matches(), not both. m.matches() needs to match the entire string (see Difference between matches() and find() in Java Regex). Your regex only matches a substring inside the string, so matches() fails while find() succeeds.
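
As a rough illustration of the first point, this sketch returns both groups for the first occurrence so you can compare them. The class name GroupDemo and the helper firstMatchGroups are hypothetical; the pattern is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: returns {group(0), group(1)} for the first occurrence,
    // or an empty array if the pattern is not found at all.
    static String[] firstMatchGroups(String input) {
        Matcher m = Pattern.compile("id=\"(lesson[0-9])").matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String line = "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>";
        String[] groups = firstMatchGroups(line);
        System.out.println(groups[0]); // prints id="lesson1
        System.out.println(groups[1]); // prints lesson1
    }
}
```

Because this uses if rather than while, it only ever reports the first lesson; to get all of them, call find() in a loop.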
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow