Question

I'm new to regex in Java, and I'm trying to extract all the "lesson#" values from this text:

<a id="lesson1" href="lesson1.html">Lesson 1</a>
<a id="lesson2" href="lesson2.html">Lesson 2</a>
<a id="lesson3" href="lesson3.html">Lesson 3</a>
<a id="lesson4" href="lesson4.html">Lesson 4</a>
<a id="lesson5" href="lesson5.html">Lesson 5</a>
<a id="lesson6" href="lesson6.html">Lesson 6</a>
<a id="lesson7" href="lesson7.html">Lesson 7</a>
<a id="lesson8" href="lesson8.html">Lesson 8</a>
<a id="lesson9" href="lesson9.html">Lesson 9</a>

I'm using this code to extract that part from my string:

String s = ""
        + "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>\n"
        + "<a id=\"lesson2\" href=\"lesson2.html\">Lesson 2</a>\n"
        + "<a id=\"lesson3\" href=\"lesson3.html\">Lesson 3</a>\n"
        + "<a id=\"lesson4\" href=\"lesson4.html\">Lesson 4</a>\n"
        + "<a id=\"lesson5\" href=\"lesson5.html\">Lesson 5</a>\n"
        + "<a id=\"lesson6\" href=\"lesson6.html\">Lesson 6</a>\n"
        + "<a id=\"lesson7\" href=\"lesson7.html\">Lesson 7</a>\n"
        + "<a id=\"lesson8\" href=\"lesson8.html\">Lesson 8</a>\n"
        + "<a id=\"lesson9\" href=\"lesson9.html\">Lesson 9</a>\n"
        + "";

Pattern pattern = Pattern.compile("id=\"(lesson[0-9])");
Matcher m = pattern.matcher(s);

System.out.println("Find: " + m.find());
System.out.println("Matches: " + m.matches());

if (m.matches()) {
   System.out.println("Group 0: " + m.group(0));
}

The output I get with this code is:

Find: true
Matches: false

In the Javadoc I read that if m.matches() returns false, I can't access the groups.

Why does m.matches() return false when m.find() returns true? Since I can't access the groups with this code, what am I missing?


Solution

The matches() method attempts to match the entire input sequence against the pattern, which fails for your input. Instead, call Matcher.find() in a loop until it returns false. After each successful call to find(), you can access the groups of the occurrence just found.

while (m.find()) {
   // group(1) of the current match: "lesson1" on the first pass, then "lesson2", ...
   String someGroup = m.group(1);
}
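For reference, here is a minimal self-contained sketch of that loop. The class name LessonIds, the helper extractIds, and the widened character class [0-9]+ (so that a hypothetical "lesson10" would be captured whole instead of as "lesson1") are assumptions for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LessonIds {
    // Hypothetical helper: collects group 1 of every match by calling find() until it returns false.
    static List<String> extractIds(String html) {
        // Assumption: [0-9]+ instead of [0-9] so multi-digit ids are captured whole.
        Pattern pattern = Pattern.compile("id=\"(lesson[0-9]+)\"");
        Matcher m = pattern.matcher(html);
        List<String> ids = new ArrayList<>();
        while (m.find()) {
            ids.add(m.group(1)); // only the text inside the parentheses
        }
        return ids;
    }

    public static void main(String[] args) {
        // Rebuild the nine <a> lines from the question.
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 9; i++) {
            sb.append("<a id=\"lesson").append(i)
              .append("\" href=\"lesson").append(i)
              .append(".html\">Lesson ").append(i).append("</a>\n");
        }
        System.out.println(extractIds(sb.toString()));
    }
}
```

Running main prints [lesson1, lesson2, lesson3, lesson4, lesson5, lesson6, lesson7, lesson8, lesson9]. Because the pattern requires the id=" prefix, the href values are not picked up a second time.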

Other suggestions

This is an answer to the first half of your question.

From the Javadoc:

"The find method scans the input sequence looking for the next subsequence that matches the pattern."

"The matches method attempts to match the entire input sequence against the pattern."

The difference is that find() looks for a match of the regex anywhere in your string, while matches() returns true only if the entire input matches. In particular, your regex starts with id=" while your string starts with <a (and continues well past the captured lesson number), so matches() cannot succeed.
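To see the difference concretely, the following sketch runs the question's pattern against a single line with find(), matches(), and lookingAt() (a third Matcher method that is anchored at the start but need not reach the end). The class and field names, and the WHOLE_LINE variant with .* on both sides, are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // The pattern from the question and one line of its input
    static final Pattern ID = Pattern.compile("id=\"(lesson[0-9])");
    static final String LINE = "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>";
    // Hypothetical variant whose .* parts consume the text before and after the id
    static final Pattern WHOLE_LINE = Pattern.compile(".*id=\"(lesson[0-9])\".*");

    public static void main(String[] args) {
        // find(): succeeds if any substring matches
        System.out.println("find:      " + ID.matcher(LINE).find());      // true
        // lookingAt(): must match starting at index 0, but may stop early
        System.out.println("lookingAt: " + ID.matcher(LINE).lookingAt()); // false, LINE starts with "<a"
        // matches(): must match the entire input
        System.out.println("matches:   " + ID.matcher(LINE).matches());   // false
        // matches() succeeds once the regex accounts for every character of the line
        System.out.println("matches with .* around: " + WHOLE_LINE.matcher(LINE).matches()); // true
    }
}
```

Note that the WHOLE_LINE trick only works on a single line: by default . does not match a newline, so matches() would still fail on the full multi-line string. For extracting several occurrences, find() in a loop remains the better fit.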

Try the following code:

    String data = "" + "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>\n"
            + "<a id=\"lesson2\" href=\"lesson2.html\">Lesson 2</a>\n"
            + "<a id=\"lesson3\" href=\"lesson3.html\">Lesson 3</a>\n"
            + "<a id=\"lesson4\" href=\"lesson4.html\">Lesson 4</a>\n"
            + "<a id=\"lesson5\" href=\"lesson5.html\">Lesson 5</a>\n"
            + "<a id=\"lesson6\" href=\"lesson6.html\">Lesson 6</a>\n"
            + "<a id=\"lesson7\" href=\"lesson7.html\">Lesson 7</a>\n"
            + "<a id=\"lesson8\" href=\"lesson8.html\">Lesson 8</a>\n"
            + "<a id=\"lesson9\" href=\"lesson9.html\">Lesson 9</a>\n" + "";

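    // ">" needs no escaping; this captures the link text ("Lesson 1" ... "Lesson 9"), not the id attribute
    Pattern pattern = Pattern.compile(">([Ll]esson\\s+\\d+)");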
    Matcher matcher = pattern.matcher(data);

    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }

Hope this helps.

You just need to do:

if (m.find()) {
   System.out.println(m.group(1));
}
  • Use group(1) instead of group(0): group(0) returns the whole match (id="lesson1), whereas group(1) returns only the part captured by the first pair of parentheses (lesson1).
  • Call either m.find() or m.matches(), not one after the other. In your code, matches() starts a new match attempt over the whole string, so the successful find() result is lost. The difference is that m.matches() must match the entire string (see Difference between matches() and find() in Java Regex). Your regex matches only a substring of the string, so matches() fails while find() succeeds.
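Both points can be checked with a short sketch (class and field names are illustrative). It prints both groups after a successful find(), then shows that once a subsequent matches() call fails there is no match left to read, so group(1) throws IllegalStateException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    static final Pattern ID = Pattern.compile("id=\"(lesson[0-9])");
    static final String LINE = "<a id=\"lesson1\" href=\"lesson1.html\">Lesson 1</a>";

    public static void main(String[] args) {
        Matcher m = ID.matcher(LINE);
        if (m.find()) {
            System.out.println(m.group(0)); // id="lesson1  (the whole match)
            System.out.println(m.group(1)); // lesson1      (first parenthesized group)
        }

        // matches() makes a fresh attempt against the whole input; it fails here,
        // so the groups from the earlier find() are no longer available.
        if (!m.matches()) {
            try {
                m.group(1);
            } catch (IllegalStateException e) {
                System.out.println("No match available after matches() returned false");
            }
        }
    }
}
```

This is why the question's code never reaches the println inside if (m.matches()): the only successful match came from find(), and it is discarded by the later matches() calls.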

Licensed under: CC-BY-SA with attribution