Question

I am trying to match a text against a glossary list. The problem is that my pattern behaves differently for one text. For example, here is my text:

\nfor Sprints \nSprints \nSprinting \nAccount Accounts Accounting\nSprintsSprints 

With the following pattern I am trying to find only exact word matches against the glossary and to avoid matching words that end in s, ing, and so on. It returns the right answer only for the word "Account"; if I try "Sprint", it also returns Sprints, Sprinting, etc., which is not right:

    Pattern findTerm = Pattern.compile("(" + item.getTerm() + ")(\\W)", Pattern.DOTALL);

And here is my code:

    private static String findGlossaryTerms(String response, List<Glossary> glossary) {

        StringBuilder builder = new StringBuilder();
        for (int offset = 0; offset < response.length(); offset++) {
            boolean match = false;
            if (response.startsWith("<", offset)) {
                String newString = response.substring(offset);
                Pattern findHtmlTag = Pattern.compile("\\<.*?\\>");
                Matcher matcher = findHtmlTag.matcher(newString);
                if (matcher.find()) {
                    String htmlTag = matcher.group(0);
                    builder.append(htmlTag);
                    offset += htmlTag.length() - 1;
                    match = true;
                }
            }

            for (Glossary item : glossary) {
                if (response.startsWith(item.getTerm(), offset)) {
                    String textFromOffset = response.substring(offset - 1);
                    Pattern findTerm = Pattern.compile("(" + item.getTerm() + ")(\\W)",Pattern.DOTALL);
                    Matcher matcher = findTerm.matcher(textFromOffset);
                    if (matcher.find()) {
                        builder.append("<span class=\"term\">").append(item.getTerm()).append("</span>");
                        offset += item.getTerm().length() - 1;
                        match = true;
                        break;
                    }
                }
            }
            if (!match)
                builder.append(response.charAt(offset));

        }
        return builder.toString();
    }

Solution

What is the \\W in your pattern good for? If it is just there to ensure that the word ends, then use word boundaries instead:

    Pattern findTerm = Pattern.compile("(\\b" + item.getTerm() + "\\b)", Pattern.DOTALL);

Those word boundaries ensure that you really match the complete word and do not get partial matches such as "Sprints" or "Sprinting" for the term "Sprint".
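To see the effect of the word boundaries in isolation, here is a minimal sketch. The class `GlossaryBoundaryDemo` and the helper `highlight` are hypothetical names, not part of the original code, and the sketch assumes the text is plain (it does not skip HTML tags the way the question's loop does). It also wraps the term in `Pattern.quote`, which is an addition of mine so that glossary entries containing regex metacharacters are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlossaryBoundaryDemo {

    // Hypothetical helper: wraps every whole-word occurrence of `term`
    // in a span and leaves longer words (Sprints, Sprinting) untouched.
    static String highlight(String text, String term) {
        // \b matches only between a word character and a non-word character
        // (or the start/end of the input), so "Sprint" inside "Sprints" fails.
        // Pattern.quote makes the term a literal (assumption: terms are plain text).
        Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(term) + "\\b");
        Matcher matcher = wholeWord.matcher(text);
        // quoteReplacement protects '$' and '\' in the replacement string.
        String replacement = Matcher.quoteReplacement(
                "<span class=\"term\">" + term + "</span>");
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "for Sprints \nSprint \nSprinting \nAccount Accounts";
        System.out.println(highlight(text, "Sprint"));
        System.out.println(highlight(text, "Account"));
    }
}
```

Note that `\b` only behaves this way when the term itself starts and ends with a word character (letter, digit, or underscore); a term such as "C++" would need a different delimiter check.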

Licensed under: CC-BY-SA with attribution