Question

I'm trying to come up with an algorithm in Java that can detect whether given code contains Java keywords and capture them for proper formatting.

The catch is that I only want to detect keywords if they are not within a String literal.

For example in the statement

return "I love abstract" + this.artform

I want to capture return and this, but NOT abstract.

THUS FAR:

So far, I've created a successful regular expression that can detect all keywords.

regexp = "(?<=\\W?)(" + keywords.toString() + ")(?=(\\s|\\(|\\.|\\{))"

However, it gets complicated now that I need to integrate it with the ability to know when matches are within literals.


Solution

It will be difficult to integrate the second part into the regular expression, as you mentioned. Keep using the regular expression for the first part, like you did. With java.util.regex.Pattern and java.util.regex.Matcher, you can call find() to check whether there is a match (for each of the keywords). If it returns true, call start() to get the position of the keyword matched by that last call to find(). (Use these two methods in tandem.)

The part that looks tricky is actually easy with the String class: find every instance of the '"' double quote character and record its position. Then check whether the keyword's start position is greater than the position of an opening double quote and less than the position of its closing double quote. For that to work, you first need to make sure each double quote is matched with its proper sibling in the pair. You may also want to check that the entire keyword falls between the two positions. Finally, you'll need to be smart about double quotes that fall on separate lines or continuation lines, if that scenario applies.

Basically, my suggestion is not to attempt the second part with regular expressions unless you really want to go crazy trying to implement it.
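
As a rough illustration of the approach described in this answer, here is a minimal sketch. It is not a definitive implementation: the class name QuoteAwareKeywordFinder, the method findKeywords, and the shortened keyword list are made up for the example, and it assumes the code contains no escaped quotes (\"), char literals, or comments. Quotes are simply paired in order of appearance, so an unpaired trailing quote is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareKeywordFinder {
    // Hypothetical, abbreviated keyword list, for illustration only.
    private static final Pattern KEYWORDS =
            Pattern.compile("\\b(abstract|return|this|class|new)\\b");

    /** Returns the keywords in source that do not lie between a pair of double quotes. */
    static List<String> findKeywords(String source) {
        // Record the position of every double quote character.
        List<Integer> quotes = new ArrayList<>();
        for (int i = source.indexOf('"'); i >= 0; i = source.indexOf('"', i + 1)) {
            quotes.add(i);
        }
        List<String> found = new ArrayList<>();
        Matcher m = KEYWORDS.matcher(source);
        // find() advances to the next match; start()/end() describe that match.
        while (m.find()) {
            if (!insideLiteral(quotes, m.start(), m.end())) {
                found.add(m.group());
            }
        }
        return found;
    }

    // Quotes are paired in order: (0,1), (2,3), ... (assumes no escaped quotes).
    private static boolean insideLiteral(List<Integer> quotes, int start, int end) {
        for (int i = 0; i + 1 < quotes.size(); i += 2) {
            // The whole keyword must sit between the opening and closing quote.
            if (start > quotes.get(i) && end <= quotes.get(i + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Prints [return, this]
        System.out.println(findKeywords("return \"I love abstract\" + this.artform"));
    }
}
```

Checking both start() and end() against the quote pair covers the "entire keyword falls between both positions" point; multi-line input would need the quote positions collected over the whole source rather than line by line.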

Other suggestions

I suspect that you'll want a full-blown Java grammar and parser (e.g. search for JavaCC and associated Java grammars), but at the bare minimum you'll want to use a tokenizer and then define all the various valid token types for Java. Again, you can just use the Java grammars for JavaCC, which already have all the tokens defined for you. See the following links:

https://javacc.java.net/
https://java.net/projects/javacc/downloads?page=2&path%5B%5D=contrib&path%5B%5D=grammars&theme=java.net
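
To give an idea of what the "bare minimum" tokenizer could look like, here is a hand-rolled sketch. It is not JavaCC and not a full Java lexer. The class name MiniKeywordScanner, the method scan, and the shortened keyword set are invented for the example. It assumes Java 9+ (for Set.of) and does not handle text blocks (""") or unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MiniKeywordScanner {
    // Hypothetical, abbreviated keyword set; a real one would list every Java keyword.
    private static final Set<String> KEYWORDS =
            Set.of("abstract", "return", "this", "class", "new", "if", "else");

    /** Returns keywords found outside string literals, char literals, and comments. */
    static List<String> scan(String src) {
        List<String> found = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // String or char literal: skip to the matching unescaped quote.
                i++;
                while (i < n && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') {
                        i++; // skip the escaped character as well
                    }
                    i++;
                }
                i++; // consume the closing quote
            } else if (src.startsWith("//", i)) {
                // Line comment: skip to the end of the line.
                int nl = src.indexOf('\n', i);
                i = nl < 0 ? n : nl + 1;
            } else if (src.startsWith("/*", i)) {
                // Block comment: skip past the closing */.
                int end = src.indexOf("*/", i + 2);
                i = end < 0 ? n : end + 2;
            } else if (Character.isJavaIdentifierStart(c)) {
                // Identifier-like token: read it whole, then test it.
                int start = i;
                while (i < n && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                String word = src.substring(start, i);
                if (KEYWORDS.contains(word)) {
                    found.add(word);
                }
            } else {
                i++; // operators, digits, whitespace, punctuation
            }
        }
        return found;
    }
}
```

Because whole identifiers are read before being compared, a name such as thisValue is not reported as the keyword this, and keywords inside comments are skipped along with those inside literals.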

I solved my problem by just checking if there was an odd number of quotes in the String so far, before my keyword appears. If it's odd, then a String is open, and my alleged keyword is inside a String. If it's even, all String literals have been opened and closed already.

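// Returns true if prev contains an odd number of double quotes,
// i.e. a String literal is still open at the end of prev.
// Note: escaped quotes (\") and the char literal '"' are counted too.
private boolean oddNumberOfQuotes(String prev) {
    int quoteCount = 0;
    for (char ch : prev.toCharArray()) {
        if (ch == '"') {
            quoteCount++;
        }
    }
    return quoteCount % 2 != 0;
}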
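
For completeness, here is a sketch of how the parity check might be wired up to the regex matches. It is a variant, not the exact code from this answer: the class name QuoteParityFilter, the method keywordsOutsideStrings, and the three-keyword pattern are made up for the example, and this version of oddNumberOfQuotes additionally skips backslash-escaped characters so that \" is not counted. It still assumes no char literals such as '"' and no comments containing quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteParityFilter {
    // Hypothetical, abbreviated keyword pattern, for illustration only.
    private static final Pattern KEYWORDS =
            Pattern.compile("\\b(abstract|return|this)\\b");

    // Counts unescaped double quotes; an odd count means a literal is still open.
    static boolean oddNumberOfQuotes(String prev) {
        int quoteCount = 0;
        for (int i = 0; i < prev.length(); i++) {
            char ch = prev.charAt(i);
            if (ch == '\\') {
                i++; // skip the escaped character, e.g. the quote in \"
            } else if (ch == '"') {
                quoteCount++;
            }
        }
        return quoteCount % 2 != 0;
    }

    /** Keeps only the keyword matches preceded by an even number of quotes. */
    static List<String> keywordsOutsideStrings(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = KEYWORDS.matcher(line);
        while (m.find()) {
            // Everything before the match decides whether a literal is open.
            if (!oddNumberOfQuotes(line.substring(0, m.start()))) {
                found.add(m.group());
            }
        }
        return found;
    }
}
```

Rescanning the prefix for every match is quadratic in the worst case. For long inputs, a running quote count updated between matches would avoid that.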
Licensed under: CC-BY-SA with attribution