Question

I'm working on writing a simple Prolog interpreter in Java.

How can I find the index of the last character of the first element of a string in "List Syntax", so that I can separate the head element from the tail?

List Syntax looks like:

(X)
(p a b)
(func (func2 a) (func3 X Y))
(equal eve (mother cain))

The heads of those strings, in order, are:
Head: "X", Index: 1
Head: "p", Index: 1
Head: "func", Index: 4
Head: "equal", Index: 5

Basically, I need to match the string that immediately follows the first "(" and ends either with a space or a closing ")", whichever comes first. I need the character index of the last character of the head element.

How can I match and get this index in Java?


Brabster's solution is really close. However, consider the case of:
((b X) Y)

Where the head element is (b X). I attempted to fix it by removing "(" from the scanner delimiters, but it still hiccups because of the space between "b" and "X".

Similarly: ((((b W) X) Y) Z)

Where the head is (((b W) X) Y).


Solution

Java's Scanner class (introduced in Java 1.5) might be a good place to start.

Here's an example that I think does what you want (updated to include character counting):

import java.util.Scanner;

public class Test {

    public static void main(String[] args) {

        String[] data = new String[] {
                "(X)",
                "(p a b)",
                "(func (func2 a) (func3 X Y))",
                "(equal eve (mother cain))",
                "((b X) Y)",
                "((((b W) X) Y) Z)"
        };


        for (String line:data) {
            int headIdx = 0;
            if (line.charAt(1) == '(') {
                headIdx = countBrackets(line);
            } else {
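                // The head is a plain atom: read up to the first space or bracket.
                // Inside a character class '|' is a literal, so "[() ]" is enough.
                Scanner s = new Scanner(line);
                s.useDelimiter("[() ]");
                String head = s.next();
                headIdx = line.indexOf(head) + head.length() - 1;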
            }
            System.out.println(headIdx);
        }

    }

    // Returns the index of the ')' that closes the bracket opened at index 1.
    private static int countBrackets(String line) {
        int bracketCount = 0;
        for (int i = 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '(') {
                bracketCount++;
            } else if (c == ')') {
                bracketCount--;
            }
            if (bracketCount == 0) {
                return i;
            }
        }
        throw new IllegalStateException("Brackets not nested properly");
    }
}

Output:

1
1
4
5
5
13

It's not a very elegant solution, but regular expressions can't count, so they can't match arbitrarily nested brackets. I'd be thinking about using a parser generator if the syntax gets any more complex :)
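To illustrate that limit, here is a minimal sketch of a regex-only version. It assumes the head is a plain atom (no brackets or spaces); the class and method names are made up for this example. It finds the head index for the simple inputs but can only give up when the head is itself a bracketed list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AtomHead {
    // The leading "(" followed by one or more characters that are
    // neither a bracket nor a space (captured as group 1).
    private static final Pattern ATOM_HEAD = Pattern.compile("^\\(([^() ]+)");

    // Returns the index of the last character of an atom head,
    // or -1 when the head is not a plain atom (e.g. a nested list).
    static int atomHeadEnd(String line) {
        Matcher m = ATOM_HEAD.matcher(line);
        return m.find() ? m.end(1) - 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println(atomHeadEnd("(equal eve (mother cain))")); // 5
        System.out.println(atomHeadEnd("((b X) Y)"));                 // -1
    }
}
```

The -1 case is exactly where the bracket-counting loop has to take over.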

OTHER TIPS

Is there a reason you can't just brute force it? Something like this?

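public int firstIndex(String exp) {
    int parenCount = 0;
    // Stop one short of the end so that charAt(i + 1) is always in bounds.
    for (int i = 1; i < exp.length() - 1; i++) {
        char c = exp.charAt(i);
        if (c == '(') {
            parenCount++;
        } else if (c == ')') {
            parenCount--;
        }
        // The head ends at the first depth-0 character that is followed
        // by a space or by the closing bracket of the whole list.
        char next = exp.charAt(i + 1);
        if (parenCount == 0 && (next == ' ' || next == ')')) {
            return i;
        }
    }
    throw new IllegalArgumentException("No head element found in: " + exp);
}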

I may be missing something here, but I think that would work.
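Once you have that index, splitting the string into head and tail is just two substrings. A rough sketch, assuming well-formed input and single spaces between elements; `HeadTail`, `headEnd`, `head` and `tail` are names invented for this example, and returning the tail re-wrapped in brackets is my own choice, not something the question requires.

```java
class HeadTail {
    // Index of the last character of the head: the first character at
    // nesting depth 0 that is followed by a space or a closing bracket.
    static int headEnd(String exp) {
        int depth = 0;
        for (int i = 1; i < exp.length() - 1; i++) {
            char c = exp.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
            char next = exp.charAt(i + 1);
            if (depth == 0 && (next == ' ' || next == ')')) {
                return i;
            }
        }
        throw new IllegalArgumentException("No head element in: " + exp);
    }

    // Everything from just after the opening bracket up to the head's last character.
    static String head(String exp) {
        return exp.substring(1, headEnd(exp) + 1);
    }

    // The remaining elements, wrapped in brackets again so the tail is
    // itself a list; an exhausted list gives "()".
    static String tail(String exp) {
        String rest = exp.substring(headEnd(exp) + 1, exp.length() - 1).trim();
        return "(" + rest + ")";
    }

    public static void main(String[] args) {
        System.out.println(head("((b X) Y)")); // (b X)
        System.out.println(tail("((b X) Y)")); // (Y)
    }
}
```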

I suggest you write a proper parser (operator precedence in the case of Prolog) and represent the terms as trees of Java objects for further processing.
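As a starting point, here is a small sketch of what that could look like for just the list syntax in the question. It is a plain recursive-descent parser, not an operator-precedence one, so it does not handle real Prolog operators. The `ListParser` and `Term` names are invented for this example, and it assumes elements are separated by spaces only.

```java
import java.util.ArrayList;
import java.util.List;

class ListParser {
    // A term is either an atom or variable (name set) or a list of sub-terms (children set).
    static final class Term {
        final String name;
        final List<Term> children;
        Term(String name) { this.name = name; this.children = null; }
        Term(List<Term> children) { this.name = null; this.children = children; }
        @Override public String toString() {
            if (name != null) return name;
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final String src;
    private int pos = 0;

    private ListParser(String src) { this.src = src; }

    // Parses one complete term and rejects anything left over after it.
    static Term parse(String src) {
        ListParser p = new ListParser(src);
        Term t = p.term();
        p.skipSpaces();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("Trailing input at " + p.pos);
        }
        return t;
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    private Term term() {
        skipSpaces();
        if (pos >= src.length()) throw new IllegalArgumentException("Unexpected end of input");
        char c = src.charAt(pos);
        if (c == ')') throw new IllegalArgumentException("Unexpected ')' at " + pos);
        if (c == '(') {
            pos++; // consume '('
            List<Term> items = new ArrayList<>();
            skipSpaces();
            while (pos < src.length() && src.charAt(pos) != ')') {
                items.add(term()); // recursion handles any nesting depth
                skipSpaces();
            }
            if (pos >= src.length()) throw new IllegalArgumentException("Missing ')'");
            pos++; // consume ')'
            return new Term(items);
        }
        // Atom or variable: read up to the next space or bracket.
        int start = pos;
        while (pos < src.length() && " ()".indexOf(src.charAt(pos)) < 0) pos++;
        return new Term(src.substring(start, pos));
    }
}
```

With a tree like this, the head is simply `children.get(0)` and the tail is the rest of the list, so no character indices are needed.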

Licensed under: CC-BY-SA with attribution