How can I find the index of the first “element” in my string using Java?
Question
I'm working on writing a simple Prolog interpreter in Java.
How can I find the index of the last character of the first element (the head, as opposed to the tail) of a string written in "List Syntax"?
List Syntax looks like:
(X)
(p a b)
(func (func2 a) (func3 X Y))
(equal eve (mother cain))
The heads of those strings, in order, are:
Head: "X", Index: 1
Head: "p", Index: 1
Head: "func", Index: 4
Head: "equal", Index: 5
Basically, I need to match the string that immediately follows the first "(" and ends either with a space or a closing ")", whichever comes first. I need the character index of the last character of the head element.
How can I match and get this index in Java?
Brabster's solution is really close. However, consider the case of:
((b X) Y)
Here the head element is (b X). I tried to fix this by removing "(" from the Scanner delimiters, but it still fails because of the space between "b" and "X".
Similarly: ((((b W) X) Y) Z)
Here the head is (((b W) X) Y).
Solution
Java's Scanner class (introduced in Java 1.5) might be a good place to start.
Here's an example that I think does what you want (updated to count brackets when the head is itself a list):
    import java.util.Scanner;

    public class Test {
        public static void main(String[] args) {
            String[] data = new String[] {
                "(X)",
                "(p a b)",
                "(func (func2 a) (func3 X Y))",
                "(equal eve (mother cain))",
                "((b X) Y)",
                "((((b W) X) Y) Z)"
            };
            for (String line : data) {
                int headIdx = 0;
                if (line.charAt(1) == '(') {
                    // The head is itself a list: find the bracket that closes it.
                    headIdx = countBrackets(line);
                } else {
                    // The head is an atom: split on '(', ')' or a space and take the first token.
                    try (Scanner s = new Scanner(line)) {
                        s.useDelimiter("[() ]");
                        String head = s.next();
                        headIdx = line.indexOf(head) + head.length() - 1;
                    }
                }
                System.out.println(headIdx);
            }
        }

        // Returns the index of the ')' that closes the bracket opened at index 1.
        private static int countBrackets(String line) {
            int bracketCount = 0;
            for (int i = 1; i < line.length(); i++) {
                char c = line.charAt(i);
                if (c == '(') {
                    bracketCount++;
                } else if (c == ')') {
                    bracketCount--;
                }
                if (bracketCount == 0) {
                    return i; // depth is back to zero: this is the matching ')'
                }
            }
            throw new IllegalStateException("Brackets not nested properly");
        }
    }
Output:
1
1
4
5
5
13
It's not a very elegant solution, but regexes can't count (i.e. they can't match arbitrarily nested brackets), so the nested case needs an explicit bracket counter. I'd be thinking about using a parser generator if there's any more complexity in there :)
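To illustrate where a regex stops being enough: the atom case from the question ("the string that immediately follows the first "(" and ends with a space or ")"") is easy to match with java.util.regex, but a head that is itself a list is not. The class name HeadRegex and method headEnd below are hypothetical; this is a minimal sketch that assumes the head contains no brackets and returns -1 otherwise.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find the index of the last character of an atomic head with a regex.
// Assumption: the head contains no brackets. For heads that are lists, such as
// "((b X) Y)", headEnd returns -1, because matching those needs bracket counting.
class HeadRegex {
    // An opening '(' followed by a run of characters that are not a space or bracket.
    private static final Pattern ATOM_HEAD = Pattern.compile("\\(([^ ()]+)");

    static int headEnd(String list) {
        Matcher m = ATOM_HEAD.matcher(list);
        // lookingAt() anchors the match at the start of the input.
        if (!m.lookingAt()) {
            return -1;
        }
        // end(1) is exclusive, so subtract one to get the head's last character.
        return m.end(1) - 1;
    }

    public static void main(String[] args) {
        String[] data = {"(X)", "(p a b)", "(func (func2 a) (func3 X Y))",
                         "(equal eve (mother cain))", "((b X) Y)"};
        for (String line : data) {
            System.out.println(line + " -> " + headEnd(line));
        }
    }
}
```

For these inputs it prints 1, 1, 4 and 5 for the atom heads and -1 for ((b X) Y), which is exactly the case that needs the bracket counting in countBrackets.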
Other Tips
Is there a reason you can't just brute force it? Something like this?
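    public int firstIndex(String exp) {
        int parenCount = 0;
        // Stop one short of the end so that exp.charAt(i + 1) is always valid.
        for (int i = 1; i < exp.length() - 1; i++) {
            char c = exp.charAt(i);
            if (c == '(') {
                parenCount++;
            } else if (c == ')') {
                parenCount--;
            }
            // Back at the top level of the outer list, and the next character ends the element.
            char next = exp.charAt(i + 1);
            if (parenCount == 0 && (next == ' ' || next == ')')) {
                return i;
            }
        }
        throw new IllegalArgumentException("No head element found in: " + exp);
    }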
I may be missing something here, but I think that would work.
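The same depth-counting idea can be generalized so that it finds the end of any element, not just the head, which also lets you walk the tail. The sketch below is hypothetical (ListElements, elementEnd and topLevelElements are illustrative names) and assumes well-formed input with space-separated elements and no quoted atoms or comments.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical helper names): depth counting generalised to find the end
// of any element, so the tail can be walked as well as the head.
// Assumes well-formed input with space-separated elements and no quoted atoms.
class ListElements {

    // Returns the index of the last character of the element starting at 'start'.
    static int elementEnd(String s, int start) {
        char first = s.charAt(start);
        if (first == ')') {
            throw new IllegalArgumentException("Unexpected ')' at index " + start);
        }
        if (first == '(') {
            // A list element ends at the bracket that brings the depth back to zero.
            int depth = 0;
            for (int i = start; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                    if (depth == 0) {
                        return i;
                    }
                }
            }
            throw new IllegalArgumentException("Unbalanced brackets in: " + s);
        }
        // An atom ends just before the next space or bracket (or at the end of input).
        int i = start;
        while (i < s.length() && " ()".indexOf(s.charAt(i)) < 0) {
            i++;
        }
        return i - 1;
    }

    // Splits "(a (b c) d)" into ["a", "(b c)", "d"]; element 0 is the head.
    static List<String> topLevelElements(String list) {
        List<String> out = new ArrayList<>();
        int i = 1;                         // skip the outer '('
        while (i < list.length() - 1) {    // stop before the outer ')'
            if (list.charAt(i) == ' ') {
                i++;
                continue;
            }
            int end = elementEnd(list, i);
            out.add(list.substring(i, end + 1));
            i = end + 1;
        }
        return out;
    }
}
```

Calling elementEnd(line, 1) gives the head's last index (1, 1, 4, 5, 5 and 13 for the six strings in the question), and topLevelElements("((b X) Y)") returns the head "(b X)" followed by the tail element "Y".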
I suggest you write a proper parser (an operator-precedence parser, in the case of Prolog) and represent the terms as trees of Java objects for further processing.
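As a rough illustration of the "trees of Java objects" idea, here is a small recursive-descent reader for the parenthesized list syntax from the question. It is deliberately simpler than what this tip recommends: it is not an operator-precedence parser and does not handle Prolog operators, quoted atoms or comments. The class names TermReader, Term, Atom and ListTerm are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of parsing the list syntax into a tree of Java objects (hypothetical names).
// NOT a Prolog operator-precedence parser: it only reads atoms/variables and
// parenthesised lists of space-separated elements.
class TermReader {

    interface Term {}

    static final class Atom implements Term {
        final String name;
        Atom(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class ListTerm implements Term {
        final List<Term> elements;
        ListTerm(List<Term> elements) { this.elements = elements; }
        // The head is simply the first element (throws if the list is empty).
        Term head() { return elements.get(0); }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < elements.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(elements.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final String src;
    private int pos = 0;

    private TermReader(String src) { this.src = src; }

    // Parses exactly one term; anything left over (other than spaces) is an error.
    static Term parse(String src) {
        TermReader r = new TermReader(src);
        Term t = r.readTerm();
        r.skipSpaces();
        if (r.pos != src.length()) {
            throw new IllegalArgumentException("Trailing input at index " + r.pos);
        }
        return t;
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    private Term readTerm() {
        skipSpaces();
        if (pos >= src.length()) throw new IllegalArgumentException("Unexpected end of input");
        char c = src.charAt(pos);
        if (c == '(') {
            pos++; // consume '('
            List<Term> elems = new ArrayList<>();
            skipSpaces();
            while (pos < src.length() && src.charAt(pos) != ')') {
                elems.add(readTerm()); // recurse: an element may itself be a list
                skipSpaces();
            }
            if (pos >= src.length()) throw new IllegalArgumentException("Missing ')'");
            pos++; // consume ')'
            return new ListTerm(elems);
        }
        if (c == ')') throw new IllegalArgumentException("Unexpected ')' at index " + pos);
        // Atom or variable: everything up to the next space or bracket.
        int start = pos;
        while (pos < src.length() && " ()".indexOf(src.charAt(pos)) < 0) pos++;
        return new Atom(src.substring(start, pos));
    }
}
```

With a tree like this, ((TermReader.ListTerm) TermReader.parse("((b X) Y)")).head() is the nested list (b X) itself, so later stages can work on structure rather than on character indices.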