Question

The input can be either 1. or 2. or a combination of both.

  1. Sequential
    ...
    startLoop
      setSomething
    endLoop

    startLoop
      setSomething
    endLoop
    ...

The regex I use for this is (startLoop.+?endLoop)+? to get each loop block as my matcher group. This works fine for the sequential case where I access setSomething each time and alter it.
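For reference, the sequential case can be sketched as follows. This is only an illustration: the class and method names (`SequentialLoops`, `loopBlocks`) are made up here, the outer group and `+?` are dropped because `m.group()` already returns the whole match, and `Pattern.DOTALL` is assumed so that `.` can cross line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequentialLoops {
    // Lazy match from a startLoop to the nearest following endLoop.
    // DOTALL lets '.' cross line breaks, which multi-line input needs.
    private static final Pattern LOOP_BLOCK =
            Pattern.compile("startLoop.+?endLoop", Pattern.DOTALL);

    // Returns every startLoop...endLoop block found, in source order.
    static List<String> loopBlocks(String input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = LOOP_BLOCK.matcher(input);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "startLoop\n  setSomething\nendLoop\n\n"
                     + "startLoop\n  setSomething\nendLoop\n";
        for (String block : loopBlocks(input)) {
            // Alter setSomething inside each block (replacement name is illustrative).
            System.out.println(block.replace("setSomething", "setThisthing"));
            System.out.println("--");
        }
    }
}
```

On nested input the same pattern stops at the first endLoop it meets, so any statements that come after that endLoop are not part of a match.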

  2. Nested
    ...
    startLoop
      setSomething1.1
      startLoop
        setSomething2.1
        startLoop
          setSomething3
        endLoop
        setSomething2.2
      endLoop
      setSomething1.2
    endLoop
    ...

I wrote something like (startLoop.+?startLoop)+?, but that only lets me access setSomething1.1.

I'm not able to come up with a regex that lets me access setSomething no matter what type of loop structure the input has.

Appreciate your help.


Solution

I don't think a regular expression alone can capture what you're describing. Regular expressions describe regular languages, whereas arbitrarily nested startLoop/endLoop pairs form a context-free language, with the same shape as balanced parentheses. In the Chomsky hierarchy, regular languages are a strict subset of context-free languages, so regular expressions cannot describe every context-free language, and balanced nesting of unbounded depth is one of the cases they cannot handle.

CFGs vs Regular Expressions

Context-free grammars are strictly more powerful than regular expressions.

  • Any language that can be generated using regular expressions can be generated by a context-free grammar.
  • There are languages that can be generated by a context-free grammar that cannot be generated by any regular expression.

  • Reference: http://www.cs.rochester.edu/~nelson/courses/csc_173/grammars/cfg.html
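If the goal is only to alter every setSomething that sits inside some loop, one possible workaround is to keep the regex as a plain tokenizer and do the nesting bookkeeping in code with a depth counter. The sketch below is not taken from the question: the names `LoopRewriter` and `rewrite` are hypothetical, and it assumes that setSomething outside any loop should stay unchanged and that the three keywords never occur inside other identifiers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoopRewriter {
    // One alternation picks up every token of interest in source order.
    private static final Pattern TOKEN =
            Pattern.compile("startLoop|endLoop|setSomething");

    // Replaces each setSomething that occurs at loop depth >= 1.
    static String rewrite(String input, String replacement) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(input);
        int depth = 0; // number of loops currently open
        int last = 0;  // end of the region already copied to out
        while (m.find()) {
            out.append(input, last, m.start()); // copy text between tokens
            String token = m.group();
            if (token.equals("startLoop")) {
                depth++;
                out.append(token);
            } else if (token.equals("endLoop")) {
                if (depth > 0) depth--; // tolerate a stray endLoop
                out.append(token);
            } else {
                // setSomething: rewrite only while inside at least one loop
                out.append(depth > 0 ? replacement : token);
            }
            last = m.end();
        }
        out.append(input, last, input.length()); // copy the tail
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "startLoop\n  setSomething1.1\n  startLoop\n    setSomething2.1\n"
                     + "  endLoop\n  setSomething1.2\nendLoop\n";
        System.out.print(rewrite(input, "setThisthing"));
    }
}
```

Because the counter tracks nesting, the same pass covers sequential loops, nested loops, and any mix of the two; the regex itself never has to match a balanced block.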

    Other tips

    I tried this and it worked. It's a ridiculous way of doing it, but it works for now.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private static String normalize(String input) {
        //Final string is held here
        StringBuilder markerString = new StringBuilder(input);
        //Look for the occurrences of startLoop-endLoop structures across lines
        Pattern p1 = Pattern.compile("(startLoop.+?endLoop)+?",Pattern.DOTALL);
        Matcher m1 = p1.matcher(markerString.toString());
        while(m1.find()){
            /* startLoop-endLoop structure found
             * Make sure length of StringBuilder remains same
             */
            markerString.setLength(input.length());
            //group will now contain the matched subsequence of the full string
            StringBuilder group = new StringBuilder(m1.group());
            /* Look for occurrences of startLoop within the matched group
             * and maintain a counter for the no of occurrences 
             */
            Pattern p2 = Pattern.compile("(startLoop)+?",Pattern.DOTALL);
            Matcher m2 = p2.matcher(group.toString());
            int loopCounter = 0;
            while(m2.find()){
                loopCounter++;
            }
            /* this takes care of the sequential loops scenario as well as matched group
             * in nested loop scenario
             */
            markerString.replace(m1.start(), m1.end(), m1.group().
                             replaceAll("setSomething", "setThisthing"));
            /* For the no of times that startLoop occurred in the matched group,
             * do the following
             * 1. Find the next index of endLoop after the matched group's end in the full string
             * 2. Read the subsequence between matched group's end and endIndex
             * 3. Replace all setSomething with setThisthing in the subsequence
             * 4. Replace subsequence in markerString
             * 5. Decrement loopCounter
             */
            int previousEndIndex = m1.end();
            int currentEndIndex = -1;
            while(loopCounter>1){
                currentEndIndex = markerString.indexOf("endLoop",previousEndIndex);
                if (currentEndIndex < 0) break; //unbalanced input: no further endLoop
                String replacerString  = markerString.substring(previousEndIndex,currentEndIndex);
                replacerString =  replacerString.replaceAll("setSomething", "setThisthing");
                markerString.replace(previousEndIndex, currentEndIndex, replacerString);
                previousEndIndex = currentEndIndex+7;
                loopCounter--;
            }
        }
        return markerString.toString();
    }
    
    Licensed under: CC-BY-SA with attribution
    Not affiliated with StackOverflow