Struggling to understand the thought process required to come up with some recurrences for Dynamic Programming problems

https://cs.stackexchange.com/questions/119026

28-09-2020
|

Question

I was doing a few dynamic programming problems and I am struggling to understand the thought process required to come up with recurrences.

The first problem I solved was longest palindromic substring and I managed to solve it successfully using the following logic: I have a function f(i,j) which returns the longest palindromic substring between characters at positions i,j in the string. I came up with the following recurrence:

f(i,j) = true // if(s[i] = s[j])
f(i,j) = true // if(i>j) 
f(i,j) = true // if(s[i] = s[j] and f(i+1, j-1) == true)
f(i,j) = false // otherwise

Using this recurrence I managed to solve the problem successfully.

The second problem I wanted to solve was Longest valid parentheses (which I want to do using DP not stacks). The question is: Given a string containing just the characters '(' and ')', find the length of the longest valid parentheses substring.

My initial thought process was that this was similar to the longest palindromic substring problem that I already solved. So I defined f(i,j) as a function that returns longest valid parenthesis between elements i,j. The recurrence I came up with is as follows - here I use -1 to indicate that f(i,j) does not form a valid parenthesis:

f(i,j) = -1 // if i == j
f(i,j) = -1 // if j=i+1 and ! (s[i] = '(' and s[j] = ')')
f(i,j) = 2 // if j = i+1 and s[i] = '(' and s[j] = ')'
f(i,j) = 2 + f(i+1, j-1) if s[i] = '(' and s[j] = ')' and f(i+1, j-1) > 0;
f(i,j) = -1 // otherwise

My code (written in java) for the top-down version of the above is (note: dp[][] was initialized to -2 indicating state i,j is not processed yet):

private int topDown(String s, int i, int j, int[][] dp) {
    if(dp[i][j] != -2)
        return dp[i][j];

    if(i == j)
        dp[i][j] = -1;
    else if(j == i+1 && (s.charAt(i) != '(' || s.charAt(j) != ')' ) )
        dp[i][j] = -1;
    else if(j == i+1 && (s.charAt(i) == '(' && s.charAt(j) == ')' ) )
        dp[i][j] = 2;
    else if(s.charAt(i) == '(' && s.charAt(j) == ')' && topDown(s, i+1, j-1, dp) > -1)
        dp[i][j] = 2 + topDown(s, i+1, j-1, dp);
    else
        dp[i][j] = -1;

    return dp[i][j];
}

This does not work for inputs like ()() as it will return 2 instead of 4.

I was unable to come up with a solution of my own so I looked it up online. I found a solution here.

In this case the author defines f(i) as the length of the longest longest valid, balanced parentheses substring which begins at index i. He then describes the recurrence as follows:

f(i) = 2 + f(i+2)   // if s[i] = "(" and s[i+1] = ")"
f(i) = f(i+1) + f(i + f(i+1) + 2)   // s[i]="(" AND s[i+1]="(" AND s[i+f(i+1)+1]=")"
f(i) = 0    // otherwise

I have not written/tested the code for the recurrence but I assume it is correct.

My questions are as follows:

When I was solving the problem initially I had not thought of writing it as a function of f(i). I only thought of it as a function of f(i,j) and that turned out to be wrong. So is there an easy way to tell just by looking at a problem what states should be computed? For example why can't I solve the longest palindromic substring problem as a function of f(i)? similarly why can't I solve the longest parenthesis problem as a function of f(i,j)?
Is there a way to solve the longest valid parenthesis problem as a function of f(i,j) where f(i,j) represents the length of longest valid parenthesis between i and j? It is ok if the runtime complexity is higher I am just curious about this.

Thanks.

Solution

Here is a way to solve the problem of longest valid parentheses by dynamic programming as you had hoped. Or almost.

For simplicity, a string is called valid if it consists of well-formed opening and closing parentheses.

The critical observation is that a string is valid if it is the empty string, a concatenation of two valid strings or "(" followed by a valid string followed by ")". In fact, that observation is the usual recursive definition of well-formed opening and closing parentheses! That observation indicates that a suitable set of subproblems are s[i...j].

Let s be the given string of length L.

Let dp be an L by L+1 array of booleans.
dp[i][j] will become true iff the substring s[i,j] is valid. Here s[i,j] means the substring with indices from i inclusive to j exclusive, 0 <=i <= j < L. In particular, s[i,i] means the empty string.
Initially, all dp[i][j] are false.
The base case is the empty string, dp[i][i] = true for all i.
The recurrence relation is, according to the observation above, dp[i][j] = true for i < j if s[i] == '(' and dp[i+1][j-1] == true and s[j-1] == ')' or there is an index k, such that both dp[i][k] == true and dp[k][j] == true. We can stipulate that i < k < j.

Here is the top-down version of the above approach, following the style in the question.

private static boolean topDown(String s, int i, int j, Boolean[][] dp) {
    if (i > j) return false;
    if (dp[i][j] != null) return dp[i][j];

    if (i == j) {
        dp[i][i] = true;
    } else {
        if (s.charAt(i) == '(' && s.charAt(j - 1) == ')' && topDown(s, i + 1, j - 1, dp)) {
            dp[i][j] = true;
        } else {
            for (int k = i + 1; k < j; k++) {
                if (topDown(s, i, k, dp) && topDown(s, k, j, dp)) {
                    dp[i][j] = true;
                }
            }
        }

    }

    if (dp[i][j] != null) {
        return true;
    } else {
        dp[i][j] = false;
        return false;
    }
}

Here is the bottom-up version. The code is sped-up a bit by the observation that a string can be valid only when its length is even.

private static int longestValidSubstring(String s) {
    final int L = s.length();

    // isValid[i][j] is true iff s.substring(i,j) is valid.
    boolean[][] isValid = new boolean[L][L + 1];

    // empty string is valid.
    for (int i = 0; i < L; i++) {
        isValid[i][i] = true;
    }
    for (int l = 2; l <= L; l += 2) {  // l is the length
        for (int i = 0; i <= L - l; i++) {
            // determine isValid[i][i+l]
            if (s.charAt(i) == '(' && s.charAt(i + l - 1) == ')' && isValid[i + 1][i + l - 1]) {
                isValid[i][i + l] = true;
            } else {
                for (int j = 2; j < l; j += 2) {
                    if (isValid[i][i + j] && isValid[i + j][i + l]) {
                        isValid[i][i + l] = true;
                        break;
                    }
                }
            }
        }
    }

    for (int l = L & 0xfffffffe; l >= 2; l -= 2) {
        for (int i = 0; i <= L - l; i++) {
            if (isValid[i][i + l]) {
                return l;
            }
        }
    }
    return 0;
}

The iterative version should be faster than the top-down version. However, even the iterative version is not fast enough when L is large since it computes L * L entries, which is pretty large once L is greater than 10000. Moreover, the two dimensional array isValid uses lot of memory as well.

Here and here are two further optimized solutions. I will not discussion them, since they are out of the scope of this answer.

Is there a way to solve the longest valid parentheses problem as a function of f(i,j) where f(i,j) represents the length of longest valid parentheses between i and j? It is ok if the runtime complexity is higher I am just curious about this.

The fundamental quality of the subproblems in dynamic programming is that you can solve a larger subproblem more easily once you have the solution to the smaller subproblem. Or, you can extend or combine the solutions to the smaller subproblems to build a solution to the larger problem.

In this particular problem, if we know whether all shorter substrings are valid or not, then it is easy to know a longer substring that is slightly longer is valid or not. On the other hand, if we know all f(i,j), which represent the length of longest valid substring between i and j, the lack of specific location of those longest valid substrings does not help us determine whether a substring of a longer substring is valid or not. The difference is critical, although might be subtle to untrained eyes. You will learn to appreciate the difference.

When I was solving the problem initially I had not thought of writing it as a function of f(i). I only thought of it as a function of f(i,j) and that turned out to be wrong. So is there an easy way to tell just by looking at a problem what states should be computed? For example why can't I solve the longest palindromic substring problem as a function of f(i)? similarly why can't I solve the longest parenthesis problem as a function of f(i,j)?

As I have shown above, it is not necessarily wrong to use f(i,j) although we should give f(i,j) a slightly different meaning. Solving a problem efficiently might not be easy, even if it is known to be solvable by dynamic programming. Quite a few problems puzzle me for a long time even if I consider myself experienced. It requires some intelligence and experience to identify or formulate the proper set of subproblems (which are not necessarily in the exact form of the original problem).

To extend a palindromic substring to a longer palindromic substring, you need to extend it from both sides. That indicates it may not easy to solve the longest palindromic substring problem as a function of a single variable that stands for the one end of the palindromic string in a simple and efficient way. However, if we analyze the fine structure of palindromes, we could arrive at the Manacher's algorithm , which basically use a map from a single variable that stands for the center point of palindromic substrings.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange