Thursday, February 26, 2015

Distinct Subsequences

Given a string S and a string T, count the number of distinct subsequences of T in S.
A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).
Here is an example:
S = "rabbbit"T = "rabbit"
Return 3.
Solution: DP
When you see string problem that is about subsequence or matching, dynamic programming method should come to your mind naturally. The key is to find the changing condition.
首先设置动态规划数组res[i][j],表示S串中从开始位置到第i位置与T串从开始位置到底j位置匹配的子序列的个数。
 如果S串为空,那么res[0][j]都是0;
 如果T串为空,那么res[i][j]都是1,因为空串为是任何字符串的字串。
这里我们维护res[i][j],对应的值是S的前i个字符和T的前j个字符有多少个可行的序列(注意这道题是序列,不是子串,也就是只要字符按照顺序出现即可,不需要连续出现)。
假设我们现在拥有之前的历史信息,我们怎么在常量操作时间内得到res[i][j]。假设S的第i个字符和T的第j个字符不相同,那么就意味着res[i][j]的值跟res[i-1][j]是一样的,前面该是多少还是多少,而第i个字符的加入也不会多出来任何可行结果。
如果S的第i个字符和T的第j个字符相同,那么所有res[i-1][j-1]中满足的结果都会成为新的满足的序列,当然res[i-1][j]的也仍是可行结果,所以res[i][j]=res[i-1][j-1]+res[i-1][j]
所以综合上面两种情况,递推式应该是res[i][j]=(S[i]==T[j]?res[i-1][j-1]:0)+res[i-1][j]
算法进行两层循环,时间复杂度是O(m*n),而空间上只需要维护当前i对应的数据就可以,也就是O(m)
public class Solution {
    public int numDistinct(String S, String T) {
        if(S.length() == 0)  //if s is empty, no distinct subsequence
            return 0;
        
        if(T.length() == 0)  //if T is empty, empty is a subsequence
            return 1;
        
        int[] res = new int[T.length() + 1];
        res[0] = 1;  //initial state
        
        for(int i = 0; i < S.length(); i++) {
            for(int j = T.length() - 1; j >= 0; j--) {
                if(S.charAt(i) == T.charAt(j)) {
                    res[j + 1] += res[j];
                }
            }
        }
        
        return res[T.length()];
    }
}

No comments:

Post a Comment