Longest common Subsequence


If you want to practice data structure and algorithm programs, you can go through 100+ data structure and algorithm programs.

Given two Strings A and B. Find the length of the Longest Common Subsequence (LCS) of the given Strings.
Subsequence can contain any number of characters of a string including zero or all (subsequence containing zero characters is called as empty subsequence). Characters need not to be contiguous but must maintain the relative order as in the given String. We need to find the maximum length of this subsequence which has characters common to both the given Strings.



We just need to find the maximum length of a sequence containing characters common to both the strings in the relative order same as that of the given Strings.
In this approach we calculate our result in bottom-up manner that is, while returning from recursion.
We keep on shortening the length of the given Strings by one character, and once we hit the basecase, this is the point where at least one of the two string is exhausted having size zero. So in this case we can not have any matching character in the two strings, therefore, we return 0.
As we know we kept on removing the first character of the string in every recursion state, now the result of the bigger string will depend on the result of these smaller Strings.
Now to find this, two possibilities arise :

Case 1 :

When the two characters match.

If the both the character matches then we can add 1 to the length of the LCS found till now and can return one plus the result of shortened strings which we will get from recursion.
that is, 1 + LCS( shortened A, shortened B)

Case 2 :

When the two characters do not match.
If the characters do not match, then there are chances are, that the first character of String A matches with some other character in B or the first character of B matches with some other character in A. But as we need our result to be maximum therefore we return the maximum of these two calls.
that is, Maximum of (LCS of shortened A & B) and (LCA of A and shortened B).

As at every point we are making two calls when the first character of the two strings does not match, therefore, in the worst case where there is no matching characters in either of the two strings, then in that case our worst time complexity will be O(2^n), where n is the length of the shorter String.

Consider an Example for better understanding and its recursion tree,
A : son
B : nop

Longest common subsequence
As we can see, in the recursion tree, we make only one call for remaining Strings whenever the first character matches and in every other case we make two calls.
At every base case condition we are returning zero and we are returning one plus the result achieved from recursion whenever there is a match in the first two characters.

Approach – II : Dynamic Programming:

We can clearly see the duplication of the calls represented by red colour in the recursion tree where result of same problem is getting calculated again. These Overlapping Subproblems are leading to an increase in Time Complexity of this algorithm.
Also, The result of bigger problem can be calculated with the help of results of smaller subproblems as we discussed in the recursive approach how we calculated the maximum length of LCS of a bigger string with the help of results of the smaller (shortened) strings.
As this problem has both the properties of Dynamic Programming, that is, Overlapping subproblems and Optimal Substructure. Therefore, we can use an approach involving dynamic programming to efficiently solve this problem.

We keep a Two – Dimensional matrix DP[][] of length (N+1) x (M+1) where N and M are the lengths of two given Strings A and B. Any cell DP[i][j] will represent the maximum length of LCS of two substrings, that is, A[1...i] & B[1...j], where A and B are the two given Strings.

Base cases would be the problems whose solution is known without any further computation. In this approach,
DP[0][j] = 0 and DP[i][j] = 0 wherein both the cases atleast one string has length zero and in that case LCS would obviously be zero. Also to store this base case result, Length of DP matrix is one plus the length of given Strings.

The final result of this problem will be on the cell DP[N][M] as this cell represents the LCS of
strings A[1...N] & B[1...M], where A and B are the two given Strings and N & M are their lengths respectively.

Discussion of Optimal Substructure :

CASE-1 :

We know, if the first character of the current strings/ substrings matches then we make call for the shortened strings and add one to the LCS being calculated as the current characters match.
Similarly here in this DP approach, if A[i] = B[j] that is, character at ith position in A matches with character at jth position in B, then,

DP[i][j] = 1 + DP[i-1][j-1]

DP[i-1][j-1] is cell where the result of shortened string lies, that is, A[1…i-1] and B[1…j-1]


if the first character of the current strings/ substrings does not match then we made two separate calls in recursive approach and final length of LCS is the result of Maximum of those two calls.
Similarly, in this DP approach if A[i] != B[j] then,

DP[i][j] = max (DP[i-1][j] , DP[i][j-1])

DP[i-1][j] contains the maximum Length of LCS of A[1...i-1] and B[1...j] and,
DP[i][j-1] contains the maximum Length of LCS of A[1...i] and B[1...j-1].

Discussion of time complexity :

As we use a Matrix of size N X M where N and M are the lengths of the given two strings, and as we progress we calculate result for every cell.
That is why, the Worst time complexity of this approach for solving this problem would be O(NM).

That’s all about Longest common Subsequence in java.


Add Comment