Edit Distance Problem in java

Previous
Next

In this post, we will see edit distance problem in java


Problem

Given two strings string1 and string2, String1 is to be converted into String2 with the given operations available in the minimum number of steps. Using any one of the given operations contributes to the increment of steps by one.

Allowed Operations are :
(i) Remove : This operation allows the Removal any one character from String.
(ii) Insert : This operation allows the Insertion of one character at any spot in the String.
(iii) Replace : This operation allows the replacement of any one character in the string with
any other character.

Input:
bacbbd
cabddb
Output: 4

Solution

Naive Approach :

We will process the strings from start, and move towards the end solving the problem for the current character.

  • If the current character of the strings match, we don’t do any operation to make them equal and we recur for the remaining Strings.
  • If the current character of the strings does not match, we make three calls for the three operations, add one to the total number of operations took to make the Strings equal and recur for remaining subproblem depending on the type of operation, i.e.
  • For explaining let us define remaining String as the substring of the actual string which does not include the first character i.e remaining string = substring(1, string.length).
    (i) for removal : we recur for remaining_String1 and String2.
    (ii) for insertion : we recur for String1 and remaining_String2.
    (iii) for replacement : we recur for remaining_String1 and remaining_String2.

As at every point we are making 3 calls when character is different, and one call when character matches, Considering the worst case, let’s assume that no character in either string matches, then at every character we need to make 3 calls.
let’s define ‘n’ as the minimum(string1.length(), string2.length()).
therefore, the worst time complexity of the algorithm would be O(3^n).

EFFICIENT APPROACH :
We can definitely see overlapping subproblems in the recursive solution where the result for the same problem is being calculated again and again which increases the complexity to such a higher extent. Also the result for the bigger problem is being calculated using the optimal result of its smaller subproblems which we can see in the code of our previous algorithm. As this algorithm follow both the required properties of Dynamic programming, that is, Optimal substructure and Overlapping Subproblems, hence, this problem points to the use of Dynamic programming to reduce the complexity of the algorithm, where the result for a duplicate call will not be calculated again and again.

Algorithm:

  • We make a 2D DP array for storing our result for the subproblems.
  • Lets name it DP and its size will be (string1.length + 1)*(string2.length + 1)
  • Any cell(i, j) will contain the result for the the number of operations needed to make the two strings  (String1.substring(i, string1.length()))&(String2.substring(j, string2.length())) matching.
  • The last Row of the matrix will contain the result of : (String1.substring(current column, string1.length) & empty string2
  • The last column of the matrix will contain the result of : empty string1 & (String1.substring(current column, string1.length))

The last two points actually are the base cases of our problem and we move backward to calculate the result of the bigger subproblem,

How?

Suppose we are calculating the result of a cell, DP[row][col]

  • Now if the character of String1 at current column matches the character of String2 at current row, DP[row][col] = DP[row + 1][col + 1]
    This is basically equivalent to the call for the result of (remaining_String1, remaining String2), as the current character matches, so we dont need to do any operations for making them equal.
  • What if the character of String1 at current column does not matches the character of String2 at current row,
    then the result for the current cell will be, DP[row][col] = min of(DP[row + 1][col], DP[row][col+1], DP[row + 1][col + 1])         which is equivalent to Minimum of call for all three operations, that is,

(i)  for removal : DP[row][col + 1], that is, remaining_String1 and String2
(ii)  for insertion : DP[row + 1][col], that is, String1 and remaining_String2
(iii) for replacement : DP[row+1][col+1], that is, remaining_String1 and remaining_String2.

Eventually the result i.e. the minimum number of operations required for making both the strings matching is at the cell, DP[0][0] which represents the result of both COMPLETE strings.

That’s all about Edit Distance Problem in java.

Latest posts by Ayush Kumar (see all)

Previous
Next

Join Our News Letter – Stay Updated

Subscribe to Awesome Java Content.




Add Comment

Join Our News Letter - Stay Updated

Subscribe to Awesome Java Content.
You can like our facebook page Java2blog