Table of Contents
If you want to practice data structure and algorithm programs, you can go through Java coding interview questions.
In this post, we will see edit distance problem in java. Sometimes this problem also referred as "String distance replacement in java"
Problem
Given two strings string1 and string2, String1 is to be converted into String2 with the given operations available in the minimum number of steps. Using any one of the given operations contributes to the increment of steps by one.
Allowed Operations are :
(i) Remove
: This operation allows the Removal any one character from String.
(ii) Insert
: This operation allows the Insertion of one character at any spot in the String.
(iii) Replace
: This operation allows the replacement of any one character in the string with
any other character.
cabddb
Solution
Naive Approach :
We will process the strings from start, and move towards the end solving the problem for the current character.
- If the current character of the strings match, we don’t do any operation to make them equal and we recur for the remaining Strings.
- If the current character of the strings does not match, we make three calls for the three operations, add one to the total number of operations took to make the Strings equal and recur for remaining subproblem depending on the type of operation, i.e.
- For explaining let us define remaining String as the substring of the actual string which does not include the first character i.e remaining string = substring(1, string.length).
(i) for removal : we recur for remaining_String1 and String2.
(ii) for insertion : we recur for String1 and remaining_String2.
(iii) for replacement : we recur for remaining_String1 and remaining_String2.
As at every point we are making 3 calls when character is different, and one call when character matches, Considering the worst case, let’s assume that no character in either string matches, then at every character we need to make 3 calls.
let’s define ‘n’ as the minimum(string1.length(), string2.length()).
therefore, the worst time complexity of the algorithm would be O(3^n).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 |
import java.util.Scanner; public class EditDistance { public static void main(String[] args) { Scanner scn = new Scanner(System.in); String s1 = scn.nextLine(); String s2 = scn.nextLine(); System.out.println(editDistRecusrsive(s1, s2)); } public static int editDistRecusrsive(String s1, String s2) { /* * basecase : if one String is exhausted, return the remaining * length of the other String, as we have to do atleast that * many opeartions to make the Strings equal. */ if (s1.length() == 0) { return s2.length(); } if (s2.length() == 0) { return s1.length(); } /* * we are working on the first characters of both * strings and recurring for the remaining length * of the strings. */ String remaining1 = s1.substring(1); String remaining2 = s2.substring(1); int recRes = 0; if (s1.charAt(0) == s2.charAt(0)) { /* if characters match we dont need to do any * operations to make them equal, so leave them * and recur for the remaining strings */ recRes = editDistRecusrsive(remaining1, remaining1); } else { /* * if the first characters dont match, then we add +1 * for the operation used in making the first characters * of the strings equal, and recur for the remaining Strings * and our final answer will be 1 + minimum of the operations * taken to convert remaining String! and remaining String2. */ recRes = 1 + Math.min(editDistRecusrsive(remaining1, remaining2), Math.min(editDistRecusrsive(remaining1, s2), editDistRecusrsive(s1, remaining2))); } return recRes; } |
EFFICIENT APPROACH :
We can definitely see overlapping subproblems in the recursive solution where the result for the same problem is being calculated again and again which increases the complexity to such a higher extent. Also the result for the bigger problem is being calculated using the optimal result of its smaller subproblems which we can see in the code of our previous algorithm. As this algorithm follow both the required properties of Dynamic programming, that is, Optimal substructure and Overlapping Subproblems, hence, this problem points to the use of Dynamic programming to reduce the complexity of the algorithm, where the result for a duplicate call will not be calculated again and again.
Algorithm:
- We make a 2D DP array for storing our result for the subproblems.
- Lets name it DP and its size will be
(string1.length + 1)*(string2.length + 1)
- Any cell(i, j) will contain the result for the the number of operations needed to make the two strings
(String1.substring(i, string1.length()))&(String2.substring(j, string2.length()))
matching. - The last Row of the matrix will contain the result of :
(String1.substring(current column, string1.length) & empty string2
- The last column of the matrix will contain the result of :
empty string1 & (String1.substring(current column, string1.length))
The last two points actually are the base cases of our problem and we move backward to calculate the result of the bigger subproblem,
How?
Suppose we are calculating the result of a cell, DP[row][col]
- Now if the character of String1 at current column matches the character of String2 at current row,
DP[row][col] = DP[row + 1][col + 1]
This is basically equivalent to the call for the result of (remaining_String1, remaining String2), as the current character matches, so we dont need to do any operations for making them equal. - What if the character of String1 at current column does not matches the character of String2 at current row,
then the result for the current cell will be,DP[row][col] = min of(DP[row + 1][col], DP[row][col+1], DP[row + 1][col + 1])
which is equivalent to Minimum of call for all three operations, that is,
(i) for removal : DP[row][col + 1]
, that is, remaining_String1 and String2
(ii) for insertion : DP[row + 1][col]
, that is, String1 and remaining_String2
(iii) for replacement : DP[row+1][col+1]
, that is, remaining_String1 and remaining_String2.
Eventually the result i.e. the minimum number of operations required for making both the strings matching is at the cell, DP[0][0]
which represents the result of both COMPLETE strings.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 |
package org.arpit.java2blog; import java.util.Scanner; public class EditDistance { public static void main(String[] args) { Scanner scn = new Scanner(System.in); String s1 = scn.nextLine(); String s2 = scn.nextLine(); System.out.println(editDist(s1, s2)); } public static int editDist(String s1, String s2) { int[][] dp = new int[s1.length() + 1][s2.length() + 1]; /* this is the last column of the matrix which * represents the result of empty string1 * and string2 starting from current row. */ for (int row = s2.length(); row >= 0; row--) { dp[row][s1.length()] = s2.length() - row; } /* this is the last row of the matrix which * represents the result of empty string2 * and string1 starting from current col. */ for (int col = s1.length(); col >= 0; col--) { dp[s2.length()][col] = s1.length() - col; } for (int col = s1.length() - 1; col >= 0; col--) { for (int row = s2.length() - 1; row >= 0; row--) { /* If characters are same, then the solution will be without * these characters */ if (s1.charAt(col) == s2.charAt(row)) { dp[row][col] = dp[row + 1][col + 1]; } else { /* else it will be minimum of these three operations */ dp[row][col] = 1 + Math.min(dp[row + 1][col + 1], // replace Math.min(dp[row][col + 1] // removal , dp[row + 1][col])); // insertion } } } return dp[0][0]; } } |
That’s all about Edit Distance Problem in java or String distance replacement in java.