Throughout this article, we will use the following three text files, File1.txt, File2.txt, and File3.txt, to practice the provided solutions.

File1.txt:

    This is Line number 1.
    This is Line Number 2.
    It is Line number 3.
    I am Line 4.

File2.txt:

    This is Line number 1.
    This is Line Number 2.
    It is Line number 3.
    I am Line 4.

File3.txt:

    Greetings! Welcome to Java2Blog!
Using cmp Command
Use the cmp command to check if two files are the same in Bash.

    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File2.txt"

    # -s keeps cmp silent; only its exit status is used.
    if cmp -s "$file1" "$file2"; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are the same.
In the above example, we initialized the file1 and file2 variables with the locations of two text files. Then, we used the if statement with the cmp command to check whether the contents of $file1 and $file2 were the same.

If they were the same, the echo in the if block was executed; otherwise, the echo in the else block was executed.
How did this command work? The cmp command compared $file1 and $file2 byte by byte to determine whether the two files were identical. Here, the -s option (alternatively written as --silent) suppressed all output, so only the exit status carried the result: 0 if the files are identical, 1 if they differ, and 2 if an error occurred.

Without -s, cmp prints nothing if the files are identical; if they differ, it reports the byte offset and the line number where the first difference occurred.
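To illustrate what cmp reports without -s, here is a minimal sketch. It uses two throwaway files created under mktemp rather than the article's files, and the helper name report_first_difference is made up for this example.

```shell
#!/bin/bash

# Hypothetical helper: lets cmp print its report of the first difference
# and returns cmp's exit status (0 = identical, 1 = different, 2 = error).
report_first_difference() {
    cmp "$1" "$2"
}

# Throwaway sample files (assumed for illustration only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'line one\nline two\n' > "$tmpdir/a.txt"
printf 'line one\nline 2\n'   > "$tmpdir/b.txt"

# Capture the exit status without aborting if the files differ.
status=0
report_first_difference "$tmpdir/a.txt" "$tmpdir/b.txt" || status=$?
echo "cmp exit status: $status"
```

The exact wording of the report varies slightly between cmp implementations, so scripts are usually better off relying on the exit status than on parsing the message.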
Let’s take another example where the files are different.
    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File3.txt"

    if cmp -s "$file1" "$file2"; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are different.
Using Hash Values
Use sha1sum to calculate hash values to check if two files are the same in Bash.

    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File2.txt"

    # Keep only the first field (the hash) of each sha1sum result.
    hash1=$(sha1sum "$file1" | cut -d ' ' -f 1)
    hash2=$(sha1sum "$file2" | cut -d ' ' -f 1)

    if [ "$hash1" == "$hash2" ]; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are the same.
Here, we used sha1sum to generate a SHA-1 hash for $file1 and piped the output to the cut command to grab only the hash, because sha1sum prints its output in the form hash filepath.

With the cut command, we used the -d option to specify a space as the delimiter for splitting the input into fields, while -f 1 selected the first field, the hash, which we stored in the hash1 variable.

We repeated the process to get a hash for $file2 and stored it in the hash2 variable. Then, we used the if statement with the == operator to check whether both hashes were the same. If so, the echo inside the if block displayed a message saying both files are the same; otherwise, the echo in the else block ran.
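If you get an error similar to bash: /home/user/File1.txt: Permission denied, check the permissions involved. Comparing a file only requires read permission on it, which you can grant with the chmod u+r filename.txt command; chmod u+x is needed only to make the script itself executable.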
Let’s take another example below:
    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File3.txt"

    hash1=$(sha1sum "$file1" | cut -d ' ' -f 1)
    hash2=$(sha1sum "$file2" | cut -d ' ' -f 1)

    if [ "$hash1" == "$hash2" ]; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are different.
Similarly, you can use md5sum and sha256sum to calculate hash values to check if two files are the same in Bash.

    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File2.txt"

    hash1=$(md5sum "$file1" | cut -d ' ' -f 1)
    hash2=$(md5sum "$file2" | cut -d ' ' -f 1)

    if [ "$hash1" == "$hash2" ]; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are the same.

    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File2.txt"

    hash1=$(sha256sum "$file1" | cut -d ' ' -f 1)
    hash2=$(sha256sum "$file2" | cut -d ' ' -f 1)

    if [ "$hash1" == "$hash2" ]; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are the same.
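Since the hash-based scripts differ only in the hashing tool, they can be folded into one function. The following is a minimal sketch, not a definitive implementation: the function name same_hash and its argument order are assumptions made for illustration, and it assumes the GNU coreutils checksum tools are installed.

```shell
#!/bin/bash

# Hypothetical helper: same_hash TOOL FILE1 FILE2
# TOOL is a checksum command such as md5sum, sha1sum, or sha256sum.
# Returns 0 if both files produce the same non-empty hash, 1 otherwise.
same_hash() {
    local tool=$1 hash1 hash2
    # Reading from standard input keeps the file path out of the output.
    hash1=$("$tool" < "$2" | cut -d ' ' -f 1)
    hash2=$("$tool" < "$3" | cut -d ' ' -f 1)
    [ -n "$hash1" ] && [ "$hash1" = "$hash2" ]
}

# Throwaway sample files (assumed for illustration only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'same content\n'  > "$tmpdir/a.txt"
printf 'same content\n'  > "$tmpdir/b.txt"
printf 'other content\n' > "$tmpdir/c.txt"

if same_hash sha256sum "$tmpdir/a.txt" "$tmpdir/b.txt"; then
    echo "a.txt and b.txt: same"
else
    echo "a.txt and b.txt: different"
fi
```

Passing the tool as the first argument means switching from sha256sum to md5sum or sha1sum is a one-word change at the call site.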
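Using -ef Operator
Use the -ef operator to check whether two paths refer to the same file in Bash. Note that -ef does not compare contents: it is true only when both paths resolve to the same device and inode numbers, for example when one path is a hard link or symbolic link to the other. Two separate copies with identical contents are reported as different, so use cmp, diff, or hash values when you need to compare contents.

    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File2.txt"

    # True only if both paths point to the same underlying file.
    if [ "$file1" -ef "$file2" ]; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are the same.

This output appears only when /home/user/File2.txt is a hard link or symbolic link to /home/user/File1.txt; if it is an independent copy, the script prints Both files are different. even though the contents match.

Let’s take another example below:

    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File3.txt"

    if [ "$file1" -ef "$file2" ]; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are different.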
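It is worth seeing what -ef actually tests: it is true when two paths point to the same underlying file (same device and inode), not when two separate files merely have equal contents. The sketch below demonstrates this with a copy and a hard link; the file names and the same_file helper are assumptions made for illustration.

```shell
#!/bin/bash

# Hypothetical helper that reports what -ef says about two paths.
same_file() {
    if [ "$1" -ef "$2" ]; then
        echo "same file"
    else
        echo "different files"
    fi
}

# Throwaway sample files (assumed for illustration only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'same content\n' > "$tmpdir/original.txt"
cp "$tmpdir/original.txt" "$tmpdir/copy.txt"      # separate file, same content
ln "$tmpdir/original.txt" "$tmpdir/hardlink.txt"  # second name for the same inode

echo "original vs copy:     $(same_file "$tmpdir/original.txt" "$tmpdir/copy.txt")"
echo "original vs hardlink: $(same_file "$tmpdir/original.txt" "$tmpdir/hardlink.txt")"
```

The copy is reported as a different file despite having identical contents, while the hard link is reported as the same file.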
Using diff Command
Use the diff command to check if two files are the same in Bash.

    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File2.txt"

    # Discard diff's output; only its exit status matters here.
    diff "$file1" "$file2" > /dev/null

    if [ $? -eq 0 ]; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are the same.
In the above example, we used the diff command to compare $file1 and $file2 line by line. We used the > operator to redirect the standard output of the diff command to the /dev/null device, which discards it. In other words, this line performs the comparison silently without displaying anything on the console.

Next, we used the if statement with the -eq operator to check the exit status of the previously run diff command. If the exit status was 0, the specified files were the same; otherwise, they were different.

In Bash, the exit status of the previously executed command is stored in the $? variable. By convention, a 0 exit code indicates successful execution (no errors), while a non-zero code indicates a failure. For diff specifically, 0 means no differences were found, 1 means the files differ, and 2 means an error occurred (for example, a missing file).
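Because a non-zero status from diff can mean either "the files differ" or "something went wrong", it can help to tell the cases apart. The following is a minimal sketch under that assumption; the helper name diff_status and the sample files are made up for this example.

```shell
#!/bin/bash

# Hypothetical helper: prints "same", "different", or "error"
# depending on diff's exit status (0, 1, or anything else).
diff_status() {
    local rc=0
    diff -q "$1" "$2" > /dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "same" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}

# Throwaway sample files (assumed for illustration only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'hello\n' > "$tmpdir/a.txt"
printf 'hello\n' > "$tmpdir/b.txt"
printf 'bye\n'   > "$tmpdir/c.txt"

echo "a vs b:       $(diff_status "$tmpdir/a.txt" "$tmpdir/b.txt")"        # same
echo "a vs c:       $(diff_status "$tmpdir/a.txt" "$tmpdir/c.txt")"        # different
echo "a vs missing: $(diff_status "$tmpdir/a.txt" "$tmpdir/missing.txt")"  # error
```

Capturing the status with || rc=$? right after the command avoids the common pitfall of $? being overwritten by an intervening command.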
Let’s have a look at another example below:
    #!/bin/bash

    file1="/home/user/File1.txt"
    file2="/home/user/File3.txt"

    diff "$file1" "$file2" > /dev/null

    if [ $? -eq 0 ]; then
        echo "Both files are the same."
    else
        echo "Both files are different."
    fi

Output:

    Both files are different.
Use the diff command with the -q option to check if the two files are the same. For this solution, you do not have to use if-else statements: diff prints a one-line message if the files differ and nothing if they are the same.

    #!/bin/bash
    diff -q /home/user/File1.txt /home/user/File3.txt

Output:

    Files /home/user/File1.txt and /home/user/File3.txt differ
Use the diff command with the -c option to retrieve a comparison of two files in context mode.

    #!/bin/bash
    diff -c /home/user/File1.txt /home/user/File3.txt

Output:

    *** /home/user/File1.txt 2023-07-14 05:20:08.852722832 +0000
    --- /home/user/File3.txt 2023-07-20 09:58:58.717339962 +0000
    ***************
    *** 1,4 ****
    ! This is Line number 1.
    ! This is Line Number 2.
    ! It is Line number 3.
    ! I am Line 4.
    \ No newline at end of file
    --- 1 ----
    ! Greetings! Welcome to Java2Blog!
    \ No newline at end of file
In the above output, the * character marked the parts related to File1.txt, while the - character marked the parts related to File3.txt. The first two lines of the output showed the file locations with their modification dates and times.

The *** 1,4 **** line represented the range of lines shown from File1.txt, while the --- 1 ---- line indicated the line shown from File3.txt. We would have seen a range of lines for File3.txt as well if it had multiple lines.

How did we identify the differences in the output?
- The + meant the line was not present in the first file. Remove it from the second file or insert it in the first file to match.
- The - is the opposite of +: the line existed in the first file but not in the second file. Remove it from the first file or insert it in the second file to match.
- The ! meant that the line differs between the two files and requires modification to match.
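The three markers are easier to recognise when all of them appear in one comparison. The sketch below builds two small throwaway files (their names and contents are assumptions made for illustration) in which one line is changed, one is removed, and one is added, and then prints the context-mode comparison.

```shell
#!/bin/bash

# Throwaway sample files (assumed for illustration only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# "old" becomes "new" (changed), "gone" is dropped (removed),
# and "extra" is appended (added).
printf 'keep\nold\nmid\ngone\nend\n'  > "$tmpdir/first.txt"
printf 'keep\nnew\nmid\nend\nextra\n' > "$tmpdir/second.txt"

# diff exits with status 1 when the files differ, hence the "|| true".
context_output=$(diff -c "$tmpdir/first.txt" "$tmpdir/second.txt") || true
printf '%s\n' "$context_output"
```

In the printed comparison, the changed line should carry a ! in both file sections, the removed line a - in the first section, and the added line a + in the second section.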
Use the diff command with the -u option to get the comparison in unified mode; it is similar to the context mode but without redundant details.

    #!/bin/bash
    diff -u /home/user/File1.txt /home/user/File3.txt

Output:

    --- /home/user/File1.txt 2023-07-14 05:20:08.852722832 +0000
    +++ /home/user/File3.txt 2023-07-20 09:58:58.717339962 +0000
    @@ -1,4 +1 @@
    -This is Line number 1.
    -This is Line Number 2.
    -It is Line number 3.
    -I am Line 4.
    \ No newline at end of file
    +Greetings! Welcome to Java2Blog!
    \ No newline at end of file
Let’s modify the case of a letter in one of the files. We modified File1.txt by replacing This with this in line 2. Now, if we run the diff command with the -q option, it will say the files differ; see the following example.

    #!/bin/bash
    diff -q /home/user/File1.txt /home/user/File2.txt

Output:

    Files /home/user/File1.txt and /home/user/File2.txt differ
What happened here? By default, the diff command does a case-sensitive comparison. As we modified the letter’s case, the files became different.

Use the diff command with the -i option for case-insensitive comparison.

    #!/bin/bash
    diff -i /home/user/File1.txt /home/user/File2.txt

Now, we got no output because File1.txt and File2.txt were treated as identical.
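The effect of -i can also be checked from a script through the exit status. Here is a minimal sketch, assuming two throwaway files that differ only in letter case; the helper name files_match is made up for this example.

```shell
#!/bin/bash

# Hypothetical helper: runs diff quietly, passing all arguments through
# (options first, then the two files). Returns diff's exit status.
files_match() {
    diff -q "$@" > /dev/null 2>&1
}

# Throwaway sample files that differ only in letter case.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'This is Line Number 2.\n' > "$tmpdir/upper.txt"
printf 'this is Line Number 2.\n' > "$tmpdir/lower.txt"

if files_match "$tmpdir/upper.txt" "$tmpdir/lower.txt"; then
    echo "case-sensitive: same"
else
    echo "case-sensitive: different"
fi

if files_match -i "$tmpdir/upper.txt" "$tmpdir/lower.txt"; then
    echo "case-insensitive: same"
else
    echo "case-insensitive: different"
fi
```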
If you are looking for coloured output, replace diff with colordiff, as in colordiff FirstFilePath SecondFilePath. Don’t forget to install colordiff first using the sudo apt install colordiff command. Like diff, it shows output only if the specified files are different.

Can we have a solution that opens an editor so we can modify the files if there is a difference we don’t want? Yes, see the following section.
Using vimdiff Command
Use the vimdiff command to compare two files in Bash. You must install the Vim editor using sudo apt install vim to use this solution.

    #!/bin/bash
    vimdiff /home/user/File1.txt /home/user/File3.txt

Running this opens both files side by side in Vim with the differences highlighted.

Type :wq and hit Enter to save and close the current window; repeat it for the other window, or use :qa to close both windows at once if you have not modified anything.
Using cksum Command
Use the cksum command to check if the two files are the same.

    cksum /home/user/*.txt

Output:

    3232966274 79 /home/user/File1.txt
    3232966274 79 /home/user/File2.txt
    1972114799 66 /home/user/File3.txt

The cksum command computes a CRC (Cyclic Redundancy Check) checksum for each file passed to it and prints the checksum, the byte count, and the file name/path. If no file is specified, it reads from standard input.

We used cksum to get checksum values for all the .txt files in the specified directory; the shell expanded /home/user/*.txt to the matching file names. If two files have the same checksum and byte count, they are almost certainly identical; if either value differs, the files are different.
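Comparing the printed values by eye works for a few files, but the same check can be scripted. The following is a minimal sketch, assuming throwaway files and a made-up helper name same_cksum; a matching CRC makes identical contents very likely but is not a strict guarantee.

```shell
#!/bin/bash

# Hypothetical helper: returns 0 if both files have the same CRC checksum
# and byte count, 1 if they differ, and 2 if a file cannot be read.
same_cksum() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Reading via standard input makes cksum print only "checksum size".
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Throwaway sample files (assumed for illustration only).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'same content\n'  > "$tmpdir/a.txt"
printf 'same content\n'  > "$tmpdir/b.txt"
printf 'other content\n' > "$tmpdir/c.txt"

if same_cksum "$tmpdir/a.txt" "$tmpdir/b.txt"; then
    echo "a.txt and b.txt: same checksum and size"
else
    echo "a.txt and b.txt: different"
fi
```

When an accidental collision matters, a byte-by-byte tool such as cmp gives a definitive answer.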
Do we have a solution to remove the duplicate files if found? Can we preview the results before deleting them? Yes, let’s go to the following section.
Using rdfind Command
Use the rdfind command with the -deleteduplicates option to find duplicate files in the given directory and its subdirectories, and delete them.

    rdfind -deleteduplicates true /home/user

Output:

    Now scanning "/home/user", found 3 files
    Now have 3 files in total.
    Removed 0 files due to nonunique device and inode
    Total size is 193 bytes or 193 B
    Removed 1 files due to unique sizes from list. 2 files left.
    Now eliminating candidates based on first bytes:removed 0 files from list. 2 files left.
    Now eliminating candidates based on last bytes:removed 0 files from list. 2 files left.
    Now eliminating candidates based on sha1 checksum:removed 0 files from list. 2 files left.
    It seems like you have two files that are not unique
    Totally, 80 B can be reduced.
    Now, making results file results.txt
    Now deleting duplicates:
    Deleted 1 files.

Using the -dryrun option before using -deleteduplicates helps in previewing the results rather than deleting the files immediately.
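A dry run can be wrapped in a small function so that the preview step is hard to skip. The following is a minimal sketch, assuming rdfind is installed and accepts -dryrun true as described in its manual; the wrapper name preview_duplicates and the sample directory are made up for this example.

```shell
#!/bin/bash

# Hypothetical wrapper: shows what rdfind would delete in the given
# directory. With -dryrun true, rdfind only reports its planned actions.
preview_duplicates() {
    if ! command -v rdfind > /dev/null 2>&1; then
        echo "rdfind is not installed" >&2
        return 127
    fi
    rdfind -dryrun true -deleteduplicates true "$1"
}

# Throwaway directory containing one pair of duplicates.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'same content\n'  > "$tmpdir/a.txt"
cp "$tmpdir/a.txt" "$tmpdir/b.txt"
printf 'other content\n' > "$tmpdir/c.txt"

preview_duplicates "$tmpdir" || echo "Preview skipped or failed."

# Nothing is deleted during a dry run, so all three files remain.
ls "$tmpdir"
```

Once the preview looks right, the same command without -dryrun true performs the actual deletion.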
That’s all about how to check if two files are the same in Bash.