Check if Two Files are the Same in Bash


Throughout this article, we use three text files, File1.txt, File2.txt, and File3.txt, to practice the provided solutions. File1.txt and File2.txt have the same content, while File3.txt differs from them.

Using cmp Command

Use the cmp command to check if two files are the same in Bash.

In this solution, we initialized the file1 and file2 variables with the locations of two text files. Then, we used an if statement with the cmp command to check whether the content of $file1 and $file2 was the same.

If it was the same, the echo from the if block was executed; otherwise, the else block was executed.

How did this command work? The cmp command compared the $file1 and $file2 files byte by byte to determine whether they were identical. Here, the -s option (also available as --silent or --quiet) suppressed all output, so only the exit status reported the result.

Without -s, cmp prints nothing if the files are identical; if they differ, it prints, by default, the byte offset and the line number of the first difference.
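The steps above can be sketched roughly as follows. This is only a minimal illustration: the temporary directory and the sample file contents are assumptions, not the article's original files.

```shell
# Hypothetical sample files created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'Hello\nThis is a test file\n' > "$tmpdir/File1.txt"
cp "$tmpdir/File1.txt" "$tmpdir/File2.txt"

file1="$tmpdir/File1.txt"
file2="$tmpdir/File2.txt"

# cmp -s prints nothing; the if statement reacts to its exit status only.
if cmp -s "$file1" "$file2"; then
    result="The files are the same."
else
    result="The files are different."
fi
echo "$result"
```

Because cmp stops at the first differing byte, this check tends to be quick even for large files.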

Let’s take another example where the files are different.
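A possible version of such an example is sketched below, this time without the -s option so that cmp reports where the first difference occurs. The file contents are made up for illustration.

```shell
# Hypothetical sample files with different content.
tmpdir=$(mktemp -d)
printf 'Hello\nThis is a test file\n' > "$tmpdir/File1.txt"
printf 'Something else\n' > "$tmpdir/File3.txt"

file1="$tmpdir/File1.txt"
file3="$tmpdir/File3.txt"

# Without -s, cmp prints the position of the first difference on its own.
if cmp "$file1" "$file3"; then
    result="The files are the same."
else
    result="The files are different."
fi
echo "$result"
```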

Using Hash Values

Use sha1sum to calculate hash values to check if two files are the same in Bash.

Here, we used sha1sum to generate a SHA1 hash for $file1 and piped the output to the cut command to grab only the hash, because sha1sum prints its output as the hash followed by the file path.

With the cut command, we used the -d option to specify the delimiter (a space) for splitting the input into fields, while -f 1 selected the first field, the hash, which we stored in the hash1 variable.

We repeated the process to get a hash for $file2 and stored it in the hash2 variable. Then, we used the if statement with the == operator to check if both hashes were the same. If so, we ran the echo inside the if block to display a message saying both files are the same.
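The hashing steps described above might look like the sketch below. The sample files and messages are assumptions for illustration. The comparison uses =, the portable spelling inside [ ]; Bash also accepts == there.

```shell
# Hypothetical sample files created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'Hello\nThis is a test file\n' > "$tmpdir/File1.txt"
cp "$tmpdir/File1.txt" "$tmpdir/File2.txt"

file1="$tmpdir/File1.txt"
file2="$tmpdir/File2.txt"

# sha1sum prints "<hash>  <path>"; cut keeps the first space-delimited field.
hash1=$(sha1sum "$file1" | cut -d ' ' -f 1)
hash2=$(sha1sum "$file2" | cut -d ' ' -f 1)

if [ "$hash1" = "$hash2" ]; then
    result="Both files are the same."
else
    result="The files are different."
fi
echo "$result"
```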

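If you get an error similar to bash: /home/user/File1.txt: Permission denied, check the file's permissions with ls -l. Hashing a file only requires read permission, which chmod u+r filename.txt grants; chmod u+x filename.txt adds execute permission, which matters only when the file itself is run as a script.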

Let’s take another example below:

Similarly, you can use md5sum and sha256sum to calculate hash values to check if two files are the same in Bash.
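Since these tools share the same output layout, the comparison can be wrapped in a small helper. The same_hash function below is a hypothetical name introduced for this sketch, and the sample files are assumptions.

```shell
# same_hash TOOL FILE_A FILE_B (hypothetical helper)
# Succeeds when TOOL (e.g. md5sum or sha256sum) yields the same hash for both files.
same_hash() {
    h1=$("$1" "$2" | cut -d ' ' -f 1)
    h2=$("$1" "$3" | cut -d ' ' -f 1)
    # Guard against an empty hash, e.g. when a file could not be read.
    [ -n "$h1" ] && [ "$h1" = "$h2" ]
}

# Hypothetical sample files.
tmpdir=$(mktemp -d)
printf 'Hello\n' > "$tmpdir/File1.txt"
cp "$tmpdir/File1.txt" "$tmpdir/File2.txt"
printf 'Something else\n' > "$tmpdir/File3.txt"

if same_hash sha256sum "$tmpdir/File1.txt" "$tmpdir/File2.txt"; then
    echo "File1.txt and File2.txt are the same."
fi
if ! same_hash md5sum "$tmpdir/File1.txt" "$tmpdir/File3.txt"; then
    echo "File1.txt and File3.txt are different."
fi
```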

Using -ef Operator

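Use the -ef operator to check if two paths refer to the same file in Bash. It is true when both names point to the same device and inode number, as with hard links or a symbolic link and its target. It does not compare content, so two separate copies with identical content are reported as different.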
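A minimal sketch of how -ef behaves, assuming made-up file names in a temporary directory: a hard link is the same file under a second name, whereas a copy is a separate file even though its content matches.

```shell
# Hypothetical sample files.
tmpdir=$(mktemp -d)
printf 'Hello\n' > "$tmpdir/File1.txt"
ln "$tmpdir/File1.txt" "$tmpdir/hardlink.txt"   # second name for the same inode
cp "$tmpdir/File1.txt" "$tmpdir/copy.txt"       # separate file, same content

if [ "$tmpdir/File1.txt" -ef "$tmpdir/hardlink.txt" ]; then
    link_result="same file"
else
    link_result="different files"
fi

if [ "$tmpdir/File1.txt" -ef "$tmpdir/copy.txt" ]; then
    copy_result="same file"
else
    copy_result="different files"
fi

echo "File1.txt vs hardlink.txt: $link_result"
echo "File1.txt vs copy.txt: $copy_result"
```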

Let’s take another example below:

Using diff Command

Use the diff command to check if two files are the same in Bash.

In this solution, we used the diff command to compare $file1 and $file2 line by line. We used the > operator to redirect the standard output of the diff command to the /dev/null device, which discards it. In other words, this line performs the comparison silently without displaying anything on the console.

Next, we used the if statement with the -eq operator to check the exit status of the previously run diff command. An exit status of 0 meant the specified files were the same, 1 meant they were different, and 2 meant diff ran into trouble, such as a missing file.

In Bash, the exit status of the previously executed command is stored in the $? variable. By convention, a 0 exit code indicates successful execution (no errors), while a non-zero code represents some error that occurred during execution.
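The exit-status check described above can be sketched as follows. The sample files are assumptions for illustration.

```shell
# Hypothetical sample files created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'Hello\nThis is a test file\n' > "$tmpdir/File1.txt"
cp "$tmpdir/File1.txt" "$tmpdir/File2.txt"

file1="$tmpdir/File1.txt"
file2="$tmpdir/File2.txt"

# Discard diff's normal output; only the exit status is of interest.
diff "$file1" "$file2" > /dev/null

# $? holds the exit status of the diff command that just ran.
if [ $? -eq 0 ]; then
    result="The files are the same."
else
    result="The files are different."
fi
echo "$result"
```

Writing if diff "$file1" "$file2" > /dev/null; then ... is an equivalent and slightly shorter form, since if tests the exit status directly.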

Let’s have a look at another example below:

Use the diff command with the -q option to check if the two files are the same. For this solution, you do not have to use if-else statements. It prints nothing if the files are the same and a one-line message if they differ.
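A short sketch of both cases, using assumed sample files. The output is captured in variables here only so that the two cases can be shown side by side; running diff -q directly prints the same message.

```shell
# Hypothetical sample files.
tmpdir=$(mktemp -d)
printf 'Hello\nThis is a test file\n' > "$tmpdir/File1.txt"
cp "$tmpdir/File1.txt" "$tmpdir/File2.txt"
printf 'Something else\n' > "$tmpdir/File3.txt"

# Identical files: diff -q prints nothing.
same_output=$(diff -q "$tmpdir/File1.txt" "$tmpdir/File2.txt")

# Different files: diff -q prints a one-line notice and exits with status 1;
# "|| true" keeps a script running under "set -e" from stopping here.
diff_output=$(diff -q "$tmpdir/File1.txt" "$tmpdir/File3.txt" || true)

echo "Identical files: '$same_output'"
echo "Different files: '$diff_output'"
```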

Use the diff command with the -c option to retrieve a comparison of two files in context mode.

In the context-mode output, the * character marked the parts related to File1.txt, while the - character marked the parts related to File3.txt. The first two lines of the output showed the file locations with their modification date and time.

The *** 1,4 **** line represented the range of lines shown from File1.txt, while the --- 1 ---- line indicated the line shown from File3.txt. Remember, we would have a range of lines for File3.txt as well if it had multiple lines.

How did we identify the differences in the output?

  • The + meant the line was not present in the first file. Remove it from the second file or insert it in the first file to match.
  • The - was the opposite of the +. It meant the line existed in the first file but not in the second file. Remove it from the first file or insert it in the second file to match.
  • The ! meant the line existed in both files but had to be modified to match.
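To see these markers in practice, the sketch below builds a four-line File1.txt and a one-line File3.txt and prints the context-mode comparison. The file contents are invented for illustration and assume GNU diff.

```shell
# Hypothetical sample files: four lines versus one line.
tmpdir=$(mktemp -d)
printf 'Hello\nThis is File1\nIt has four lines\nGoodbye\n' > "$tmpdir/File1.txt"
printf 'Hello\n' > "$tmpdir/File3.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps a
# script running under "set -e" from stopping here.
context_output=$(diff -c "$tmpdir/File1.txt" "$tmpdir/File3.txt" || true)
printf '%s\n' "$context_output"
```

With these files, the three lines missing from File3.txt are each marked with -, and the hunk headers show the ranges 1,4 and 1.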

Use the diff command with the -u option to get the comparison in unified mode; it is similar to the context mode but more compact, without the redundant details.
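A minimal sketch of the unified format, again with invented file contents and assuming GNU diff:

```shell
# Hypothetical sample files: four lines versus one line.
tmpdir=$(mktemp -d)
printf 'Hello\nThis is File1\nIt has four lines\nGoodbye\n' > "$tmpdir/File1.txt"
printf 'Hello\n' > "$tmpdir/File3.txt"

# "|| true" absorbs diff's exit status 1 (files differ) for scripts using "set -e".
unified_output=$(diff -u "$tmpdir/File1.txt" "$tmpdir/File3.txt" || true)
printf '%s\n' "$unified_output"
```

Here each hunk starts with an @@ line giving both line ranges, and removed or added lines are prefixed with - or + directly, so shared context lines are printed only once.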

Let’s change the case of a letter in one of the files. We modified File1.txt by replacing This with this in line 2. Now, if we run the diff command with the -q option, it will say the files differ.

What happened here? By default, the diff command does a case-sensitive comparison. As we modified the letter’s case, the files became different.

Use the diff command with the -i option for case-insensitive comparison.

Now, we got nothing because File1.txt and File2.txt were treated as identical.
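The case-sensitivity behaviour can be reproduced with the sketch below. The file contents are assumptions that only mimic the This/this change described above.

```shell
# Hypothetical sample files that differ only in letter case on line 2.
tmpdir=$(mktemp -d)
printf 'Hello\nthis is a test file\n' > "$tmpdir/File1.txt"
printf 'Hello\nThis is a test file\n' > "$tmpdir/File2.txt"

# Case-sensitive (default): the files differ, so diff -q prints a notice.
sensitive_output=$(diff -q "$tmpdir/File1.txt" "$tmpdir/File2.txt" || true)

# Case-insensitive: -i ignores the case change, so nothing is printed.
insensitive_output=$(diff -i "$tmpdir/File1.txt" "$tmpdir/File2.txt")

echo "Default comparison: '$sensitive_output'"
echo "With -i: '$insensitive_output'"
```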

If you want the output in coloured form, replace diff with colordiff, as in colordiff FirstFilePath SecondFilePath. Don’t forget to install colordiff first using the sudo apt install colordiff command. Like diff, it prints output only if the specified files are different.

Is there a solution that opens an editor so we can modify the files when they contain differences we don’t want? Yes, see the following section.

Using vimdiff Command

Use the vimdiff command to compare two files in Bash. You must install Vim Editor using sudo apt install vim to use this solution.

[Screenshot: vimdiff showing the two files side by side]

As you can see in the above screenshot, both files were opened side by side and the differences were highlighted.

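Use :qa and hit Enter to close both files and exit; use :wqa instead to save any changes first. Note that :wq saves and closes only the current window.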

Using cksum Command

Use the cksum command to check if the two files are the same.

The cksum command produces a CRC (Cyclic Redundancy Check) checksum for each file passed to it and prints one line per file with the checksum value, the byte count, and the file name/path. If no file is specified, it reads from standard input.

We used cksum to get checksum values for all the .txt files in the specified directory. If two files have the same checksum and byte count, they are almost certainly the same; otherwise, they are different.
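As a rough sketch of this approach, the snippet below first lists the checksums of all .txt files in an assumed temporary directory, then compares two of them programmatically. Feeding a file through standard input makes cksum omit the file name, so the remaining "checksum bytecount" strings can be compared directly.

```shell
# Hypothetical sample files.
tmpdir=$(mktemp -d)
printf 'Hello\nThis is a test file\n' > "$tmpdir/File1.txt"
cp "$tmpdir/File1.txt" "$tmpdir/File2.txt"
printf 'Something else\n' > "$tmpdir/File3.txt"

# One line per file: checksum, byte count, path.
cksum "$tmpdir"/*.txt

# Read from standard input so that no file name appears in the output.
sum1=$(cksum < "$tmpdir/File1.txt")
sum2=$(cksum < "$tmpdir/File2.txt")
sum3=$(cksum < "$tmpdir/File3.txt")

if [ "$sum1" = "$sum2" ]; then
    echo "File1.txt and File2.txt have the same checksum."
fi
if [ "$sum1" != "$sum3" ]; then
    echo "File1.txt and File3.txt have different checksums."
fi
```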

Do we have a solution to remove the duplicate files if found? Can we see the history before deleting them? Yes, let’s go to the following section.

Using rdfind Command

Use the rdfind command with the -deleteduplicates true option to find duplicate files in the given directory and its subdirectories, and delete them.

Running rdfind with the -dryrun true option before using -deleteduplicates true lets you preview the results rather than deleting the files immediately.

That’s all about how to check if two files are the same in Bash.
