Table of Contents
Bash Sort CSV by Column
Using sort
Command
Use bash’s sort
command to sort the CSV file by column.
1 2 3 4 5 6 7 |
Name,Age,City John,25,New York Alice,32,San Francisco Michael,28,Chicago Emily,22,Los Angeles |
1 2 3 4 |
#!/bin/bash sort -t',' -k2 myFile.csv |
1 2 3 4 5 6 7 |
Emily,22,Los Angeles John,25,New York Michael,28,Chicago Alice,32,San Francisco Name,Age,City |
In this example, the sort
command is used to sort the myFile.csv
file by column. Here, the -t
parameter is used to specify the field separator, which is ,
in the above case. Then, the -k
option indicates the column to sort by.
In this case, -k2
represented the second column (Age
, which contains numeric values) of the given CSV file that needs to be sorted. We can observe, after the execution of the above command, the Age
column is sorted in ascending order.
It should be noted that the header
of the file is moved at the end because there is no numeric value. Now, see the example below to understand how to sort the file by skipping the header.
1 2 3 4 |
#!/bin/bash { head -n 1 myFile.csv; tail -n +2 myFile.csv | sort -t',' -k2; } |
1 2 3 4 5 6 7 |
Name,Age,City Emily,22,Los Angeles John,25,New York Michael,28,Chicago Alice,32,San Francisco |
In the above example, the head
command first read the myFile.csv
file and extracted the first line (a header). Here, the -n
option is used to represent the line number which is 1
in this case. After that, the tail
command is used to skip the first line (header) from the sorting processing; its output is then passed to the sort command to sort the second column of the file.
In case you want to sort CSV column by descending order, you can use -r
option.
1 2 3 4 |
#!/bin/bash { head -n 1 myFile.csv; tail -n +2 myFile.csv | sort -t',' -r -k2; } |
1 2 3 4 5 6 7 |
Name,Age,City Alice,32,San Francisco Michael,28,Chicago John,25,New York Emily,22,Los Angeles |
Have a look at another example to sort non-numeric columns:
1 2 3 4 |
#!/bin/bash { head -n 1 myFile.csv; tail -n +2 myFile.csv | sort -t',' -k1; } |
1 2 3 4 5 6 7 |
Name,Age,City Alice,32,San Francisco Emily,22,Los Angeles John,25,New York Michael,28,Chicago |
This example is the same as above; we only replaced the column number k2
with k1
because we wanted to sort the non-numeric Name
column alphabetically.
Note that the original file remains unchanged. The sorted output is displayed on the console by default. If you want to save the sorted columns in a file, check the below example:
1 2 3 4 |
#!/bin/bash { head -n 1 myFile.csv; tail -n +2 myFile.csv | sort -t',' -k2; } > sortedFile.csv |
This command is the same as above, but it redirected the output to the sortedFile.csv
file using the>
operator. If the file does not exist, it will be created. If the file already exists, it will be overwritten with the new sorted output.
To verify, let’s check the content of sortedFile.csv
.
1 2 3 |
cat sortedFile.csv |
1 2 3 4 5 6 7 |
Name,Age,City Emily,22,Los Angeles John,25,New York Michael,28,Chicago Alice,32,San Francisco |
Using csvsort
Utility
Use the csvsort
utility in bash to sort the CSV file by column.
1 2 3 4 5 6 7 |
Name,Age,City John,25,New York Alice,32,San Francisco Michael,28,Chicago Emily,22,Los Angeles |
1 2 3 |
csvsort -c 2 myFile.csv > sortedFile.csv |
This example used the csvsort
utility with the -c
option to sort the myFile.csv
file by column. Here, c
indicates the column number, 2
, in the above case. Then, the sorted output is redirected to the sortedFile.csv
.
Let’s check the sortedFile.csv
content to see the sorted column.
1 2 3 |
cat sortedFile.csv |
1 2 3 4 5 6 7 |
Name,Age,City Emily,22,Los Angeles John,25,New York Michael,28,Chicago Alice,32,San Francisco |
We can observe that column number 2
(Age
) is sorted.
Please note that csvsort
treats the first row as the header row by default and preserves it in the output. If your CSV file doesn’t have a header row, use the --no-header-row
option with csvsort
to indicate no header row. Otherwise, it will consider the first row as a header.
Here’s an example command to sort a CSV file without a header row:
1 2 3 |
csvsort -c 2 my.csv --no-header-row |
1 2 3 4 5 6 7 |
a,b,c Emily,22,Los Angeles John,25,New York Michael,28,Chicago Alice,32,San Francisco |
The above command sorted the file by column 2
, treating the first row as data instead of a header row. Here, a
, b
and c
in the first row represent the header column.
To use
csvsort
, you must installcsvkit
. Run thepip install csvkit
command to install it via pip if it is not installed on your system. Once it is installed, you will be able to use thecsvsort
utility.
Bash Sort CSV by Multiple Columns
In case, you want to sort CSV with multiple columns, you can use sort
command with -k
option and pass comma separated column numbers to sort it.
Let’s change content of the file as below:
1 2 3 4 5 6 7 |
Name,Age,City John,25,New York Alice,32,San Francisco Michael,28,Chicago Alice,24,Los Angeles |
1 2 3 4 |
#!/bin/bash { head -n 1 myFile.csv; tail -n +2 myFile.csv | sort -t',' -k1,2; } |
1 2 3 4 5 6 7 |
Name,Age,City Alice,24,Los Angeles Alice,32,San Francisco John,25,New York Michael,28,Chicago |
As you can see, since we provided -k1,2
option, it first sorted based on name(1st column) and if names were same, then it sorted based on age(2nd column).
That’s all about how to Sort CSV by Column in Bash.