Merge Multiple CSV Files in Python

Merge multiple CSV Files in Python

Using pd.concat() Method

To merge multiple CSV files in Python:

  • Create a list containing all CSV files we will merge.
  • Create an empty dataframe.
  • Use the for loop to iterate over the files.
  • In each iterate, use read_csv() to read CSV files and concatenate using the pd.concat() method.
  • Use to_csv() to write merged data into a new CSV file.

Let’s first see content of file1.csv and file2.csv to understand output of above code.

We created a list having all CSV files; having a list is useful when we are required to merge multiple CSV files. We created an empty DataFrame to contain the data returned by the read_csv() method, which took a file’s name as an argument in every iteration of the for loop.

Also, we used the .concat() method in each iteration of the for loop to concatenate the data into the DataFrame df. Finally, we used the to_csv() file to write the new DataFrame to merged.files.csv.

NOTE: In the previous pandas versions, we could use the df.append() method instead of pd.concat(), but that is deprecated since version 1.4.0. So, we can use the pd.concat() or the df.merge() method.

You may have observed in the above-provided solutions that we either read the CSV file one by one or create a list with all CSV files that we read using the for loop. If you are looking for a solution to set up your working directory, you can choose one of the following methods.

Setting Up the Working Directory

We can set up the working directory by using the following three ways.

Here, we gave the directory’s path into a variable, which lists every file. For example, in file_path, we passed the CSV files path, and in the file_list, we listed all the files in a directory by providing the file path parameter.

We can use the code above to read files from the same directory as your .ipynb file; an IPYNB file is a notebook document created by Jupyter Notebook, an interactive computational environment that helps scientists manipulate and analyze data using Python.

The glob() function can only list the CSV files in the working directory given above. The glob module in Python finds files and pathnames that match a given pattern. ‘glob`’s pattern rules follow standard Unix path expansion rules.

Additionally, it is projected that benchmarks show it to be quicker than previous ways to match pathnames in directories. With glob, we used wildcard (*) apart from the exact string search to make path retrieval more simple and convenient.

Using pd.merge() Method

To merge multiple CSV files in Python:

  • Use pd.read_csv() to read all CSV files one by one.
  • Use the pd.merge() method to merge all the CSV files.

After importing the pandas library, we used its alias pd to avoid keystrokes. Then, we used the .read_csv() method to read the specified CSV file (file1.csv) and saved it into a data frame called df1. Similarly, we read file2.csv, but this time we saved it to another DataFrame df2. Next, we used the .merge() method of the pandas library to merge both DataFrames based on the ID column.

The how parameter is used to specify the type of merge to be performed. In this case, the how='inner' denotes that only the rows with matching values in the ID column will be included in the merged DataFrame.

Next, we used the merged_df.to_csv('merged_files.csv', index=False) method to write the merged DataFrame to a new CSV file named merged_files.csv and the index=False option used to prevent the DataFrame index from being saved in the CSV file.

It’s worth noting that in the case of different column names in both data frames, you can use left_on and right_on to specify which columns to merge on.

That’s all about how to merge multiple CSV files in Python.

Was this post helpful?

Leave a Reply

Your email address will not be published. Required fields are marked *