Merge Multiple CSV Files in Python [2 Ways]

Table of Contents

Using pd.concat() Method
- - Setting Up the Working Directory
Using pd.merge() Method

Using `pd.concat()` Method

To merge multiple CSV files in Python:

Create a list containing all CSV files we will merge.
Create an empty dataframe.
Use the for loop to iterate over the files.
In each iterate, use read_csv() to read CSV files and concatenate using the pd.concat() method.
Use to_csv() to write merged data into a new CSV file.


import pandas as pd
files = ['file1.csv', 'file2.csv']
df = pd.DataFrame()
for file in files:
    data = pd.read_csv(file)
    df = pd.concat([df, data], axis=0)
df.to_csv('merged_files.csv', index=False)

import pandas as pd

files = ['file1.csv', 'file2.csv']

df = pd.DataFrame()

for file in files:

data = pd.read_csv(file)

df = pd.concat([df, data], axis=0)

df.to_csv('merged_files.csv', index=False)

Let’s first see content of file1.csv and file2.csv to understand output of above code.


ID,FirstName,LastName
1,Sara,Ali
2,John,Johnas
3,Nayya,Daniel

ID,FirstName,LastName

1,Sara,Ali

2,John,Johnas

3,Nayya,Daniel


ID,FirstName,LastName
1,Sara,Ahmed
5,Daniel,Christopher

ID,FirstName,LastName

1,Sara,Ahmed

5,Daniel,Christopher


ID,FirstName,LastName
1,Sara,Ali
2,John,Johnas
3,Nayya,Daniel
1,Sara,Ahmed
5,Daniel,Christopher

ID,FirstName,LastName

1,Sara,Ali

2,John,Johnas

3,Nayya,Daniel

1,Sara,Ahmed

5,Daniel,Christopher

We created a list having all CSV files; having a list is useful when we are required to merge multiple CSV files. We created an empty DataFrame to contain the data returned by the read_csv() method, which took a file’s name as an argument in every iteration of the for loop.

Also, we used the .concat() method in each iteration of the for loop to concatenate the data into the DataFrame df. Finally, we used the to_csv() file to write the new DataFrame to merged.files.csv.

NOTE: In the previous pandas versions, we could use the df.append() method instead of pd.concat(), but that is deprecated since version 1.4.0. So, we can use the pd.concat() or the df.merge() method.

You may have observed in the above-provided solutions that we either read the CSV file one by one or create a list with all CSV files that we read using the for loop. If you are looking for a solution to set up your working directory, you can choose one of the following methods.

Setting Up the Working Directory

We can set up the working directory by using the following three ways.


import os
import pandas as pd
file_path = "C:\Downloads\Merge_Files\csv_files"
file_list = os.listdir(file_path)

import os

import pandas as pd

file_path = "C:\Downloads\Merge_Files\csv_files"

file_list = os.listdir(file_path)

Here, we gave the directory’s path into a variable, which lists every file. For example, in file_path, we passed the CSV files path, and in the file_list, we listed all the files in a directory by providing the file path parameter.


import os
import pandas as pd
cwd = os.path.abspath('')
file_list = os.listdir(cwd)

import os

import pandas as pd

cwd = os.path.abspath('')

file_list = os.listdir(cwd)

We can use the code above to read files from the same directory as your .ipynb file; an IPYNB file is a notebook document created by Jupyter Notebook, an interactive computational environment that helps scientists manipulate and analyze data using Python.


import os
import glob
import pandas as pd
csv_files = glob.glob('*.{}'.format('csv'))

import os

import glob

import pandas as pd

csv_files = glob.glob('*.{}'.format('csv'))

The glob() function can only list the CSV files in the working directory given above. The glob module in Python finds files and pathnames that match a given pattern. ‘glob`’s pattern rules follow standard Unix path expansion rules.

Additionally, it is projected that benchmarks show it to be quicker than previous ways to match pathnames in directories. With glob, we used wildcard (*) apart from the exact string search to make path retrieval more simple and convenient.

Using `pd.merge()` Method

To merge multiple CSV files in Python:

Use pd.read_csv() to read all CSV files one by one.
Use the pd.merge() method to merge all the CSV files.


import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
merged_df = pd.merge(df1, df2, on='ID', how='inner')
merged_df.to_csv('merged_files.csv', index=False)

import pandas as pd

df1 = pd.read_csv('file1.csv')

df2 = pd.read_csv('file2.csv')

merged_df = pd.merge(df1, df2, on='ID', how='inner')

merged_df.to_csv('merged_files.csv', index=False)


ID,FirstName_x,LastName_x,FirstName_y,LastName_y
1,Sara,Ali,Sara,Ahmed

ID,FirstName_x,LastName_x,FirstName_y,LastName_y

1,Sara,Ali,Sara,Ahmed

After importing the pandas library, we used its alias pd to avoid keystrokes. Then, we used the .read_csv() method to read the specified CSV file (file1.csv) and saved it into a data frame called df1. Similarly, we read file2.csv, but this time we saved it to another DataFrame df2. Next, we used the .merge() method of the pandas library to merge both DataFrames based on the ID column.

The how parameter is used to specify the type of merge to be performed. In this case, the how='inner' denotes that only the rows with matching values in the ID column will be included in the merged DataFrame.

Next, we used the merged_df.to_csv('merged_files.csv', index=False) method to write the merged DataFrame to a new CSV file named merged_files.csv and the index=False option used to prevent the DataFrame index from being saved in the CSV file.

It’s worth noting that in the case of different column names in both data frames, you can use left_on and right_on to specify which columns to merge on.

That’s all about how to merge multiple CSV files in Python.

Was this post helpful?

Let us know if this post was helpful. Feedbacks are monitored on daily basis. Please do provide feedback as that\'s the only way to improve.

Merge Multiple CSV Files in Python

Using `pd.concat()` Method

Setting Up the Working Directory

Using `pd.merge()` Method

Was this post helpful?

Author

Leave a Reply Cancel reply

Categories

Popular Posts

Let’s be Friends

Using pd.concat() Method

Setting Up the Working Directory

Using pd.merge() Method

Was this post helpful?

Related posts:

Share this

Author

Leave a Reply Cancel reply

Let’s be Friends

Using `pd.concat()` Method

Using `pd.merge()` Method