In this post, we will see how to drop rows in Pandas.
Syntax of DataFrame.drop()
```python
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
```
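labels: index or column labels to remove.
axis: axis=0 is used to delete rows and axis=1 is used to delete columns. This post uses axis=0 to delete rows. Since axis=0 is the default value, you can omit it.
index: an alternative to labels for dropping rows; index=x is equivalent to labels=x, axis=0 (introduced in version 0.21).
columns: an alternative to labels for dropping columns; columns=x is equivalent to labels=x, axis=1 (introduced in version 0.21).
level: for a MultiIndex, the level from which the labels are removed.
inplace: if False (the default), drop() returns a new DataFrame and leaves the original unchanged. If True, it modifies the original DataFrame and returns None.
errors: with the default 'raise', a KeyError is raised if any label is not found; with 'ignore', missing labels are skipped.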
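To see how these parameters relate, here is a small sketch on a made-up DataFrame (the names df, 'a', 'b', 'x', 'y', 'z' and 'missing' are purely illustrative). It checks that labels with axis=0 and the index keyword drop the same row, that columns drops a column, and that errors='ignore' skips a label that does not exist.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['x', 'y', 'z'])

# These two calls are equivalent: both drop the row labelled 'y'
by_labels = df.drop(labels='y', axis=0)
by_index = df.drop(index='y')

# columns= drops columns instead of rows
no_b = df.drop(columns='b')

# errors='ignore' skips labels that are not present;
# with the default errors='raise', 'missing' would raise a KeyError
ignored = df.drop(index=['y', 'missing'], errors='ignore')

print(by_labels.equals(by_index))   # True
print(list(no_b.columns))           # ['a']
print(list(ignored.index))          # ['x', 'z']
```

Since inplace defaults to False, df itself still has all three rows and both columns after these calls.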
Pandas Drop rows based on index
You can specify index labels to drop rows.
Delete single row
Here is an example:
```python
import pandas as pd

dic = {'Name': ['India', 'China', 'Bhutan', 'Russia'],
       "Population": [20000, 40000, 1000, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])
print("-------Original Dataframe-------\n", Country_df)

# Drop the row with index label 'Two'
Country_df = Country_df.drop(labels='Two')
print("-------Changed Dataframe-------\n", Country_df)
```
Output:
```
-------Original Dataframe-------
          Name  Population
One     India       20000
Two     China       40000
Three  Bhutan        1000
Four   Russia       10000
-------Changed Dataframe-------
          Name  Population
One     India       20000
Three  Bhutan        1000
Four   Russia       10000
```
As you can see, the row with index label Two was dropped from the DataFrame.
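drop() works with index labels, not positions. If you only know a row's position, one way is to look up its label through DataFrame.index first. A minimal sketch, reusing the Country_df data from above:

```python
import pandas as pd

dic = {'Name': ['India', 'China', 'Bhutan', 'Russia'],
       "Population": [20000, 40000, 1000, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])

# Country_df.index[1] is the label at position 1, i.e. 'Two'
Country_df = Country_df.drop(Country_df.index[1])
print(list(Country_df.index))  # ['One', 'Three', 'Four']
```

Keep in mind that this still drops by label: if several rows share the label found at that position, all of them are removed.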
Delete multiple rows
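To delete multiple rows, pass a list of index labels. Replace the drop() line in the previous example with the following to delete the rows labelled Three and Four: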
```python
# Drop the rows with index labels 'Three' and 'Four'
Country_df = Country_df.drop(labels=['Three', 'Four'])
```
Output:
```
-------Original Dataframe-------
          Name  Population
One     India       20000
Two     China       40000
Three  Bhutan        1000
Four   Russia       10000
-------Changed Dataframe-------
       Name  Population
One  India       20000
Two  China       40000
```
As you can see, the rows with index labels Three and Four were dropped from the DataFrame.
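If the rows you want to remove form a contiguous block of positions, you can pass a slice of DataFrame.index to drop(). This sketch drops the rows at positions 2 and 3, which here gives the same result as the labels=['Three', 'Four'] call above:

```python
import pandas as pd

dic = {'Name': ['India', 'China', 'Bhutan', 'Russia'],
       "Population": [20000, 40000, 1000, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])

# Country_df.index[2:] selects the labels from position 2 onwards: 'Three' and 'Four'
Country_df = Country_df.drop(Country_df.index[2:])
print(list(Country_df.index))  # ['One', 'Two']
```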
If you want to modify the original DataFrame instead, pass inplace=True:
```python
# Drop the rows with index labels 'Three' and 'Four' directly in Country_df
Country_df.drop(labels=['Three', 'Four'], inplace=True)
```
Because drop() now modifies Country_df directly, you don't need to reassign it.
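One common pitfall: with inplace=True, drop() returns None, so assigning its result back would replace the DataFrame with None. A minimal sketch showing the return value:

```python
import pandas as pd

dic = {'Name': ['India', 'China', 'Bhutan', 'Russia'],
       "Population": [20000, 40000, 1000, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])

# inplace=True modifies Country_df itself and returns None
result = Country_df.drop(labels=['Three', 'Four'], inplace=True)
print(result)                  # None
print(list(Country_df.index))  # ['One', 'Two']

# Avoid: Country_df = Country_df.drop(labels='Two', inplace=True)
# would rebind Country_df to None
```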
Pandas Drop rows with conditions
You can also drop rows based on certain conditions.
Here is an example:
Let’s say you want to delete all the rows for which the population is less than or equal to 10000.
You can get the index labels of all such rows with a boolean condition and pass them to the drop() method.
```python
import pandas as pd

dic = {'Name': ['India', 'China', 'Bhutan', 'Russia'],
       "Population": [20000, 40000, 1000, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])
print("-------Original Dataframe-------\n", Country_df)

# Delete all the rows whose population is less than or equal to 10000
Country_df.drop(Country_df[Country_df['Population'] <= 10000].index, inplace=True)
print("-------Changed Dataframe-------\n", Country_df)
```
Output:
```
-------Original Dataframe-------
          Name  Population
One     India       20000
Two     China       40000
Three  Bhutan        1000
Four   Russia       10000
-------Changed Dataframe-------
       Name  Population
One  India       20000
Two  China       40000
```
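A different technique that gives the same result is boolean indexing: instead of dropping the rows you don't want, keep the rows you do want. To combine several conditions, use & (and) or | (or) and wrap each condition in parentheses. A sketch on the same data; the extra condition on 'China' is only for illustration:

```python
import pandas as pd

dic = {'Name': ['India', 'China', 'Bhutan', 'Russia'],
       "Population": [20000, 40000, 1000, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])

# Keep only the rows whose population is greater than 10000
kept = Country_df[Country_df['Population'] > 10000]
print(list(kept.index))  # ['One', 'Two']

# Multiple conditions: remove rows with population <= 10000 OR name 'China'
mask = (Country_df['Population'] <= 10000) | (Country_df['Name'] == 'China')
filtered = Country_df[~mask]
print(list(filtered.index))  # ['One']
```

Both approaches return a new DataFrame here; which one reads better is mostly a matter of taste.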
Pandas Drop rows with NaN
You can drop rows that contain NaN values using the dropna() method.
Here is an example:
```python
import numpy as np
import pandas as pd

# np.nan marks the missing population values
# (the np.NaN alias was removed in NumPy 2.0; np.nan works in all versions)
dic = {'Name': ['India', 'China', 'Bhutan', 'Russia'],
       "Population": [np.nan, 40000, np.nan, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])
print("-------Original Dataframe-------\n", Country_df)

# Drop every row that contains at least one NaN value
Country_df = Country_df.dropna()
print("-------Changed Dataframe-------\n", Country_df)
```
Output:
```
-------Original Dataframe-------
          Name  Population
One     India         NaN
Two     China     40000.0
Three  Bhutan         NaN
Four   Russia     10000.0
-------Changed Dataframe-------
         Name  Population
Two    China     40000.0
Four  Russia     10000.0
```
As you can see, rows that contain NaN were dropped from the Pandas DataFrame.
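dropna() also accepts parameters that control which rows count as "missing": how='all' drops a row only if every value is NaN, subset restricts the check to certain columns, and thresh keeps rows with at least that many non-NaN values. The data below, including the 'Area' column, is made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data: 'Area' is a hypothetical extra column
dic = {'Name': ['India', 'China', np.nan, 'Russia'],
       "Population": [np.nan, 40000, np.nan, 10000],
       "Area": [np.nan, 9.6, np.nan, np.nan]}
df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])

# Default how='any': drop rows with at least one NaN
print(list(df.dropna().index))                       # ['Two']
# how='all': drop only rows where every value is NaN
print(list(df.dropna(how='all').index))              # ['One', 'Two', 'Four']
# subset: only look at the Population column
print(list(df.dropna(subset=['Population']).index))  # ['Two', 'Four']
# thresh=2: keep rows with at least 2 non-NaN values
print(list(df.dropna(thresh=2).index))               # ['Two', 'Four']
```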
Pandas Drop duplicate rows
You can drop duplicate rows with the DataFrame.drop_duplicates() method. By default, it compares all columns and keeps the first occurrence of each duplicated row.
Here is an example:
```python
import pandas as pd

dic = {'Name': ['India', 'China', 'India', 'Russia'],
       "Population": [20000, 40000, 20000, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])
print("-------Original Dataframe-------\n", Country_df)

# Drop rows that duplicate an earlier row (the first occurrence is kept)
Country_df = Country_df.drop_duplicates()
print("-------Changed Dataframe-------\n", Country_df)
```
Output:
```
-------Original Dataframe-------
          Name  Population
One     India       20000
Two     China       40000
Three   India       20000
Four   Russia       10000
-------Changed Dataframe-------
         Name  Population
One    India       20000
Two    China       40000
Four  Russia       10000
```
As you can see, the row with index Three, which duplicates row One, was dropped from the DataFrame.
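drop_duplicates() can also compare only some columns (subset) and choose which copy survives (keep='first', keep='last', or keep=False to drop all copies). In the sketch below the data is modified for illustration: Three's population is changed to 25000, so the two India rows are no longer full duplicates.

```python
import pandas as pd

# Hypothetical data: Three's population differs from One's
dic = {'Name': ['India', 'China', 'India', 'Russia'],
       "Population": [20000, 40000, 25000, 10000]}
Country_df = pd.DataFrame(dic, index=['One', 'Two', 'Three', 'Four'])

# No full-row duplicates, so nothing is dropped
print(list(Country_df.drop_duplicates().index))                             # ['One', 'Two', 'Three', 'Four']
# Compare on Name only, keeping the first occurrence
print(list(Country_df.drop_duplicates(subset=['Name']).index))              # ['One', 'Two', 'Four']
# keep='last' keeps the last occurrence instead
print(list(Country_df.drop_duplicates(subset=['Name'], keep='last').index)) # ['Two', 'Three', 'Four']
# keep=False drops every row whose Name appears more than once
print(list(Country_df.drop_duplicates(subset=['Name'], keep=False).index))  # ['Two', 'Four']
```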
That’s all about how to drop rows in Pandas.