In this post, we will see how to filter Pandas by column value.
You can slice and dice Pandas Dataframe in multiple ways.
Table of Contents
Sometimes, you may want to find a subset of data based on certain column values. You can filter rows by one or more columns value to remove non-essential data.
Pandas DataFrame sample data
Here is sample Employee data which will be used in below examples:
Name | Age | Gender |
---|---|---|
Ravi | 28 | Male |
Michelle | 21 | Female |
Mary | 37 | Female |
Sunita | 17 | Female |
Sam | 21 | Male |
Filter rows on the basis of single column data
You can use boolean expression to filter rows on the basis of column value. You can create boolean expression based on column of interest and use this variable to filter data.
For example:
Let’s say you want to filter all the employees whose age is 21.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
import pandas as pd emp_df = pd.DataFrame({'Name': ['Ravi','Michelle','Mary','Sunita','Sam'], "Age": [28,21,37,17,21], "Gender":['Male','Female','Female','Female','Male']}) print("-------Original Dataframe-------\n",emp_df) # Create boolean expression isTwentyOne = emp_df['Age']==21 #Use boolean expression to filter the data emp_df = emp_df[isTwentyOne] print("-------Original Dataframe-------\n",emp_df) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
-------Original Dataframe------- Name Age Gender 0 Ravi 28 Male 1 Michelle 21 Female 2 Mary 37 Female 3 Sunita 17 Female 4 Sam 21 Male -------Original Dataframe------- Name Age Gender 1 Michelle 21 Female 4 Sam 21 Male |
As you can see, we got the data filtered by Age = 21
In case, you want to reset index of the filtered dataframe, you can call reset_index() function of the DataFrame.
You can add below line to preceding code and you will get below output:
1 2 3 |
emp_df.reset_index(drop=True, inplace=True) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
-------Original Dataframe------- Name Age Gender 0 Ravi 28 Male 1 Michelle 21 Female 2 Mary 37 Female 3 Sunita 17 Female 4 Sam 21 Male -------Original Dataframe------- Name Age Gender 0 Michelle 21 Female 1 Sam 21 Male |
💡 Did you know?
You can also use Python chaining to filter rows based on the condition. Python chaining makes it easier to mix one command with another. You can access DataFrame column by
DataFrame.columnName
.
For example:
You can access DataFrame columnAge
byemp_df.Age
and can apply condition on this.
1234 # Create boolean expressionisTwentyOne = emp_df.Age==21
Filter rows on the basis of multiple columns data
You can filter rows using multiple columns data.
Here is an example:
Let’s say you want find employees whose age is greater than 21 and also Male
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
import pandas as pd emp_df = pd.DataFrame({'Name': ['Ravi','Michelle','Mary','Sunita','Sam'], 'Age': [28,21,37,17,21], 'Gender':['Male','Female','Female','Female','Male']}) print("-------Original Dataframe-------\n",emp_df) # Create boolean expression isTwentyOneAndMale = (emp_df['Age']==21) & (emp_df['Gender']=='Male') #Use boolean expression to filter the data emp_df = emp_df[isTwentyOneAndMale] print("-------Updated Dataframe-------\n",emp_df) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 |
-------Original Dataframe------- Name Age Gender 0 Ravi 28 Male 1 Michelle 21 Female 2 Mary 37 Female 3 Sunita 17 Female 4 Sam 21 Male -------Updated Dataframe------- Name Age Gender 4 Sam 21 Male |
Here is the diagram to illustrate the filter conditions.
Filter rows on the basis of list of values
You can also filter DataFrames by putting condition on the list of values.
Let’s say you want to filter employees DataFrame based on list of Names. If Name is in the list, then include that row.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
import pandas as pd emp_df = pd.Data<a href="https://java2blog.com/wp-content/uploads/2020/05/PandasDataFrameFilterGender.svg"><img src="https://java2blog.com/wp-content/uploads/2020/05/PandasDataFrameFilterGender.svg" alt="PandasDataFrameFilterGender" width="900" height="700" class="aligncenter size-large wp-image-10176" /></a> Frame({'Name': ['Ravi','Michelle','Mary','Sunita','Sam'], 'Age': [28,21,37,17,21], 'Gender':['Male','Female','Female','Female','Male']}) print("-------Original Dataframe-------\n",emp_df) # Create boolean expression isInNameList = emp_df['Name'].isin(['Michelle','Sunita']) #Use boolean expression to filter the data emp_df = emp_df[isInNameList] print("-------Updated Dataframe-------\n",emp_df) print("-------Updated Dataframe-------\n",emp_df) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
-------Original Dataframe------- Name Age Gender 0 Ravi 28 Male 1 Michelle 21 Female 2 Mary 37 Female 3 Sunita 17 Female 4 Sam 21 Male -------Updated Dataframe------- Name Age Gender 1 Michelle 21 Female 3 Sunita 17 Female |
As you can see, we have rows for which Name column is matched with value in the Name list.
Filter rows on the basis of values not in the list
You can also filter DataFrames by putting condition on the values not in the list. You can use tilda(~)
to denote negation.
Let’s say you want to filter employees DataFrame based Names not present in the list. If Name is not in the list, then include that row.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
import pandas as pd emp_df = pd.DataFrame({'Name': ['Ravi','Michelle','Mary','Sunita','Sam'], 'Age': [28,21,37,17,21], 'Gender':['Male','Female','Female','Female','Male']}) print("-------Original Dataframe-------\n",emp_df) # Create boolean expression isNotInNameList = ~emp_df['Name'].isin(['Michelle','Sunita']) #Use boolean expression to filter the data emp_df = emp_df[isNotInNameList] print("-------Updated Dataframe-------\n",emp_df) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
-------Original Dataframe------- Name Age Gender 0 Ravi 28 Male 1 Michelle 21 Female 2 Mary 37 Female 3 Sunita 17 Female 4 Sam 21 Male -------Updated Dataframe------- Name Age Gender 0 Ravi 28 Male 2 Mary 37 Female 4 Sam 21 Male |
As you can see, we have rows for which Name column is not matched with value in the Name list.
That’s all about Filter Pandas Dataframe by column value.