We use the pandas DataFrame to store data in an organized, tabular manner. Sometimes there is a need to apply a function to a specific column or to the whole table of stored data.
This tutorial demonstrates the different methods available to apply a function to a column of a pandas DataFrame in Python.
How do I apply a function to a column in pandas?
There are multiple ways to apply a function to a column in pandas.
Using dataframe.apply() function
The dataframe.apply() function applies a specified function along an axis of a given pandas DataFrame.
The syntax for the dataframe.apply() function is:

    DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
The dataframe.apply() function takes several parameters. The two most important ones are:
- func: The function that needs to be applied.
- axis: The axis along which the function is applied. The value 0 applies the function to each column, while 1 applies it to each row. The default value is 0.
These two parameters are enough to understand how this method works. Further information on the other optional parameters can be found in the pandas documentation.
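As a rough illustration of the axis parameter, the sketch below calls apply() on a whole DataFrame rather than on one column. The names col_sums and row_sums are made up for this example, and the sample data mirrors the table used in the rest of this tutorial.

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

# axis=0 (the default): the function receives each column as a Series
col_sums = dfa.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series
row_sums = dfa.apply(lambda row: row.sum(), axis=1)

print(col_sums)  # one value per column (X, Y, Z)
print(row_sums)  # one value per row (0, 1, 2)
```

With axis=0 the result is indexed by column name, and with axis=1 it is indexed by row label.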
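The following code uses the apply() function to apply a function to a specific column in pandas. Selecting a single column such as dfa['Y'] returns a Series, so the call here is Series.apply(), which passes each value of the column to the function.

    import pandas as pd
    import numpy as np

    dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

    def a_2(x):
        # add 2 to a single value
        return x + 2

    # apply a_2 to every value in column Y
    dfa['Y'] = dfa['Y'].apply(a_2)
    print(dfa)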
The above code provides the following output:

       X  Y  Z
    0  3  5  3
    1  4  6  4
    2  5  7  5
Explanation:
- The numpy and pandas libraries are imported first.
- A pandas DataFrame named dfa is created and initialized.
- A function that is to be applied to a column of the DataFrame is defined.
- The function is then applied to the second column, Y, of the given DataFrame using the apply() function.
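When the function needs more than one argument, apply() can forward the extra ones. The following is a minimal sketch, assuming a hypothetical helper named add_n that is not part of the original example.

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

def add_n(x, n):
    # hypothetical helper: add n to a single value
    return x + n

# args passes extra positional arguments to add_n after the element itself
dfa['Y'] = dfa['Y'].apply(add_n, args=(2,))

# extra keyword arguments are forwarded to the function as well
dfa['Z'] = dfa['Z'].apply(add_n, n=10)

print(dfa)
```

Column X is left untouched, column Y is increased by 2, and column Z is increased by 10.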
Using lambda function along with the apply() function
A lambda function is an anonymous function that consists of a single expression and can take any number of arguments.
A lambda function can be passed directly to the apply() function to apply it to a specific column. This shortens the code compared to the method above.
The following code uses the lambda function along with the apply() function.

    import pandas as pd
    import numpy as np

    dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

    # the lambda adds 2 to every value in column Y
    dfa['Y'] = dfa['Y'].apply(lambda x: x + 2)
    print(dfa)
The above code provides the following output:

       X  Y  Z
    0  3  5  3
    1  4  6  4
    2  5  7  5
This method works the same way as the apply() method shown above. The only difference is that we do not have to define a named function and can pass a lambda function directly to the apply() function.
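Because a lambda is a single expression, it can also hold a conditional expression. The sketch below is only an illustration; the threshold of 3 is an arbitrary value chosen for this example.

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

# add 2 only to values greater than 3; leave the other values unchanged
dfa['Y'] = dfa['Y'].apply(lambda x: x + 2 if x > 3 else x)

print(dfa)
```

Once the logic needs more than one expression, a named function is usually easier to read than a lambda.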
Using dataframe.transform() function
The dataframe.transform() function calls a given function func on the DataFrame and returns a DataFrame that contains the transformed values. The result must have the same length as the original data along the given axis.
The syntax for the dataframe.transform() function is:

    DataFrame.transform(func, axis=0, *args, **kwargs)
Similar to the apply() function, the transform() function takes several parameters:
- func: The function that needs to be applied.
- axis: The axis along which the function is applied. The value 0 applies the function to each column, while 1 applies it to each row. The default value is 0.
The dataframe.transform() function takes other parameters as well, but they are not needed for this simple use. More details on all the parameters of the transform() function can be found in the pandas documentation.
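The following code uses the transform() function to apply a function to a specific column in pandas. As with apply(), selecting the column dfa['Y'] returns a Series, so the call here is Series.transform().

    import pandas as pd
    import numpy as np

    dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

    def a_2(x):
        # add 2 to the input
        return x + 2

    # transform column Y; the result has the same length as the column
    dfa['Y'] = dfa['Y'].transform(a_2)
    print(dfa)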
The above code provides the following output:

       X  Y  Z
    0  3  5  3
    1  4  6  4
    2  5  7  5
Explanation:
- The numpy and pandas libraries are imported first.
- A pandas DataFrame named dfa is created and initialized.
- A function that is to be applied to a column of the DataFrame is defined.
- The function is then applied to the second column, Y, of the given DataFrame using the transform() function.
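The requirement that transform() returns a result of the same length can be checked with a small sketch. It assumes the string 'sum' as an example of an aggregating function, and the variable names shifted and rejected are illustrative only.

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

# an element-wise function keeps the length, so transform() accepts it
shifted = dfa['Y'].transform(lambda x: x + 2)

# an aggregation collapses the column to a single value, so transform() rejects it
try:
    dfa['Y'].transform('sum')
    rejected = False
except ValueError:
    rejected = True

print(len(shifted) == len(dfa['Y']))
print(rejected)
```

This is the main practical difference from apply(), which also accepts functions that reduce a column to a single value.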
Using map() function
Python's built-in map() function applies a given function to all the elements of an iterable and returns an iterator over the results. A pandas Series has its own map() method, which applies a function to each value of the column and returns a new Series.
The map() method can be used in place of the apply() function.
The following code uses the map() function to apply a function to a specific column in pandas.

    import pandas as pd
    import numpy as np

    dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

    # map the lambda over every value in column Y
    dfa['Y'] = dfa['Y'].map(lambda x: x + 2)
    print(dfa)
The above code provides the following output:

       X  Y  Z
    0  3  5  3
    1  4  6  4
    2  5  7  5
Explanation:
- The numpy and pandas libraries are imported first.
- A pandas DataFrame named dfa is created and initialized.
- A lambda function is used in this case to specify the changes.
- This lambda function is then applied to the second column, Y, of the given DataFrame using the map() function.
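Besides a function, the map() method of a column also accepts a dictionary that maps old values to new ones. The sketch below assumes a made-up dictionary named labels purely for illustration.

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

# hypothetical lookup table: each value in column Y is replaced by its label
labels = {3: 'low', 4: 'mid', 5: 'high'}
dfa['Y'] = dfa['Y'].map(labels)

print(dfa)
```

Values that do not appear as keys in the dictionary become NaN, so the dictionary should cover every value in the column.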
Using numpy.square() function
NumPy, short for Numerical Python, is a third-party Python library used for working with and manipulating arrays and numerical data. The numpy.square() function is a simple mathematical function that returns the element-wise square of its input. When given a column, it returns the squares of the original values of that column.
The following code uses the numpy.square() function to apply a function to a specific column in pandas.

    import pandas as pd
    import numpy as np

    dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

    # square every value in column Y
    dfa['Y'] = np.square(dfa['Y'])
    print(dfa)
The above code provides the following output:

       X   Y  Z
    0  3   9  3
    1  4  16  4
    2  5  25  5
Explanation:
- The numpy and pandas libraries are imported first.
- A pandas DataFrame named dfa is created and initialized.
- The numpy.square() function is then applied to the second column, Y, of the given DataFrame.
Note that the numpy.square() function can only square the elements of a column; it cannot be used to apply any other function to it.
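numpy.square() is one of many element-wise NumPy functions, and other operations can be applied to a column in the same way. As a hedged sketch, the code below uses numpy.sqrt() and plain arithmetic; the variable names squared, roots, and plus_two are illustrative only.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

# square the column, then take the square root to recover the original values
squared = np.square(dfa['Y'])
roots = np.sqrt(squared)

# arithmetic operators work on the whole column at once
plus_two = dfa['Y'] + 2

print(squared.tolist())
print(roots.tolist())
print(plus_two.tolist())
```

For simple arithmetic like adding 2, operating on the whole column at once avoids calling a Python function for every value, which is what apply() and map() do.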