1. Introduction
Pandas is a Python library for data analysis and manipulation. A common task is converting the data type of a column from object to float. We can achieve this with the astype() method or the pd.to_numeric() function.
Method 1: Using astype()

    df['column_name'] = df['column_name'].astype(float)

Method 2: Using to_numeric()

    df['column_name'] = pd.to_numeric(df['column_name'])

Let’s see each method in detail with examples.
2. Using astype() Method
Use the astype() method to convert one DataFrame column from object to float in pandas.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    # Cast only the marks column to float
    df['marks'] = df['marks'].astype(float)

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    Before changing data type:
    students       object
    roll_number    object
    marks          object
    dtype: object

    After changing data type:
    students        object
    roll_number     object
    marks          float64
    dtype: object

Note that the marks column now has the float64 data type.
Use the astype() method to convert multiple columns of a DataFrame from object to float in pandas.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    df[['marks', 'roll_number']] = df[['marks', 'roll_number']].astype(float)

    # Alternatively, pass a column-to-dtype mapping:
    # df = df.astype({'roll_number': float, 'marks': float})

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    Before changing data type:
    students       object
    roll_number    object
    marks          object
    dtype: object

    After changing data type:
    students        object
    roll_number    float64
    marks          float64
    dtype: object

Use the astype() method to convert the entire DataFrame from object to float in pandas.

    import pandas as pd

    df = pd.DataFrame({
        'group_number': ['1', '2', '3', '4'],
        'students_per_group': ['5', '8', '7', '10'],
        'score': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    # Every column holds numeric strings, so the whole DataFrame can be cast
    df = df.astype(float)

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    Before changing data type:
    group_number          object
    students_per_group    object
    score                 object
    dtype: object

    After changing data type:
    group_number          float64
    students_per_group    float64
    score                 float64
    dtype: object

We used the astype() method to convert one column, multiple columns, and an entire DataFrame from object to float. In each case, we passed float as the target dtype. Which form to use depends on whether you need to convert one column, several columns, or the whole DataFrame; the examples above cover all three scenarios.
Now, consider a situation where we need to convert the whole DataFrame from object to float, but one or more of its columns are not convertible. Will the astype() method still work? Let’s see the following example.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    # Raises ValueError because the students column holds names
    df = df.astype(float)

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    ValueError: could not convert string to float: 'John'

In the above example, we used the astype() method to convert the entire DataFrame from object to float even though not all of its columns are convertible, so we got a ValueError. Now, how can we handle it?

We can set the errors parameter to 'ignore' to leave the entire DataFrame as it is if any of its columns is not convertible from object to float. See the following example.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    # Suppress the exception; the original data is returned on error
    df = df.astype(float, errors='ignore')

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    Before changing data type:
    students       object
    roll_number    object
    marks          object
    dtype: object

    After changing data type:
    students       object
    roll_number    object
    marks          object
    dtype: object

If you still want to convert the convertible columns and keep the object dtype of those that aren’t, you can combine the replace() method with astype() as follows.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    # Strip every character other than digits and the dot, then cast
    df = df.replace(r'[^0-9.]+', '', regex=True).astype(float, errors='ignore')

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    Before changing data type:
    students       object
    roll_number    object
    marks          object
    dtype: object

    After changing data type:
    students        object
    roll_number    float64
    marks          float64
    dtype: object

We converted the roll_number and marks columns from object to float, while the students column kept its object dtype because its strings are not convertible to float. Be aware that the pattern removes every character other than digits and the dot, so the names in the students column are replaced with empty strings. For predictable results, it is safer to clean and convert only the columns that hold numbers.

We can also use the replace() method with astype() to convert one or multiple columns of a DataFrame from object to float.
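The single-column variant mentioned above can be sketched as follows. This is a minimal illustration, not part of the original examples: the item and price columns and their values are hypothetical, and the string accessor str.replace() is used in place of DataFrame.replace() so that only the chosen column is cleaned.

```python
import pandas as pd

# Hypothetical data: prices stored as text with a currency symbol
# and a thousands separator.
df = pd.DataFrame({
    'item': ['pen', 'book', 'bag'],
    'price': ['$1.50', '$12', '$1,200.75'],
})

# Clean only the column we intend to convert, then cast it to float.
# The pattern drops everything except digits and the dot.
df['price'] = (
    df['price']
    .str.replace(r'[^0-9.]+', '', regex=True)
    .astype(float)
)

print(df.dtypes)
```

Because the cleaning is limited to the price column, the text in item is left intact. Keep in mind that this pattern also strips minus signs, so it would need adjusting for data that can hold negative numbers.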
3. Using to_numeric() Method
Use the to_numeric() function to convert one DataFrame column from object to float in pandas.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    df['marks'] = pd.to_numeric(df['marks'])

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    Before changing data type:
    students       object
    roll_number    object
    marks          object
    dtype: object

    After changing data type:
    students        object
    roll_number     object
    marks          float64
    dtype: object

Use the to_numeric() function to convert multiple DataFrame columns from object to a numeric dtype in pandas.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    # apply() calls pd.to_numeric once per selected column
    df[['marks', 'roll_number']] = df[['marks', 'roll_number']].apply(pd.to_numeric)

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    Before changing data type:
    students       object
    roll_number    object
    marks          object
    dtype: object

    After changing data type:
    students        object
    roll_number      int64
    marks          float64
    dtype: object

Note that roll_number becomes int64 rather than float64 because all of its values are whole numbers.
Use the to_numeric() function to convert the entire pandas DataFrame from object to a numeric dtype where all columns are convertible.

    import pandas as pd

    group_df = pd.DataFrame({
        'group_number': ['1', '2', '3', '4'],
        'students_per_group': ['5', '8', '7', '10'],
        'score': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", group_df.dtypes, sep='\n')

    group_df = group_df.apply(pd.to_numeric)

    print("\nAfter changing data type:", group_df.dtypes, sep='\n')

Output:

    Before changing data type:
    group_number          object
    students_per_group    object
    score                 object
    dtype: object

    After changing data type:
    group_number            int64
    students_per_group      int64
    score                 float64
    dtype: object

The above examples follow the same flow as the code examples in the previous section, but this time we used the pd.to_numeric() function.

This function converts object values to a numeric data type, which can be integer or float depending on the values themselves. For example, pd.to_numeric() converts '5' to 5 and '5.0' to 5.0, which is why roll_number ended up as int64 above while marks became float64. Because pd.to_numeric() works on one column at a time, we used the apply() method to apply it to multiple columns or the complete DataFrame.
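Since pd.to_numeric() can return integer columns, a float result has to be requested explicitly when that is what you need. One way to do this, sketched here with the same sample values used above, is to chain astype(float) after the conversion. The variable names inferred and as_float are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'roll_number': ['5', '8', '7', '10'],
    'marks': ['60', '70.2', '75.1', '80'],
})

# to_numeric infers the dtype per column: whole numbers become integers.
inferred = df.apply(pd.to_numeric)
print(inferred.dtypes)

# Casting afterwards guarantees float64 for every column.
as_float = inferred.astype(float)
print(as_float.dtypes)
```

This keeps the parsing behaviour of pd.to_numeric() while making the final dtype independent of the values in each column.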
Converting an entire DataFrame in which some columns cannot be converted from object to a numeric dtype results in a ValueError; see the following example.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    # Raises ValueError because the students column holds names
    df = df.apply(pd.to_numeric)

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    ValueError: Unable to parse string "John" at position 0

We can pass errors='ignore' to the apply() method, which forwards it to pd.to_numeric(), so that columns that are not convertible are left unchanged. Note that errors='ignore' has been deprecated for pd.to_numeric() since pandas 2.2, so recent versions emit a warning and the option may not be available in future releases. For example, see the following code snippet.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', '75.1', '80']
    })

    print("Before changing data type:", df.dtypes, sep='\n')

    # Extra keyword arguments to apply() are passed on to pd.to_numeric
    df = df.apply(pd.to_numeric, errors='ignore')

    print("\nAfter changing data type:", df.dtypes, sep='\n')

Output:

    Before changing data type:
    students       object
    roll_number    object
    marks          object
    dtype: object

    After changing data type:
    students        object
    roll_number      int64
    marks          float64
    dtype: object

This time, the students column is not converted because it is not convertible, and the error it would raise is ignored thanks to the errors argument passed through the apply() method.
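Because errors='ignore' is deprecated for pd.to_numeric() in pandas 2.2 and later, the same effect can be achieved by catching the exception explicitly. The sketch below is one possible way to do that; the helper name to_numeric_where_possible is hypothetical and not part of pandas.

```python
import pandas as pd


def to_numeric_where_possible(frame):
    """Return a copy in which every fully parseable column is numeric."""
    result = frame.copy()
    for name in result.columns:
        try:
            result[name] = pd.to_numeric(result[name])
        except (ValueError, TypeError):
            # At least one value is not numeric; leave the column untouched.
            pass
    return result


df = pd.DataFrame({
    'students': ['John', 'Mary', 'Martin', 'Sam'],
    'roll_number': ['5', '8', '7', '10'],
    'marks': ['60', '70.2', '75.1', '80'],
})

converted = to_numeric_where_possible(df)
print(converted.dtypes)
```

Handling each column separately also makes the outcome explicit: a column is either converted as a whole or returned exactly as it was.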
We can also use the apply() method with a lambda function, as in df['df_column'] = df['df_column'].apply(lambda x: float(x)), to convert one DataFrame column from object to float. This approach does not work as written for multiple columns or an entire DataFrame, because DataFrame.apply() passes each whole column to the function rather than individual values; for those cases, we can use pd.to_numeric(), which we have already covered.
4. Replace Invalid Values with NaN During Conversion
If a column contains invalid values that cannot be converted to float, we can set the errors parameter of pd.to_numeric() to 'coerce'; every invalid value is then converted to NaN.

    import pandas as pd

    df = pd.DataFrame({
        'students': ['John', 'Mary', 'Martin', 'Sam'],
        'roll_number': ['5', '8', '7', '10'],
        'marks': ['60', '70.2', 'AB', '80']
    })

    # Values that cannot be parsed become NaN instead of raising an error
    df['marks'] = pd.to_numeric(df['marks'], errors='coerce')

    print(df)

Output:

      students  ...  marks
    0     John  ...   60.0
    1     Mary  ...   70.2
    2   Martin  ...    NaN
    3      Sam  ...   80.0

As you can see, the AB value in the marks column has been converted to NaN.
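Coercion silently replaces bad values, so it is often useful to see which entries were affected. A minimal sketch, using the same marks values as above and the illustrative names converted and failed, compares the original column with the coerced result:

```python
import pandas as pd

marks = pd.Series(['60', '70.2', 'AB', '80'], name='marks')
converted = pd.to_numeric(marks, errors='coerce')

# Entries that had a value originally but are NaN after conversion
# are the ones that failed to parse.
failed = marks[converted.isna() & marks.notna()]
print(failed)
```

The notna() check keeps values that were already missing in the original data out of the report, so failed lists only the strings that could not be parsed.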
5. Conclusion
In this article, we explored how to convert object to float in Pandas using the astype() method and the pd.to_numeric() function.
We can set the errors parameter to 'ignore' with both approaches when some columns are not convertible to float but we still want to attempt the conversion on the complete DataFrame, and we can set it to 'coerce' with pd.to_numeric() to turn invalid values into NaN.