Pandas convert column to int

Pandas is a Python library used mostly for data analysis and machine learning.

A Pandas DataFrame is a two-dimensional data structure whose columns can have different types.

This article also uses Pandas Series. A Series is a one-dimensional array that can hold elements of any data type, similar to a single column of a DataFrame.

This tutorial discusses different methods to convert columns in Pandas to int in Python.

To run the examples in this article, import the Pandas package first.

Use the to_numeric() function to convert column to int

The to_numeric() function is the simplest and most basic way to convert the elements of a Pandas Series or DataFrame column to int.

The to_numeric() function converts a Series, list, or one-dimensional array to a numeric dtype. To convert several DataFrame columns, apply it to each column. Values such as numeric strings are converted to integer or floating-point values, depending on the data.

The following code uses the to_numeric() function to convert columns in Pandas Series to int in Python.
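A minimal sketch of this conversion, assuming a Series of numeric strings stored with the object dtype. The variable names s and converted are illustrative choices, not taken from the original listing.

```python
import pandas as pd

# A Series of numeric strings, stored explicitly with dtype object.
s = pd.Series(['3', '5', '8', '4', '9'], dtype='object')
print(s)

# to_numeric() parses each string and picks a numeric dtype that fits.
converted = pd.to_numeric(s)
print(converted)
```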

Output:
0 3
1 5
2 8
3 4
4 9
dtype: object
0 3
1 5
2 8
3 4
4 9
dtype: int64

Explanation

  • The Pandas library is imported.
  • A Series is created using the pd.Series() function.
  • The to_numeric() function is used to convert the string values of the Series into appropriate integer values.

If any of the values are floating-point numbers rather than integers, the column is converted to float.
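A short sketch of the float case, assuming a Series that mixes integer-like and decimal strings (the data and variable names are illustrative):

```python
import pandas as pd

# Some strings carry a fractional part, so no integer dtype can hold them all.
s = pd.Series(['3.5', '5.2', '8', '4.2', '9'], dtype='object')
print(s)

# to_numeric() falls back to a floating-point dtype for the whole Series.
converted = pd.to_numeric(s)
print(converted)
```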

Output:
0 3.5
1 5.2
2 8
3 4.2
4 9
dtype: object
0 3.5
1 5.2
2 8.0
3 4.2
4 9.0
dtype: float64

Sometimes a column contains values that cannot be converted to int or float. To handle this, the to_numeric() function takes an errors parameter. It accepts 'raise' (the default), which raises an exception, 'coerce', which replaces the inconvertible values with NaN, and 'ignore', which returns the input unchanged (this last option is deprecated in recent pandas releases).

The to_numeric() function can also convert multiple columns of a DataFrame when it is combined with the apply() method.
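The effect of the errors parameter can be sketched as follows. The Series contents are a hypothetical example with one value that cannot be parsed.

```python
import pandas as pd

# Hypothetical data: the second entry cannot be parsed as a number.
s = pd.Series(['3', 'five', '8'], dtype='object')

# errors='raise' is the default, so the bad value triggers a ValueError.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True
print(raised)

# errors='coerce' replaces the unparseable value with NaN instead.
coerced = pd.to_numeric(s, errors='coerce')
print(coerced)
```

Because NaN is a floating-point value, the coerced result has a float dtype rather than an int dtype.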

The following code implements the to_numeric() function to convert the datatype of all the columns to int.
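A minimal sketch, assuming a small DataFrame whose columns x and y both hold numeric strings (the column names and values are illustrative):

```python
import pandas as pd

# Both columns hold numeric strings and start out with dtype object.
df = pd.DataFrame({'x': ['1', '2', '3'], 'y': ['4', '5', '6']}, dtype='object')
print(df.dtypes)

# apply() calls pd.to_numeric once per column and reassembles the result.
converted = df.apply(pd.to_numeric)
print(converted.dtypes)
```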

In the above code, the apply() function is used along with the to_numeric() function on the given DataFrame to convert multiple columns into int at once.

Use the astype() function to convert column to int

The astype() function casts a Series or DataFrame to a specified data type and, by default, raises an error when the cast is not possible. It can also convert data to the categorical data type, which is very useful in data analytics and statistics.

The following code uses the astype() function to convert columns in Pandas Series to int in Python.
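A minimal sketch, assuming a Series of strings in which one value has a fractional part and therefore cannot be parsed as an int. The data and variable names are assumptions, not the original listing.

```python
import pandas as pd

# '4.2' cannot be parsed as an int, so a plain astype(int) would raise.
s = pd.Series(['3', '5', '8', '4.2', '9'], dtype='object')

# errors='ignore' suppresses the exception and returns the original object.
result = s.astype(int, errors='ignore')
print(result)

# Casting to float succeeds because every string is a valid float literal.
as_float = s.astype(float)
print(as_float)
```

Note that with errors='ignore' the values are left unconverted, so the result still has the object dtype.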

Explanation

  • A Series is created using the pd.Series() function, after importing the Pandas library.
  • The astype() function is used to convert all the elements to int.
  • The code would normally raise an error, because not all the values can be converted to int.
  • To overcome that, we set the errors parameter of the astype() function to 'ignore'. This suppresses the error and returns the original, unconverted values. The same conversion with float as the target type would work without any errors.

The astype() function can also be used for DataFrames. The following code implements the astype() function to convert columns in Pandas DataFrame to int in Python.
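A minimal sketch, assuming a DataFrame with an object column x of integers and an object column y of numeric strings. The column names and values are illustrative.

```python
import pandas as pd

# Both columns are stored with dtype object: x holds ints, y holds strings.
df = pd.DataFrame({'x': [1, 2, 3], 'y': ['4', '5', '6']}, dtype='object')
print(df.dtypes)

# A dict maps each column name to its target type.
converted = df.astype({'x': int, 'y': complex})
print(converted.dtypes)
```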

Explanation

  • A DataFrame is generated with two columns of the type object. In this DataFrame, one column holds the values as integers, while the other column contains string values that represent integers.
  • The dtypes attribute is used to return the datatypes of all the columns in the DataFrame.
  • Then, the astype() function is used on both the columns, converting column x to integer and column y to a complex number.

Note that astype() will also generate an error when it encounters a value that it cannot convert to the specified datatype.

The astype() function is very powerful, but it can produce surprising results in some cases. For example, casting floats to int drops the fractional part instead of rounding. It is better to use this function for simple conversions.
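One such case, sketched with an assumed float Series: casting to int truncates toward zero, so rounding has to be requested explicitly if that is what is wanted.

```python
import pandas as pd

# Illustrative float values with fractional parts.
s = pd.Series([1.9, -2.7, 3.0])

# astype(int) discards the fractional part (truncation toward zero).
truncated = s.astype(int)
print(truncated.tolist())

# Rounding first gives the nearest integers instead.
rounded = s.round().astype(int)
print(rounded.tolist())
```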

Use the infer_objects() function to convert column to int

This method was introduced in version 0.21.0 of pandas and is used for soft conversion. It inspects columns that have the object datatype and converts them to a more specific datatype when the values they hold allow it. It does not take a target datatype as an argument.

The following code uses the infer_objects() function to convert columns in Pandas DataFrame to int in Python.
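A minimal sketch, assuming a DataFrame with an object column x of integers and an object column y of numeric strings (names and values are illustrative):

```python
import pandas as pd

# Both columns are stored with dtype object: x holds ints, y holds strings.
df = pd.DataFrame({'x': [1, 2, 3], 'y': ['4', '5', '6']}, dtype='object')
print(df.dtypes)

# infer_objects() only changes columns whose values already have a better type.
inferred = df.infer_objects()
print(inferred.dtypes)
```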

Explanation

  • First, a DataFrame is generated with two columns of the type object. In this DataFrame, one column holds the values as integers, while the other column contains string values that represent integers.
  • The dtypes attribute is used to return the datatypes of all the columns in the DataFrame.
  • Then, the infer_objects() function is called on the DataFrame df.
  • The infer_objects() function converts the type of the first column to an integer, as the first column holds integer values.

In the above program, the second column y is not converted to a numeric type, since it contains string values. This method does not force conversions and only performs those that the existing values already support.

Use the convert_dtypes() function to convert column to int

The convert_dtypes() function was introduced in Pandas version 1.0. It converts Series and DataFrame columns to the most suitable datatypes that support the pd.NA missing value.

The following code uses the convert_dtypes() function to convert columns in Pandas DataFrame to int in Python.
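A minimal sketch, assuming a DataFrame with an object column x of integers and an object column y of numeric strings (names and values are illustrative):

```python
import pandas as pd

# Both columns are stored with dtype object: x holds ints, y holds strings.
df = pd.DataFrame({'x': [1, 2, 3], 'y': ['4', '5', '6']}, dtype='object')
print(df.dtypes)

# convert_dtypes() picks nullable dtypes that support pd.NA for each column.
converted = df.convert_dtypes()
print(converted.dtypes)
```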

Explanation

  • First, a DataFrame is generated with two columns of the type object. In this DataFrame, one column holds the values as integers, while the other column contains string values that represent integers.
  • The convert_dtypes() function is called, and it automatically converts the data type of both columns into more suitable ones.
  • Column x contains integer values, so it is converted to the nullable Int64 type, while column y contains string values and is converted to the string type.

That’s all about Pandas convert column to int.
