Split dataframe in Pandas

In real-life scenarios, we deal with massive datasets with many rows and columns. At times, we may want to split a large DataFrame into smaller DataFrames.

This article discusses different methods to split a DataFrame in Pandas.

Using the iloc indexer to split DataFrame in Python

Slicing is a method of extracting a smaller number of elements from a larger structure. We can use iloc to slice a DataFrame into smaller DataFrames. Note that iloc is an indexer rather than a function, so it is used with square brackets. It selects rows and columns by their integer position, which lets us split a DataFrame based on rows or columns.

By Rows

We can select the required range of rows from the DataFrame by passing a slice of row positions to iloc.

See the following example.
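
The original listing is not available here, so the following is only a minimal sketch of a row-based split. The DataFrame, its column names and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Cara", "Dev", "Ella", "Finn"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Age": [23, 31, 27, 45, 36, 29],
})

# iloc works on integer positions; the end of a slice is exclusive.
first_part = df.iloc[:3]    # rows at positions 0, 1 and 2
second_part = df.iloc[3:]   # rows at positions 3, 4 and 5

print(first_part)
print(second_part)
```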

In the above example, we split the DataFrame based on rows. Because iloc works on integer positions, it helps to know the total number of rows and columns in the DataFrame (available from its shape attribute) when choosing the split points.
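
One way to pick the split point from the row count is sketched below. It assumes a hypothetical single-column DataFrame with seven rows and cuts it at the midpoint; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical data with 7 rows; the column name is an assumption.
df = pd.DataFrame({"Value": range(7)})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# Integer division gives the position at which to cut.
midpoint = n_rows // 2

top_half = df.iloc[:midpoint]      # positions 0 .. midpoint - 1
bottom_half = df.iloc[midpoint:]   # positions midpoint .. end

print(len(top_half), len(bottom_half))
```

With an odd number of rows, the second part ends up one row longer than the first.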

By Columns

We can split the DataFrame based on columns in the same way, by specifying a range of column positions as the second argument to iloc.

For example,
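
The original listing is not available here. As a rough sketch, assuming a made-up DataFrame with four columns, a column-based split with iloc could look like this:

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Cara", "Dev"],
    "Gender": ["F", "M", "F", "M"],
    "Age": [23, 31, 27, 45],
    "City": ["Pune", "Leeds", "Turin", "Delhi"],
})

# The first slice selects all rows; the second selects column positions.
left_part = df.iloc[:, :2]    # columns at positions 0 and 1
right_part = df.iloc[:, 2:]   # columns at positions 2 and 3

print(left_part)
print(right_part)
```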

Using the sample() function to split DataFrame in Python

The sample() function returns a random sample of rows or columns from a DataFrame. The axis parameter selects which of the two is sampled (rows by default), and the frac parameter specifies the fraction of items to return.

For example,
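
The original listing is not available here. The sketch below assumes a made-up DataFrame with eight rows and draws a random 75% of them; the actual rows picked will differ from run to run.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Cara", "Dev", "Ella", "Finn", "Gita", "Hugo"],
    "Age": [23, 31, 27, 45, 36, 29, 52, 40],
})

# frac=0.75 requests 75% of the rows, chosen at random without replacement.
sampled = df.sample(frac=0.75)

print(sampled)
```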

The fraction specified in the above example is 0.75, so the sample contains 75% of the rows. Other parameters give more control over the result: random_state makes the sample reproducible, and weights sets the sampling probability of each row.

This method is commonly used to divide a DataFrame into training and test datasets in machine learning.
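
A train/test split along these lines can be sketched as follows. This is one possible approach rather than the only one: it assumes a made-up DataFrame with a unique index, samples the training rows, and treats every remaining row as test data. The names train_df and test_df are illustrative.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Feature": [1.0, 2.5, 3.1, 4.8, 5.2, 6.7, 7.3, 8.9],
    "Label": [0, 1, 0, 1, 0, 1, 0, 1],
})

# random_state fixes the random seed so the split is reproducible.
train_df = df.sample(frac=0.75, random_state=42)

# Dropping the sampled index labels leaves the remaining rows.
# This relies on the index labels being unique.
test_df = df.drop(train_df.index)

print(len(train_df), len(test_df))
```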

Using the groupby() function to split DataFrame in Python

The groupby() function splits the DataFrame into groups based on the values of one or more columns. After grouping, we can extract a specific group using the get_group() function.

This method works best when we want to split a DataFrame based on some column that has categorical values.

For example,
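
The original listing is not available here. The sketch below assumes a made-up DataFrame with a Gender column holding the values F and M, and extracts each group as its own DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Cara", "Dev", "Ella", "Finn"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Age": [23, 31, 27, 45, 36, 29],
})

# Group the rows by the values in the Gender column.
grouped = df.groupby("Gender")

# get_group returns the rows belonging to one group as a DataFrame.
females = grouped.get_group("F")
males = grouped.get_group("M")

print(females)
```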

In the above example, we grouped the DataFrame using the Gender column and extracted the rows where the value for this column is F.
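
When every group is needed rather than a single one, the grouped object can also be iterated. The sketch below, again on a made-up DataFrame, collects the groups into a dictionary keyed by the Gender value; the name parts is illustrative.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Cara", "Dev", "Ella", "Finn"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
})

# Iterating over a groupby object yields (key, sub-DataFrame) pairs.
parts = {key: group for key, group in df.groupby("Gender")}

for key, group in parts.items():
    print(key, len(group))
```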

Using the columns to split DataFrame in Python

We can pass a list of column labels to the DataFrame in square brackets to extract those columns. To select columns by integer position instead, use iloc.

For example,
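
The original listing is not available here. As a minimal sketch, assuming a made-up DataFrame with four labelled columns, the split could look like this:

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Cara", "Dev"],
    "Gender": ["F", "M", "F", "M"],
    "Age": [23, 31, 27, 45],
    "Salary": [52000, 61000, 58000, 75000],
})

# A list of labels inside the brackets returns a DataFrame with those columns.
personal = df[["Name", "Gender"]]
numeric = df[["Age", "Salary"]]

print(personal)
print(numeric)
```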

That’s all about how to split a DataFrame in Pandas.
