Split dataframe in Pandas

In real-life scenarios, we deal with massive datasets with many rows and columns. At times, we may want to split a large DataFrame into smaller DataFrames.

We will discuss different methods to split a DataFrame in Python.

Using iloc to split DataFrame in Python

Slicing is a method of extracting a smaller number of elements from a larger structure. We can use iloc to slice a DataFrame into smaller DataFrames. iloc is an indexer, used with square brackets rather than called like a function, and it selects rows and columns by their integer position. Using it, we can split a DataFrame based on rows or columns.

By Rows

We can select the required range of rows by passing a row slice to iloc, for example df.iloc[0:3] for the first three rows.

This splits the DataFrame based on rows. Because iloc works with positions, it is useful to know the total number of rows and columns in the DataFrame, which the shape attribute returns as a (rows, columns) tuple.
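
A minimal sketch of splitting by rows, using a small hypothetical DataFrame (the Name, Age and Gender columns and their values are made up for illustration). It uses the row count from shape to cut the DataFrame into two halves:

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
    "Age": [23, 35, 31, 28, 40, 19],
    "Gender": ["F", "M", "M", "F", "F", "F"],
})

# df.shape is (rows, columns); use the row count to choose a split point
n_rows = df.shape[0]
mid = n_rows // 2

# iloc slices by integer position: rows [0, mid) and [mid, n_rows)
top = df.iloc[:mid]
bottom = df.iloc[mid:]

print(top)
print(bottom)
```

Both parts keep their original index labels (the second part starts at 3). Call reset_index(drop=True) on a part if you want its index to start from 0.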

By Columns

Similarly, we can split the DataFrame based on columns by passing a column range as the second argument to iloc, for example df.iloc[:, 0:2] for the first two columns.
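
Here is a sketch of splitting by columns on a hypothetical three-column DataFrame. The colon before the comma keeps all rows, and the slice after it picks the column range:

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Age": [23, 35, 31],
    "Gender": ["F", "M", "M"],
})

# The second argument to iloc selects columns by position
first_part = df.iloc[:, :2]   # columns 0 and 1: Name, Age
second_part = df.iloc[:, 2:]  # column 2 onwards: Gender

print(first_part)
print(second_part)
```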

Using the sample() function to split DataFrame in Python

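The sample() function returns a random sample of rows from a DataFrame (or of columns, with axis=1). The fraction of the DataFrame to sample is set with the frac parameter, and the rows that were not picked can be obtained by dropping the sampled index labels from the original DataFrame.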

For instance, frac=0.75 returns 75% of the rows. Other parameters, such as random_state (for reproducible results) and weights (to make some rows more likely to be picked), give more control over the final result.

This method is commonly used to divide a DataFrame into training and test datasets in machine learning.
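
Below is a minimal sketch of a train/test split with sample(), using hypothetical data. Taking frac=0.75 selects 75% of the rows at random, and the rest are recovered by dropping the sampled index labels. This assumes the DataFrame's index has no duplicate labels; otherwise drop() could remove more rows than intended.

```python
import pandas as pd

# Hypothetical data: 20 rows with a default, unique RangeIndex
df = pd.DataFrame({"x": range(20), "y": range(20, 40)})

# Randomly pick 75% of the rows; random_state makes the pick reproducible
train = df.sample(frac=0.75, random_state=42)

# The remaining rows form the second part: drop the sampled index labels
test = df.drop(train.index)

print(len(train), len(test))
```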

Using the groupby() function to split DataFrame in Python

The groupby() function groups the rows of a DataFrame by the values of one or more columns. We can then extract a specific group as a separate DataFrame using the get_group() function.

This method works best when we want to split a DataFrame based on a column that has categorical values.

For example, grouping a DataFrame by its Gender column and calling get_group('F') returns only the rows where the value for this column is F.
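
The following sketch reproduces that idea on a small hypothetical DataFrame. It also shows that iterating over a groupby object yields every group, which is handy when you want all the smaller DataFrames at once:

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "Gender": ["F", "M", "M", "F", "F"],
})

grouped = df.groupby("Gender")

# Extract only the rows where Gender is "F"
females = grouped.get_group("F")

# Iterating over the groupby object yields (key, sub-DataFrame) pairs,
# so every group can be collected into a dict in one go
parts = {key: group for key, group in grouped}

print(females)
```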

Using a list of columns to split DataFrame in Python

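We can specify the labels of the required columns in a list inside square brackets, as in df[['Name', 'Age']], to extract those columns into a new DataFrame. To select the columns by their index (position) instead, pass the list to iloc, as in df.iloc[:, [0, 1]].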
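
A short sketch with hypothetical column names, showing selection both by label and by position:

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Age": [23, 35],
    "Gender": ["F", "M"],
    "City": ["Oslo", "Rome"],
})

# Select by label: a list inside [] returns a DataFrame with those columns
people = df[["Name", "Age"]]

# Select by position: pass the list of column positions to iloc
details = df.iloc[:, [2, 3]]

print(people)
print(details)
```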

That’s all about how to split a DataFrame in Pandas.
