Split dataframe in Pandas

Table of Contents

Using the iloc() function to split DataFrame in Python
- By Rows
- By Columns
Using the sample() function to split DataFrame in Python
Using the groupby() function to split DataFrame in Python
Using the columns to split DataFrame in Python

In real-life scenarios, we deal with massive datasets with many rows and columns. At times, we may want to split a large DataFrame into smaller DataFrames.

This article discusses several ways to split a DataFrame in Python using pandas.

Using the `iloc()` function to split DataFrame in Python

Slicing extracts a smaller number of elements from a larger structure. We can use iloc to slice a DataFrame into smaller DataFrames. Although it is often called a function, iloc is an indexer used with square brackets, as in df.iloc[rows, columns], and it selects elements by the integer position of rows and columns. Using it, we can split a DataFrame based on rows or columns.

By Rows

We can select the required range of rows from the DataFrame using the iloc() function.

See the following example.


import pandas as pd
df = pd.DataFrame([['Jay','M',18],['Jennifer','F',17],
                   ['Preity','F',19],['Neil','M',17]], 
                  columns = ['Name','Gender','Age'])
df1 = df.iloc[2:,:]
df2 = df.iloc[:2,:]
print(df1)
print(df2)

Output:


     Name Gender  Age
2  Preity      F   19
3    Neil      M   17
       Name Gender  Age
0       Jay      M   18
1  Jennifer      F   17

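In the above example, we split the DataFrame based on rows: df.iloc[2:,:] selects the rows from position 2 onward, and df.iloc[:2,:] selects the first two rows. Knowing the total number of rows and columns in the DataFrame (available as df.shape) helps in choosing the split positions.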
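
When the number of rows is not known in advance, the split position can be computed instead of hard-coded. The following is a minimal sketch under that assumption; the names half, chunk_size, and chunks are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame([['Jay', 'M', 18], ['Jennifer', 'F', 17],
                   ['Preity', 'F', 19], ['Neil', 'M', 17]],
                  columns=['Name', 'Gender', 'Age'])

# Compute the split position from the row count instead of hard-coding it.
half = len(df) // 2
first_half = df.iloc[:half]
second_half = df.iloc[half:]

# Cut the DataFrame into consecutive chunks of at most chunk_size rows.
chunk_size = 3  # hypothetical value chosen for illustration
chunks = [df.iloc[start:start + chunk_size]
          for start in range(0, len(df), chunk_size)]

print(first_half)
print(second_half)
print([len(chunk) for chunk in chunks])
```

The last chunk may be shorter than chunk_size when the row count is not an exact multiple of it.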

By Columns

We can similarly split the DataFrame based on columns by specifying a range of column positions.

For example,


import pandas as pd
df = pd.DataFrame([['Jay','M',18],['Jennifer','F',17],
                   ['Preity','F',19],['Neil','M',17]], 
                  columns = ['Name','Gender','Age'])
df1 = df.iloc[:,:2]
df2 = df.iloc[:,2:]
print(df1)
print(df2)

Output:


       Name Gender
0       Jay      M
1  Jennifer      F
2    Preity      F
3      Neil      M
   Age
0   18
1   17
2   19
3   17

Using the `sample()` function to split DataFrame in Python

The sample() function returns a random sample of rows from a DataFrame, or of columns when axis=1 is passed. The size of the sample can be specified as a fraction with the frac parameter or as a count with the n parameter.

For example,


import pandas as pd
df = pd.DataFrame([['Jay','M',18],['Jennifer','F',17],
                   ['Preity','F',19],['Neil','M',17]], 
                  columns = ['Name','Gender','Age'])
df1 = df.sample(frac = 0.75)
print(df1)

Output:


     Name Gender  Age
0     Jay      M   18
2  Preity      F   19
3    Neil      M   17

The ratio specified in the above example is 0.75, so three of the four rows are returned. Because the sample is random, the rows selected can differ on each run. We can use other parameters like random_state and weights to have some control over the final result.

This method is commonly used to divide a DataFrame into training and test datasets in machine learning.
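
Note that sample() on its own returns only one part of the DataFrame. One possible way to obtain the complementary part is to drop the sampled index labels from the original. The sketch below assumes the index labels are unique; the names train and test and the seed value 42 are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame([['Jay', 'M', 18], ['Jennifer', 'F', 17],
                   ['Preity', 'F', 19], ['Neil', 'M', 17]],
                  columns=['Name', 'Gender', 'Age'])

# random_state fixes the seed so the same rows are drawn on every run.
train = df.sample(frac=0.75, random_state=42)

# The rows whose index labels were not sampled form the other part.
# This relies on the index labels being unique.
test = df.drop(train.index)

print(train)
print(test)
```

Together, train and test contain every row of df exactly once.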

Using the `groupby()` function to split DataFrame in Python

The groupby() function splits the DataFrame into groups based on the values in one or more columns. We can then extract a specific group using the get_group() function.

This method works best when we want to split a DataFrame based on some column that has categorical values.

For example,


import pandas as pd
df = pd.DataFrame([['Jay','M',18],['Jennifer','F',17],
                   ['Preity','F',19],['Neil','M',17]], 
                  columns = ['Name','Gender','Age'])
gr = df.groupby('Gender')
print(gr.get_group('F'))

Output:


       Name Gender  Age
1  Jennifer      F   17
2    Preity      F   19

In the above example, we grouped the DataFrame using the Gender column and extracted the rows where the value for this column is F.
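
If every group is needed rather than a single one, iterating over the grouped object yields each key together with its sub-DataFrame. A minimal sketch, where the dictionary name groups is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame([['Jay', 'M', 18], ['Jennifer', 'F', 17],
                   ['Preity', 'F', 19], ['Neil', 'M', 17]],
                  columns=['Name', 'Gender', 'Age'])

# Iterating over a groupby object yields (key, sub-DataFrame) pairs,
# so a dict comprehension collects one DataFrame per Gender value.
groups = {key: group for key, group in df.groupby('Gender')}

for key, group in groups.items():
    print(key)
    print(group)
```

Each value in groups is a regular DataFrame that keeps the original index labels of its rows.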

Using the columns to split DataFrame in Python

We can pass a list of column labels inside square brackets to extract those columns from the DataFrame as a new DataFrame. To select columns by integer position instead, use iloc as shown earlier.

For example,


import pandas as pd
df = pd.DataFrame([['Jay','M',18],['Jennifer','F',17],
                   ['Preity','F',19],['Neil','M',17]], 
                  columns = ['Name','Gender','Age'])
df1 = df[['Name','Gender']]
print(df1)

Output:


       Name Gender
0       Jay      M
1  Jennifer      F
2    Preity      F
3      Neil      M
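
The example above extracts only one part. To also keep the remaining columns as a second DataFrame, one option is to drop the selected labels from the original. A minimal sketch, where the list name selected is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame([['Jay', 'M', 18], ['Jennifer', 'F', 17],
                   ['Preity', 'F', 19], ['Neil', 'M', 17]],
                  columns=['Name', 'Gender', 'Age'])

selected = ['Name', 'Gender']

# The listed columns form the first part.
df1 = df[selected]

# Dropping the same labels leaves the remaining columns as the second part.
df2 = df.drop(columns=selected)

print(df1)
print(df2)
```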

That’s all about how to split a DataFrame in pandas.
