Iterate through files in a directory in Python

A directory is a container in the file system that holds files and other directories (sub-directories). The top-most directory, which has no parent, is called the root directory.

This tutorial demonstrates several ways to iterate through the files in a directory in Python.

Python's standard library includes the os module, which provides functions for creating, deleting, and iterating through directories. Several of the methods in this article rely on the os module, and it is explained further below.

Using the os.scandir() function to iterate through files in a given directory in Python.

In simple terms, the os module is one of the standard utility modules in Python and provides functions that interact with the operating system.

The os.scandir() function scans the entries of a particular directory. It returns only the files and directories located directly under the given path and does not descend into sub-directories.

The os.scandir() function was introduced in Python 3.5 and is available in all later versions. The os module must be imported before the function can be used.

The following code uses the os.scandir() function to iterate through files in a given directory in Python.
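A minimal sketch of such a listing is given here; it is one possible version, not a definitive implementation. The helper name scandir_file_names and the path C:\Users\vaib are assumptions for illustration, and the variable is called dir_path rather than dir so that it does not shadow Python's built-in dir() function.

```python
import os


def scandir_file_names(dir_path):
    """Return the names of the files located directly inside dir_path."""
    names = []
    # os.scandir() yields os.DirEntry objects; using it in a with block
    # (supported since Python 3.6) closes the iterator when the loop ends.
    with os.scandir(dir_path) as entries:
        for entry in entries:
            # Keep regular files only; sub-directories are skipped, not entered.
            if entry.is_file():
                names.append(entry.name)
    # The scan order is arbitrary, so sort for a predictable listing.
    return sorted(names)


if __name__ == "__main__":
    dir_path = r"C:\Users\vaib"  # hypothetical directory; change as needed
    if os.path.isdir(dir_path):
        for name in scandir_file_names(dir_path):
            print(name)
```

The output depends on the contents of the chosen directory; the listing that follows assumes it holds three PDF files.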

The above code provides the following output:

val1.pdf
val2.pdf
val3.pdf
Explanation
  • The os module is imported to the Python code.
  • The dir variable is assigned a path to the given directory.
  • Then, the os.scandir() function returns an iterator of os.DirEntry objects corresponding to the entries in the directory given by the path.
  • The if statement checks each entry before it is printed, so that only the intended files are listed.
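To make the os.DirEntry objects mentioned above more concrete, the following sketch collects a few of their attributes. The function name describe_entries and the "dir"/"file" labels are hypothetical and chosen only for this illustration.

```python
import os


def describe_entries(dir_path):
    """Return (name, path, kind) tuples for the entries directly inside dir_path."""
    result = []
    with os.scandir(dir_path) as entries:
        for entry in entries:
            # entry.name is the bare name; entry.path joins it with dir_path.
            # Anything that is not a directory is labelled "file" here.
            kind = "dir" if entry.is_dir() else "file"
            result.append((entry.name, entry.path, kind))
    return sorted(result)
```

Because each DirEntry already carries its name, path, and type checks, the loop can inspect an entry without building its path by hand.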

Using the os.listdir() function to iterate through files in a given directory in Python.

Another function provided by the os module is os.listdir(). It works much like the os.scandir() function described above, except that it returns a list of entry names (strings) rather than an iterator of os.DirEntry objects. It lists only the entries located directly under a given directory and does not support recursive listing.

Unlike os.scandir(), the os.listdir() function predates Python 3.5 and is available in earlier versions of Python as well. The os module must be imported before the function can be used.

The following code uses the os.listdir() function to iterate through files in a given directory in Python.
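A minimal sketch of such a listing follows, offered as one possible version rather than a definitive one. The helper name listdir_file_paths and the path C:\Users\vaib are assumptions, and dir_path is used instead of dir to avoid shadowing the built-in.

```python
import os


def listdir_file_paths(dir_path):
    """Return the full paths of the files located directly inside dir_path."""
    paths = []
    # os.listdir() returns bare names in arbitrary order, so sort them first.
    for name in sorted(os.listdir(dir_path)):
        full_path = os.path.join(dir_path, name)
        # The names carry no type information, so each path is checked separately.
        if os.path.isfile(full_path):
            paths.append(full_path)
    return paths


if __name__ == "__main__":
    dir_path = r"C:\Users\vaib"  # hypothetical directory; change as needed
    if os.path.isdir(dir_path):
        for path in listdir_file_paths(dir_path):
            print(path)
```

The output depends on the contents of the chosen directory.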

The above code provides the following output:

C:\Users\vaib\val1.pdf
C:\Users\vaib\val2.pdf
C:\Users\vaib\val3.pdf
Explanation
  • The os module is imported to the Python code.
  • The dir variable is assigned a path to the given directory.
  • Then, a for loop iterates over the names of the entries stored in the given directory.
  • The os.listdir() function supplies these names as a list.
  • The if statement checks each entry before it is printed, so that only the intended files are listed.

Neither os.scandir() nor os.listdir() lists files recursively. However, the three methods described below can be used to list the files in a given directory recursively.

Using the os.walk() function to iterate through files in a given directory in Python.

The os module provides another way of iterating through files in a given directory in Python: the os.walk() function.

The os.walk() function walks the directory tree rooted at a given path, either top-down (the default) or bottom-up. For each directory in the tree, it yields a 3-tuple of the form (dirpath, dirnames, filenames).

Each tuple has three elements, commonly unpacked as root, dirs, and files:

  • root: The path of the directory currently being visited.
  • dirs: The names of the sub-directories located directly in root.
  • files: The names of the non-directory files located directly in root.
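The structure of these tuples can be illustrated with a small throwaway directory tree. This is only a sketch: the helper name walk_triples and the file names a.txt and b.txt are made up for the example.

```python
import os
import tempfile


def walk_triples(top, topdown=True):
    """Return the (root, dirs, files) tuples produced by os.walk() as a list."""
    triples = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        # Store root relative to top, with sorted names, so results are easy to compare.
        triples.append((os.path.relpath(root, top), sorted(dirs), sorted(files)))
    return triples


# Build a temporary tree containing a.txt and sub/b.txt, then walk it both ways.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(top, rel), "w").close()

    top_down = walk_triples(top)
    bottom_up = walk_triples(top, topdown=False)

print(top_down)   # the starting directory comes first
print(bottom_up)  # the sub-directory comes first
```

With topdown=False, a directory is reported only after all of its sub-directories, which is useful when directories are removed or summarized from the leaves upward.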

As before, the os module must be imported before os.walk() can be used.

The following code uses the os.walk() function to iterate through files in a given directory in Python.
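A minimal sketch of such a listing is shown here, as one possible version rather than a definitive one. The helper name walk_file_paths, its suffix parameter, and the path C:\Users\vaib are assumptions for illustration.

```python
import os


def walk_file_paths(dir_path, suffix=""):
    """Return the paths of all files under dir_path whose names end with suffix."""
    paths = []
    for root, dirs, files in os.walk(dir_path):
        for name in files:
            # files holds bare names, so each one is joined with the directory
            # it was found in. An empty suffix matches every name.
            if name.endswith(suffix):
                paths.append(os.path.join(root, name))
    return sorted(paths)


if __name__ == "__main__":
    dir_path = r"C:\Users\vaib"  # hypothetical directory; change as needed
    if os.path.isdir(dir_path):
        for path in walk_file_paths(dir_path, suffix=".pdf"):
            print(path)
```

Unlike the os.scandir() and os.listdir() approaches, this version also reports matching files found in sub-directories.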

The above code provides the following output:

C:\Users\vaib\val1.pdf
C:\Users\vaib\val2.pdf
C:\Users\vaib\val3.pdf
Explanation
  • The os module is imported to the Python code.
  • The dir variable is assigned a path to the given directory.
  • Then, a for loop iterates over the tuples that os.walk() yields for the given path, one tuple per directory in the tree.
  • Each tuple has three elements, of which the files element is used further in the code.
  • The if statement checks each entry before it is printed, so that only the intended files are listed.

Using the pathlib module to iterate through files in a given directory in Python.

The pathlib module offers an object-oriented interface to the filesystem and provides classes for working with paths in a Pythonic, platform-independent manner.

Recursive iteration over the files in a given directory is also possible with the pathlib module. We will use the Path.glob() method, which yields the paths in a directory that match a given pattern; a pattern that starts with **/ makes the search recursive.

The pathlib module was introduced in Python 3.4 and is available in all later versions. Like the os module, it is part of Python's standard library and must be imported before use.

The following code uses the pathlib module to iterate through files in a given directory in Python.
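A minimal sketch of such a listing follows; it is an illustration under stated assumptions, not a definitive implementation. The helper name glob_file_paths, its pattern parameter, and the path C:\Users\vaib are hypothetical.

```python
from pathlib import Path


def glob_file_paths(dir_path, pattern="*"):
    """Return the files under dir_path whose relative path matches pattern."""
    # Path.glob() yields Path objects for files and directories alike, so
    # is_file() keeps the files only. A pattern such as "**/*" also descends
    # into sub-directories.
    return sorted(p for p in Path(dir_path).glob(pattern) if p.is_file())


if __name__ == "__main__":
    dir_path = Path(r"C:\Users\vaib")  # hypothetical directory; change as needed
    if dir_path.is_dir():
        for path in glob_file_paths(dir_path):
            print(path)
```

Calling glob_file_paths(dir_path, "**/*.pdf") would restrict the result to PDF files at any depth.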

The above code provides the following output:

C:\Users\vaib\val1.pdf
C:\Users\vaib\val2.pdf
C:\Users\vaib\val3.pdf
Explanation
  • The Path class is imported from the pathlib module.
  • The dir variable is assigned a path to the given directory.
  • The Path.glob('*') method yields the paths of all the entries that exist in the specified directory.
  • A for loop then iterates over these paths and prints them.

Using the glob module to iterate through files in a given directory in Python.

The glob module finds path names that match a Unix shell-style pattern, which makes it useful for selecting files and directories under a specified directory. We will use the iglob() function, which returns an iterator over the matches, to implement this method in Python.

The iglob() function also supports recursive iteration over the files in a specified directory when it is called with recursive=True and a pattern that contains **. The glob module must be imported before the function can be used.

The following code uses the glob module to iterate through files in a given directory in Python.
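A minimal sketch of such a listing is given here, as one possible version rather than a definitive one. The helper name iglob_file_paths, its recursive parameter, and the path C:\Users\vaib are assumptions for illustration.

```python
import glob
import os


def iglob_file_paths(dir_path, recursive=False):
    """Return the paths of files under dir_path found with glob.iglob()."""
    # glob.escape() protects any special characters in the directory name itself.
    base = glob.escape(dir_path)
    # With recursive=True, the "**" component matches any number of directory
    # levels, including none. Names starting with a dot are not matched by "*".
    if recursive:
        pattern = os.path.join(base, "**", "*")
    else:
        pattern = os.path.join(base, "*")
    matches = glob.iglob(pattern, recursive=recursive)
    return sorted(p for p in matches if os.path.isfile(p))


if __name__ == "__main__":
    dir_path = r"C:\Users\vaib"  # hypothetical directory; change as needed
    if os.path.isdir(dir_path):
        for path in iglob_file_paths(dir_path):
            print(path)
```

Because iglob() returns an iterator, matches are produced one at a time instead of being collected into a list first, which can help with very large directories.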

The above code provides the following output:

C:\Users\vaib\val1.pdf
C:\Users\vaib\val2.pdf
C:\Users\vaib\val3.pdf
Explanation
  • The glob module is imported to the Python code.
  • The dir variable is assigned a path to the given directory.
  • Then, a for loop iterates over the paths that match the pattern built from the given path.
  • The matches are produced one at a time by the glob.iglob() function, which can also search recursively.

That’s all about how to iterate through files in a directory in Python.
