Remove URLs from Text in Python

In this post, we will see how to remove URLs from text in Python.

Introduction

In Python, we can read and process text data. We can perform various operations on such texts using different libraries. In this tutorial, we will learn how to remove URLs from text in Python.

A URL (Uniform Resource Locator) is the address of a resource on the internet. Every resource has its own URL, but all URLs follow the same general structure. The URLs in a text are not known in advance, and a text may contain several of them, so we first need to identify each URL by its format and then remove it.

For this, we can use regular expressions (regex). A regular expression is a pattern that describes a set of strings and can be used to find matching substrings within a string. Since URLs share a common structure, we can write a regex pattern that identifies them in a string.

Python's built-in re module provides support for regular expressions.

Ways to remove URLs from Text in Python

This tutorial demonstrates several functions from the re module, along with one approach based on urllib.parse, that can be used to remove URLs from text in Python.

Using the re.sub() function to remove URLs from Text in Python

The re.sub() function provides the most straightforward approach to remove URLs from text in Python.

This function substitutes every substring that matches a regex pattern with a replacement string and returns the resulting string. Passing an empty string as the replacement removes the matches.

To remove URLs from text in Python we can use this function with many patterns. We will demonstrate different possible regex patterns that can identify the URLs in our example.

See the code below.
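A minimal sketch of this approach is given here. The sample text, the placeholder URL, and the three patterns are assumptions chosen for illustration. Each pattern also consumes the whitespace that follows the URL so that no double space is left behind.

```python
import re

# Sample text; the URL is a placeholder chosen for illustration.
text = "This is a text with a URL https://www.example.com/ to remove."

# Three illustrative patterns. The trailing \s* removes the space after the URL.
patterns = [
    r"http\S+\s*",                 # any run of non-space characters starting with "http"
    r"https?://\S+\s*",            # an explicit http:// or https:// scheme
    r"(?:https?://|www\.)\S+\s*",  # a scheme or a bare "www." prefix
]

# Replace every match of each pattern with an empty string.
results = [re.sub(pattern, "", text) for pattern in patterns]
for result in results:
    print(result)
```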

Output:

This is a text with a URL to remove.
This is a text with a URL to remove.
This is a text with a URL to remove.

In the above example, we used three patterns to detect and remove URLs from text in Python. One can use whatever pattern works for their code. For our example, all three work. We will use only one pattern in the following examples.

Using the re.findall() function to remove URLs from Text in Python

The re.findall() function finds all non-overlapping occurrences of a regex pattern in a given string. It returns a list of the matched substrings.

We can use this function to find the URLs in a given string and then remove them using the replace() function. With the replace() function, we will replace the occurrence of the given URL with an empty string.

See the code below.
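A minimal sketch of this approach, assuming the same illustrative sample text and a pattern that also captures the whitespace after the URL (the names `url_pattern` and `cleaned` are hypothetical):

```python
import re

# Sample text; the URL is a placeholder chosen for illustration.
text = "This is a text with a URL https://www.example.com/ to remove."

# The pattern has no capture groups, so findall() returns the full matches.
url_pattern = r"https?://\S+\s*"
urls = re.findall(url_pattern, text)

# Remove each URL that was found by replacing it with an empty string.
cleaned = text
for url in urls:
    cleaned = cleaned.replace(url, "")

print(cleaned)
```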

Output:

This is a text with a URL to remove.

Using the re.search() function to remove URLs from Text in Python

We can also use the re.match() and re.search() functions to find a substring based on a regex pattern. However, both functions return only the first occurrence of the pattern. So, if a string contains more than one URL, only the first one will be found and removed.

Another downside of the re.match() function is that it only matches at the beginning of the string, so it will not find a URL that appears later in the text. The re.search() function scans the whole string, so if we have a string with only one URL, we can use re.search().

The matched substring is returned in a match object.
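The difference between the two functions can be seen in a short sketch. The sample text and pattern are assumptions for illustration.

```python
import re

# Sample text; the URL is a placeholder chosen for illustration.
text = "This is a text with a URL https://www.example.com/ to remove."
url_pattern = r"https?://\S+"

# re.match() only tries to match at the very start of the string.
at_start = re.match(url_pattern, text)
# re.search() scans the whole string and returns the first match.
anywhere = re.search(url_pattern, text)

print(at_start)          # None
print(anywhere.group())  # https://www.example.com/

# The match object also records where the match was found.
start, end = anywhere.span()
print(text[start:end])   # https://www.example.com/
```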

See the code below.
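A minimal sketch of this approach, assuming the same illustrative sample text and pattern. It removes the matched span by slicing around it and handles only the first URL in the text.

```python
import re

# Sample text; the URL is a placeholder chosen for illustration.
text = "This is a text with a URL https://www.example.com/ to remove."
url_pattern = r"https?://\S+\s*"

match = re.search(url_pattern, text)
if match:
    # Keep everything before and after the matched span.
    cleaned = text[:match.start()] + text[match.end():]
else:
    # No URL found; leave the text unchanged.
    cleaned = text

print(cleaned)
```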

Output:

This is a text with a URL to remove.

Using the urllib.parse.urlparse() function to remove URLs from Text in Python

In Python, we can send requests to a given address using modules like urllib, requests, and more. The urllib.parse module also provides the urlparse() function, which parses a URL and breaks it into components.

The urlparse() function takes a URL string and returns a result object. We can use the scheme attribute of this object to check whether a string has the structure of a URL or not.
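As a short illustration of the parsed components, the sketch below uses a placeholder URL that is an assumption, not taken from the original example.

```python
from urllib.parse import urlparse

# Parse a placeholder URL into its components.
parsed = urlparse("https://www.example.com/path?q=1")

print(parsed.scheme)  # https
print(parsed.netloc)  # www.example.com
print(parsed.path)    # /path
print(parsed.query)   # q=1

# An ordinary word has no scheme, so the attribute is an empty string.
plain_scheme = urlparse("remove.").scheme
print(plain_scheme == "")  # True
```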

To remove URLs from text in Python with this method, we will first break the text into a list of strings. This can be achieved using the split() function that can split strings into a list of strings based on some character.

We will then use the scheme attribute to check whether each string in the list is a URL. If it is, we will skip that string. Finally, we will combine the remaining elements of the list using the join() function.

See this logic implemented below.
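A minimal sketch of this logic follows. The helper name `is_url` and the sample text are assumptions. As an extra precaution that goes beyond the scheme check described above, the helper also requires a network location, so that a word such as "note:" is not mistaken for a URL.

```python
from urllib.parse import urlparse


def is_url(token):
    """Return True if the token parses as a URL (hypothetical helper)."""
    try:
        parsed = urlparse(token)
    except ValueError:
        # urlparse() can raise ValueError on some malformed input.
        return False
    # Require both a scheme (e.g. "https") and a network location.
    return bool(parsed.scheme and parsed.netloc)


# Sample text; the URL is a placeholder chosen for illustration.
text = "This is a text with a URL https://www.example.com/ to remove."

# Split on whitespace, drop the URL tokens, and join the rest with spaces.
words = text.split()
cleaned = " ".join(word for word in words if not is_url(word))

print(cleaned)
```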

Output:

This is a text with a URL to remove.

This is the only method that does not use any regex.

Conclusion

To conclude, we discussed several methods to remove URLs from text in Python. Most of the methods use regular expressions to detect each URL in a string and replace it with an empty string. The final method uses the urlparse() function from the urllib.parse module and does not rely on regex.
