Remove URLs from Text in Python


In this post, we will see how to remove URLs from text in Python.

Introduction

In Python, we can read and process text data. We can perform various operations on such texts using different libraries. In this tutorial, we will learn how to remove URLs from text in Python.

A URL (Uniform Resource Locator) is the address of a resource on the internet. Every resource has its own URL, but all URLs follow the same general structure. The URLs differ from one text to another, and a single text may contain several of them, so we first need to identify each URL by its format and then remove it.

For this, we can use regular expressions (regex). A regular expression is a pattern that describes a set of strings, which lets us find the substrings of a larger string that fit that pattern. Since URLs share a common structure (for example, a scheme such as http:// or https:// followed by a domain name), we can write a regex pattern that identifies URLs in a string.

In Python, regular expressions are provided by the built-in re module.
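As a quick illustration of matching by structure rather than by exact text, the sketch below compiles one simple pattern and tests a few strings against it. The pattern `https?://\S+` is an assumption chosen for illustration: it treats anything that starts with `http://` or `https://` and runs until the next whitespace as a URL, and it deliberately ignores other schemes such as `ftp://`.

```python
import re

# Illustrative pattern (an assumption): "http://" or "https://" followed by
# one or more non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")

candidates = [
    "https://www.example.com/",
    "http://example.org/a?b=1",
    "example",
    "ftp://example.net",
]

for candidate in candidates:
    # fullmatch() succeeds only if the whole string fits the pattern.
    print(candidate, bool(URL_PATTERN.fullmatch(candidate)))
```

The first two strings fit the pattern; the plain word and the `ftp://` address do not.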

Ways to remove URLs from Text in Python

This tutorial demonstrates several ways to remove URLs from text in Python. Most of them use functions from the re module; the last one uses urllib.parse instead.

Using the re.sub() function to remove URLs from Text in Python

The re.sub() function provides the most straightforward approach to remove URLs from text in Python.

This function replaces every substring that matches a regex pattern with a replacement string. It searches the given string for matches of the pattern and substitutes each match with the provided replacement.
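The call signature is `re.sub(pattern, repl, string, count=0, flags=0)`. Here is a small sketch with a made-up sample string. It uses a visible placeholder instead of an empty string so the substitutions can be seen, and it uses `count` to limit how many matches are replaced:

```python
import re

text = "Docs: https://docs.example.com/a and https://example.org/b"
pattern = r"https?://\S+"

# Replace every match (count=0, the default, means "no limit").
all_replaced = re.sub(pattern, "<URL>", text)
print(all_replaced)    # Docs: <URL> and <URL>

# Replace only the first match.
first_replaced = re.sub(pattern, "<URL>", text, count=1)
print(first_replaced)  # Docs: <URL> and https://example.org/b
```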

To remove URLs, we pass a URL pattern and an empty string as the replacement. Several different patterns can do the job; we will demonstrate three regex patterns that identify the URL in our example.

See the code below.
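The following is a minimal sketch. The sample string and the three patterns are assumptions chosen for illustration. Each pattern starts with `\s*` so that the space before the URL is removed as well, leaving no double space behind.

```python
import re

text = "This is a text with a URL https://www.example.com/blog/post?id=1 to remove."

# Three illustrative patterns; each starts with \s* so the space before the
# URL is removed too and no double space is left behind.
patterns = [
    r"\s*https?://\S+",            # "http://" or "https://" plus non-space characters
    r"\s*http\S+",                 # anything that starts with "http"
    r"\s*(?:https?://|www\.)\S+",  # also catches bare "www." addresses
]

results = [re.sub(pattern, "", text) for pattern in patterns]
for result in results:
    print(result)
```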

Output:

This is a text with a URL to remove.
This is a text with a URL to remove.
This is a text with a URL to remove.

In the above example, we used three patterns to detect and remove URLs from text in Python. You can use whichever pattern fits your data; for our example, all three work. We will use only one pattern in the following examples.

Using the re.findall() function to remove URLs from Text in Python

The re.findall() function finds all non-overlapping occurrences of a regex pattern in a given string and returns them as a list of strings.
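For example, with a made-up string that contains two URLs, `re.findall()` returns both of them. One caveat is worth knowing. If the pattern contains a capturing group, `findall()` returns only the group's text instead of the whole match, so a non-capturing group `(?:...)` is the safer choice.

```python
import re

text = "See https://example.com/a and http://example.org/b today"

# Whole matches are returned when the pattern has no capturing group.
urls = re.findall(r"https?://\S+", text)
print(urls)        # ['https://example.com/a', 'http://example.org/b']

# With a capturing group, only the captured part is returned.
schemes = re.findall(r"(https?)://\S+", text)
print(schemes)     # ['https', 'http']

# A non-capturing group keeps the whole match.
urls_again = re.findall(r"(?:https?)://\S+", text)
print(urls_again)  # ['https://example.com/a', 'http://example.org/b']
```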

We can use this function to find the URLs in a given string and then remove each one using the string's replace() method, replacing every occurrence of the URL with an empty string.

See the code below.
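Below is a minimal sketch, assuming a sample string and the pattern `https?://\S+`. The final `split()`/`join()` step is our own addition. It tidies up the double space that `replace()` leaves where the URL used to be.

```python
import re

text = "This is a text with a URL https://www.example.com/blog/post?id=1 to remove."

# Step 1: collect every URL-like substring.
urls = re.findall(r"https?://\S+", text)

# Step 2: replace each found URL with an empty string.
cleaned = text
for url in urls:
    cleaned = cleaned.replace(url, "")

# Step 3: collapse the double space left where the URL used to be.
cleaned = " ".join(cleaned.split())
print(cleaned)
```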

Output:

This is a text with a URL to remove.

Using the re.search() function to remove URLs from Text in Python

We can also use the re.match() and re.search() functions to find a substring based on a regex pattern. However, both functions return only the first match. So, if a string contains more than one URL, only the first one will be found and removed.

Another downside of the re.match() function is that it only matches at the beginning of the string, so it finds a URL only if the text starts with one. So, if a string contains a single URL somewhere in it, re.search() is the right choice.
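The difference is easy to see with a made-up string that contains two URLs, neither of them at the very start:

```python
import re

text = "Visit https://example.com/a or https://example.org/b today."
pattern = r"https?://\S+"

# match() only tries to match at position 0, and the text starts with "Visit".
first_match = re.match(pattern, text)
print(first_match)           # None

# search() scans the string but stops at the first occurrence.
first_search = re.search(pattern, text)
print(first_search.group())  # https://example.com/a
```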

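Both functions return a match object, or None if there is no match. The match object's group() method returns the matched substring, and its start() and end() methods return its position in the string.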

See the code below.
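Here is a minimal sketch, assuming a sample string that contains exactly one URL. It cuts the matched span out using the match object's positions. The whitespace cleanup at the end is our own addition.

```python
import re

text = "This is a text with a URL https://www.example.com/blog/post?id=1 to remove."

match = re.search(r"https?://\S+", text)
cleaned = text
if match:  # re.search() returns None when there is no URL
    # Cut the matched span out using the match object's start and end positions.
    cleaned = text[:match.start()] + text[match.end():]

# Collapse the double space left where the URL was.
cleaned = " ".join(cleaned.split())
print(cleaned)
```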

Output:

This is a text with a URL to remove.

Using the urllib.parse.urlparse() function to remove URLs from Text in Python

In Python, modules like urllib and requests let us work with URLs and send requests to a given address. The urlparse() function from the urllib.parse module parses a URL and breaks it into its components.

The urlparse() function returns a ParseResult object. We can use its scheme attribute (such as http or https) to check whether a string has the structure of a URL; for an ordinary word, the scheme is an empty string.
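This sketch uses a made-up URL to show the components that `urlparse()` exposes and what happens with a plain word. Keep in mind that a word followed by a colon, such as `note:`, also gets a non-empty scheme. For that reason, checking for `http`/`https` together with a non-empty `netloc` is more reliable than checking the scheme alone.

```python
from urllib.parse import urlparse

parts = urlparse("https://www.example.com/blog/post?id=1#top")
print(parts.scheme)    # https
print(parts.netloc)    # www.example.com
print(parts.path)      # /blog/post
print(parts.query)     # id=1
print(parts.fragment)  # top

# An ordinary word has no scheme, so .scheme is an empty string.
word = urlparse("remove.")
print(repr(word.scheme))  # ''
```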

To remove URLs from text with this method, we first break the text into a list of words using the split() method, which by default splits a string on whitespace.

We then parse each word and check its scheme attribute to decide whether it is a URL. If it is, we drop that word. Finally, we combine the remaining words into a single string using the join() method.

See this logic implemented below.
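Below is a minimal sketch of that logic, assuming a sample string. The helper `is_url()` is hypothetical and defined here for illustration. It treats a word as a URL only when its scheme is http or https and it has a host part (`netloc`), which is a little stricter than checking the scheme alone.

```python
from urllib.parse import urlparse

text = "This is a text with a URL https://www.example.com/blog/post?id=1 to remove."


def is_url(token):
    """Hypothetical helper: True if token parses as an http(s) URL with a host."""
    try:
        parsed = urlparse(token)
    except ValueError:  # e.g. malformed IPv6 brackets such as "http://["
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Split on whitespace, drop URL tokens, and join the rest back together.
words = text.split()
kept = [word for word in words if not is_url(word)]
cleaned = " ".join(kept)
print(cleaned)
```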

Output:

This is a text with a URL to remove.

This is the only method that does not use any regex.

Conclusion

To conclude, we discussed several methods to remove URLs from text in Python. Most of them used regular expressions to detect URLs and replace them with an empty string. The final method, based on urllib.parse.urlparse(), does not use regex; it splits the text into words and drops the words that parse as URLs.

