Remove URLs from Text in Python

Table of Contents

Introduction
Ways to remove URLs from Text in Python
Conclusion

In this post, we will see how to remove URLs from text in Python.

Introduction

In Python, we can read and process text data. We can perform various operations on such texts using different libraries. In this tutorial, we will learn how to remove URLs from text in Python.

A URL (Uniform Resource Locator) is the address of a resource on the internet. Every resource has its own URL, but all URLs follow the same general structure. The URLs differ from one text to the next, and a single text may contain several, so we first need to identify each URL by its format and then remove it.

For this, we can use regular expressions (regex). A regex is a pattern that describes a set of strings and can be used to locate matching substrings within a larger string. Since URLs share a common structure, we can write a regex pattern that identifies them in a string.

Python provides the re module in its standard library for working with regular expressions.
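As a small illustration of how a pattern can pick URLs out of a string, the sketch below lists every match in a sample sentence. The sample text, its example.com and example.org addresses, and the names url_pattern and found are assumptions made for this illustration.

```python
import re

# Hypothetical sample text; the URLs are placeholders for illustration only.
text = "Docs at https://example.com/docs and mirror at http://example.org."

# "https?://" matches the scheme (the "s" is optional);
# "\S+" then matches every character up to the next whitespace.
url_pattern = re.compile(r"https?://\S+")

found = url_pattern.findall(text)
print(found)
# ['https://example.com/docs', 'http://example.org.']
```

Note that the second match includes the sentence-ending period, because \S+ accepts any non-whitespace character. A simple pattern like this is convenient, but it may pick up trailing punctuation.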

Ways to remove URLs from Text in Python

This tutorial will demonstrate different methods from the re module that can be used to remove URLs from text in Python.

Using the `re.sub()` function to remove URLs from Text in Python

The re.sub() function provides the most straightforward way to remove URLs from text in Python.

This function replaces every substring that matches a regex pattern with a replacement string and returns the result as a new string.

To remove URLs, we pass an empty string as the replacement. Several patterns can identify the URL in our example, and we will demonstrate three of them.
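Before applying it to URLs, it may help to see re.sub() on a simpler case. The following minimal sketch replaces every run of digits with a placeholder; the sample sentence and the variable names original and masked are made up for this example.

```python
import re

# re.sub(pattern, replacement, string) returns a new string;
# the input string itself is left unchanged.
original = "Order 66 shipped on day 12."
masked = re.sub(r"\d+", "#", original)

print(masked)    # Order # shipped on day #.
print(original)  # Order 66 shipped on day 12.
```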

See the code below.


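import re

text = "This is a text with a URL https://www.java2blog.com/ to remove."

# Pattern 1: spell out both schemes explicitly.
s1 = re.sub(r"http://\S+|https://\S+", "", text)
# Pattern 2: make the "s" in the scheme optional.
s2 = re.sub(r"http[s]?://\S+", "", text)
# Pattern 3: match "http" followed by any run of non-whitespace characters.
s3 = re.sub(r"http\S+", "", text)

print(s1)
print(s2)
print(s3)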

Output:

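This is a text with a URL  to remove.
This is a text with a URL  to remove.
This is a text with a URL  to remove.

In the above example, we used three patterns to detect and remove URLs from text in Python. All three give the same result here, but they are not equivalent: the first two require :// after http or https, while the third removes any run of non-whitespace characters that begins with http, so it would also remove a word such as httpx. Because only the URL itself is removed, the spaces on either side of it remain and the output contains two consecutive spaces. One can use whichever pattern suits their text. We will use only one pattern in the following examples.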
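One way to package this approach is a small helper that removes the URLs and then tidies the whitespace they leave behind. This is only a sketch: the names URL_PATTERN and remove_urls are hypothetical, and collapsing all whitespace to single spaces is an assumption that may not suit text where line breaks matter.

```python
import re

# Compiled once so the pattern can be reused across many strings.
URL_PATTERN = re.compile(r"https?://\S+")


def remove_urls(text: str) -> str:
    """Remove http(s) URLs, then collapse the whitespace they leave behind."""
    without_urls = URL_PATTERN.sub("", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", without_urls).strip()


cleaned = remove_urls("This is a text with a URL https://www.java2blog.com/ to remove.")
print(cleaned)
# This is a text with a URL to remove.

# The looser pattern r"http\S+" also removes ordinary words starting with "http".
loose = re.sub(r"http\S+", "", "Use httpx for requests")
print(repr(loose))
# 'Use  for requests'
```

The last two lines show why the stricter pattern is often the safer default: it leaves a word such as httpx untouched.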

Conclusion

To conclude, we discussed several methods to remove URLs from text in Python. Most of them use regular expressions to detect each URL and replace it with an empty string. The final method relies on the urlparse() function from the urllib.parse module and does not use regex.
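As a rough sketch of what a regex-free approach could look like (not necessarily the exact method referred to above), the snippet below splits the text on whitespace and uses urlparse() from the standard urllib.parse module to drop tokens that parse as http or https URLs. The helper names looks_like_url and remove_urls_without_regex are hypothetical.

```python
from urllib.parse import urlparse


def looks_like_url(token: str) -> bool:
    """Return True if the token parses as an http(s) URL with a host part."""
    try:
        parts = urlparse(token)
    except ValueError:
        # urlparse can reject some malformed inputs; treat those as non-URLs.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def remove_urls_without_regex(text: str) -> str:
    """Drop whitespace-separated tokens that look like URLs."""
    kept = [token for token in text.split() if not looks_like_url(token)]
    # Joining with single spaces also normalizes the original whitespace.
    return " ".join(kept)


result = remove_urls_without_regex(
    "This is a text with a URL https://www.java2blog.com/ to remove."
)
print(result)
# This is a text with a URL to remove.
```

Because the text is split on whitespace, a URL that is directly attached to other characters, for example inside parentheses, would not be recognized by this sketch.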
