Remove Word from String in Python
This tutorial demonstrates different methods to remove a word from a string in Python. We will cover two tasks. First, we will discuss how to remove a specific word from a sentence. Then, we will discuss how to remove duplicate words from a string.
How to Remove Word from Sentence in Python
We will first discuss methods to remove words from a string in Python.
Using the replace() function
We can use the replace() function to remove a word from a string in Python. This function replaces every occurrence of a given substring with another substring. Replacing a word with an empty string removes it.
For example,
    a1 = "remove word from this"
    # Replace the word with an empty string; the spaces around it are kept
    a2 = a1.replace("word", '')
    print(a2)
Output:
    remove  from this
We can also limit how many occurrences of a word are replaced by passing the optional count argument. By default, all occurrences are replaced.
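As a small sketch of the count argument, the snippet below removes only the first occurrence of a word and compares it with the default behaviour. The sample sentence is made up for illustration, and repr() is used only to make the leftover spaces visible.

```python
# Sample sentence invented for illustration
a1 = "word remove word from this word"

# The third positional argument limits how many occurrences are replaced
first_only = a1.replace("word", "", 1)

# Without it, every occurrence is replaced
all_removed = a1.replace("word", "")

print(repr(first_only))
print(repr(all_removed))
```

Note that replace() leaves the surrounding spaces in place, so the result may need strip() or further clean-up.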
Using the re.sub() function
Regular expressions identify parts of a string using a pattern. The re.sub() function replaces every substring that matches the regular expression pattern with a given string.
We can identify specific words using regular expressions and substitute them with an empty string to remove them.
See the code below.
    import re

    a1 = "remove word from this"
    # Match the word together with any whitespace around it
    p = re.compile(r'(\s*)word(\s*)')
    # Replace each match with a single space
    a2 = p.sub(' ', a1)
    print(a2)
Output:
    remove from this
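In the above example, the re.compile() function compiles a pattern that matches the substring word along with any whitespace around it. The sub() method of the compiled pattern then replaces each match with a single space.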
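One caveat is that a pattern built from the bare substring also matches inside longer words such as "password". A possible refinement, assuming only whole words should be removed, adds \b word-boundary anchors. The helper name remove_word is hypothetical and not part of the original example.

```python
import re

def remove_word(text, word):
    """Remove whole-word occurrences of `word` (hypothetical helper)."""
    # \b anchors stop the match from firing inside longer words;
    # re.escape() guards against regex metacharacters in `word`
    pattern = re.compile(r'\s*\b' + re.escape(word) + r'\b\s*')
    # Replace each match with one space, then trim the ends
    return pattern.sub(' ', text).strip()

print(remove_word("remove word from this password", "word"))
```

Here "password" is left untouched because there is no word boundary between "pass" and "word".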
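Using the startswith() function
This method can remove a word from the start of a sentence. The startswith() function returns True or False, based on whether the string starts with a given value or not.
In this method, if the function returns True, we slice the string starting from the end of the word to be removed. The expression a1.startswith('word') and len('word') evaluates to the length of the word when the string starts with it, and to False otherwise. Since False acts as 0 in a slice, the string is then left unchanged.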
See the code below.
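    a1 = "word remove from this"
    # Slice from index 4 if the string starts with the word, else from 0
    a2 = a1[a1.startswith('word') and len('word'):]
    # The space that followed the word stays at the start of the result
    print(a2)
Output:
     remove from this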
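The same logic can be written with an explicit condition, which may be easier to read than the boolean slice. This is a minimal sketch; the function name remove_leading_word is hypothetical, and the lstrip() call is an added assumption that the leftover leading space is unwanted.

```python
def remove_leading_word(text, word):
    """Return `text` without `word` at its start (hypothetical helper)."""
    if text.startswith(word):
        # Slice past the word, then drop the space that followed it
        return text[len(word):].lstrip()
    # Leave the string untouched when it does not start with the word
    return text

print(remove_leading_word("word remove from this", "word"))
print(remove_leading_word("remove word from this", "word"))
```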
Using the removeprefix() function
This is similar to the previous method. It removes the word from the start of the string only if the string begins with it. This method is only available in Python 3.9 and above.
For example,
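    a1 = "word remove from this"
    # Strip the word only if the string starts with it
    a2 = a1.removeprefix('word')
    # The space that followed the word stays at the start of the result
    print(a2)
Output:
     remove from this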
Using the endswith() function
This method can remove a word from the end of a sentence. The endswith() function returns True or False, based on whether the string ends with a given value or not.
Here also, we will slice the string if the function returns True.
See the code below.
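    a1 = "remove from this word"
    # Slice only when the word is present: a1[:-0] would give an empty string
    a2 = a1[:-len('word')] if a1.endswith('word') else a1
    # The space that preceded the word stays at the end of the result
    print(a2)
Output:
    remove from this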
Using the removesuffix() function
This method is similar to the previous one and can eliminate a word from the end of the string. It is only available in Python 3.9 and above.
For example,
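    a1 = "remove from this word"
    # Strip the word only if the string ends with it
    a2 = a1.removesuffix('word')
    # The space that preceded the word stays at the end of the result
    print(a2)
Output:
    remove from this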
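On interpreters older than Python 3.9, a small fallback can stand in for removesuffix(). The sketch below is one way to write it; the function name remove_suffix is hypothetical.

```python
def remove_suffix(text, suffix):
    """Fallback for str.removesuffix() on Python < 3.9 (hypothetical helper)."""
    # The `suffix and` guard avoids text[:-0], which would return ''
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    # Nothing to remove: return the string unchanged
    return text

print(repr(remove_suffix("remove from this word", "word")))
print(repr(remove_suffix("remove from this", "word")))
```

A matching fallback for removeprefix() would slice with text[len(prefix):] after a startswith() check.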
How to Remove Duplicate Words from String in Python
We will now discuss how to remove duplicate words from a string in Python. A common operation in these methods is splitting a string into a list of words. For this, we use the split() function, as shown in the sample code.
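As a quick illustration of the splitting step, the sketch below compares split() with and without an explicit separator. The sample sentence, with its double space, is made up for illustration.

```python
# Sample sentence (made up) with a double space between two words
a1 = "remove word  from this"

# With no argument, split() treats any run of whitespace as one separator
words = a1.split()
print(words)

# With an explicit separator, the double space yields an empty string
pieces = a1.split(' ')
print(pieces)
```

Calling split() without arguments is usually the safer choice here, since it never produces empty "words".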
Using the set() function
A set is an unordered collection of elements. It contains only unique elements. We can use it to store a collection of unique words from a string.
We can then use a for loop to check whether each word is already in the set object. If the word is not present, it is appended to the final string and added to the set.
We implement this logic in the code below.
    a1 = "remove word from word this word"
    s = set()  # words seen so far
    a2 = ''
    for word in a1.split():
        if word not in s:
            # First occurrence: keep the word and remember it
            a2 = a2 + word + ' '
            s.add(word)
    # a2 ends with a trailing space
    print(a2)
Output:
    remove word from this
In the above example, we can observe that we have successfully removed any duplicate words from the string a1.
Using the set() and join() functions
This method uses a similar approach to the previous method. We will proceed by splitting a string into a list of words. We will then pass this list to the set() function, which automatically discards any duplicate words.
After this, we will convert the words stored in the set object back to a string. For this, we will use the join() function. With the join() function, we can combine the elements of an iterable into a string by providing the separator for the elements.
Let us now use both these functions to remove duplicate words from a string in Python.
See the code below.
    a1 = "remove word from word this word"
    l = a1.split()
    # Deduplicate with set(), then restore the original word order
    a2 = ' '.join(sorted(set(l), key = l.index))
    print(a2)
Output:
    remove word from this
In the above example, we use the sorted() function to restore the original order of the words, since a set is unordered. We sort the unique words by the index of their first occurrence in the list l.
Using the join() function and a user-defined function
This method also follows a similar approach to the previous one. We will start by splitting the string into a list of words. In this method, instead of using a set to remove duplicates, we will create a function that eliminates duplicate words from the list.
For example,
    def lst_unique(l):
        lst = []
        for x in l:
            # Keep only the first occurrence of each element
            if x not in lst:
                lst.append(x)
        return lst

    a1 = "remove word from word this word"
    l = a1.split()
    a2 = ' '.join(lst_unique(l))
    print(a2)
Output:
    remove word from this
In the above example, the lst_unique() function returns a list in which every element is unique, keeping the first occurrence of each word.
Using the collections.OrderedDict class
The collections.OrderedDict class creates a dictionary that remembers the order in which its keys were inserted. Because dictionary keys are unique, storing the words as keys drops the duplicates. We then combine the keys using the join() function.
For example,
    from collections import OrderedDict

    a1 = "remove word from word this word"
    l = a1.split()
    # Keys are unique and keep their insertion order
    a2 = ' '.join(OrderedDict((s, s) for s in l).keys())
    print(a2)
Output:
    remove word from this
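Since Python 3.7, the built-in dict also preserves insertion order, so a plain dictionary can play the same role without an import. The snippet below is a minimal sketch of that variant using dict.fromkeys(); it is an alternative to the OrderedDict example, not a replacement for it.

```python
a1 = "remove word from word this word"

# dict.fromkeys() builds a dict whose keys are the words, in first-seen order;
# duplicate keys are simply ignored
unique_words = dict.fromkeys(a1.split())

# Iterating over a dict yields its keys, so join() can consume it directly
a2 = ' '.join(unique_words)
print(a2)
```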
Using the numpy.unique() function
The numpy.unique() function returns the sorted unique elements of an existing array or list. We can pass it the list of words to create such an array of unique elements. After this, we will combine the elements using the join() function as done in the previous methods.
The downside of this method is that it sorts the elements, so the original order of the words is lost.
See the code below.
    import numpy as np

    a1 = "remove word from word this word"
    l = a1.split()
    # np.unique() returns the unique words in sorted order
    arr = np.unique(l)
    a2 = ' '.join(arr)
    print(a2)
Output:
    from remove this word
Using regular expressions
We can use regular expressions to detect substrings based on a pattern. Here, we use a pattern that matches a word followed by one or more consecutive repetitions of itself.
We will use the re.sub() function to replace each run of repeated words that matches this pattern with a single occurrence of the word.
See the code below.
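    import re

    a1 = "remove word word word from this"
    # \1 refers back to the captured word, so the pattern matches a word
    # followed by one or more repetitions of itself
    a2 = re.sub(r'\b(\w+)( \1\b)+', r'\1', a1)
    print(a2)
Output:
    remove word from this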
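Keep in mind that this pattern only collapses duplicates that sit next to each other. The short sketch below, using a made-up sentence, shows that a repeated word separated by other words is left in place.

```python
import re

# Same pattern as above: a word followed by repetitions of itself
pattern = r'\b(\w+)( \1\b)+'

# Sample sentence (made up): "word" appears once alone, then twice in a row
a1 = "word from word word this"
a2 = re.sub(pattern, r'\1', a1)
print(a2)
```

The adjacent pair is reduced to a single "word", but the first, non-adjacent occurrence survives. To drop every repeat regardless of position, one of the split-based methods is a better fit.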
Conclusion
This article demonstrated how to remove a word from a string in Python. Let's wrap up with the most straightforward methods discussed. The replace() and re.sub() functions can remove a specific word from a string very easily. Other methods remove words from the start or end of the sentence. We also discussed how to remove duplicate words from a string in Python. The main approach to removing duplicate words was to split the string into a list of words, remove the duplicate items, and combine them into a string again.