Remove Non-alphanumeric Characters in Python [3 Ways]

Table of Contents

1. Overview
2. Introduction to Problem
3. Using Regular Expression
4. Using isalnum() Method
5. Using str.translate() Method
6. Performance
7. Conclusion

1. Overview

Non-alphanumeric characters are those that are not letters or numbers, such as punctuation marks, symbols, spaces, or special characters. Sometimes we want to remove these characters from a string in Python, for example to clean user input or extract data.

2. Introduction to Problem

Let’s say we have a string:


text = "456-abc78!@#$%9AB,C"


As we can see, the string text contains digits, letters, and various punctuation marks and symbols.

Our goal is to remove the non-alphanumeric characters from the string and keep only the letters and digits:

456abc789ABC


In this article, we will explore some ways to remove non-alphanumeric characters from a string in Python using regular expressions and built-in methods.

3. Using Regular Expression

Use regular expressions with the re.sub() method to remove non-alphanumeric characters from the specified string in Python.


import re
text = "456abc789ABC-{}[]:*(!@#$%-=\"<>?/~,^&*().`"
extracted_string = re.sub(r'\W+', '', text)
print(extracted_string)


456abc789ABC


Here, we used re.sub(), a built-in method of the re module we imported earlier.

The re.sub() method can take the following parameters, whose brief description is given below:

pattern – It is the regular expression we want to match. It can also be a Pattern object.
repl – This parameter contains the replacement string.
string – It holds the actual input string.
count – The maximum number of matches to replace. The default, 0, replaces every match.
flags – One or more regex flags that modify the pattern’s standard behaviour.

In this example, we passed only the first three parameters to re.sub(), which replaces every occurrence of a non-alphanumeric character.
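The two optional parameters can be illustrated with a small sketch. The sample strings below are made up for illustration and are not part of the original example.

```python
import re

text = "a-b-c-d"

# count=2 stops after the first two replacements.
limited = re.sub(r'-', '', text, count=2)
print(limited)  # abc-d

# flags=re.ASCII restricts \w and \W to ASCII, so the accented
# letter in "caf\u00e9" is treated as a non-word character and removed.
ascii_only = re.sub(r'\W+', '', "caf\u00e9 au lait", flags=re.ASCII)
print(ascii_only)  # cafaulait
```

Passing count and flags by keyword keeps the call readable and avoids mixing up their positions.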

So which regular expression did we use, and which string served as the replacement? We used \W+ as the pattern, which matches one or more consecutive non-word characters such as $, &, etc. Each match was replaced with an empty string, and text was the input string.

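The \w (lowercase w) matches word characters: a letter, a digit, or an underscore ([a-zA-Z0-9_] for ASCII text), while \W (uppercase W) matches everything else. Two consequences follow: underscores are not removed by \W, and because Python 3 string patterns are Unicode-aware by default, letters and digits from other scripts also count as word characters.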
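To make the difference concrete, here is a minimal sketch comparing \W+ with an explicit negated character class. The sample string and variable names are assumptions chosen for illustration.

```python
import re

sample = "snake_case_42 & caf\u00e9!"

# \W+ leaves underscores and the accented letter in place,
# because both count as word characters.
word_chars_kept = re.sub(r'\W+', '', sample)

# An explicit negated class keeps only ASCII letters and digits.
ascii_alnum_only = re.sub(r'[^a-zA-Z0-9]+', '', sample)

print(word_chars_kept)    # snake_case_42café
print(ascii_alnum_only)   # snakecase42caf
```

If underscores must go but non-ASCII letters should stay, a pattern such as r'[\W_]+' is one option.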

4. Using isalnum() Method

To remove non-alphanumeric characters in Python:

Use the filter() function with str.isalnum to keep only the alphanumeric characters of the string.
Use the join() method to join the characters that filter() returned in the previous step.


text = "456abc789ABC-{}[]:*(!@#$%-=\"<>?/~,^&*().`"
extracted_string = ''.join(filter(str.isalnum, text))
print(extracted_string)


456abc789ABC


Here, we used the same input string. Let’s go through the code step by step. First, we called the filter() function with two arguments: the str.isalnum function and the string, which is an iterable of characters.

The filter() function took one character at a time from the string and applied str.isalnum to it. If str.isalnum returned True, filter() kept the character; otherwise, it dropped it.

And what does str.isalnum do? It returns True if the string is non-empty and every character in it is a letter or a digit. Since filter() calls it on one character at a time, it tells us whether that character is alphanumeric.

The filter() method returned an iterator after going through the whole string, which was then passed to the .join() method as an argument. The .join() method joined all the characters without any separator because we used an empty string as a delimiter.

The following solution is an alternative to the above code; it uses a generator expression to go through the characters of the string and keep those that are alphanumeric.


string = "456abc789ABC-{}[]:*(!@#$%-=\"<>?/~,^&*().`"
extracted_string = ''.join(c for c in string if c.isalnum())
print(extracted_string)


456abc789ABC
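One detail worth checking is how str.isalnum() treats non-ASCII input. The sketch below is an illustration only: keep_ascii_alnum is a hypothetical helper name, and str.isascii() requires Python 3.7 or later.

```python
# str.isalnum() follows Unicode categories, so these all count as alphanumeric.
for ch in ["a", "7", "\u00e9", "\u0663"]:   # é and the Arabic-Indic digit three
    print(repr(ch), ch.isalnum())            # True for each

# Spaces and the empty string do not qualify.
print("abc 123".isalnum(), "".isalnum())     # False False

def keep_ascii_alnum(s):
    """Hypothetical helper: keep only ASCII letters and digits."""
    return ''.join(c for c in s if c.isascii() and c.isalnum())

print(keep_ascii_alnum("caf\u00e9-42"))      # caf42
```

Whether accented letters should survive depends on the use case, so the extra isascii() check is optional.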


5. Using str.translate() Method

To remove non-alphanumeric characters in Python:

Use the str.maketrans() method to build a translation table that marks every punctuation character for deletion.
Use the str.translate() method to apply the translation table created in the previous step to the string.


import string
my_string = "456abc789ABC-{}[]:*(!@#$%-=\"<>?/~,^&*().`"
mapping_table = str.maketrans('', '', string.punctuation)
extracted_string = my_string.translate(mapping_table)
print(extracted_string)


456abc789ABC


First, we imported the string module, which provides the string.punctuation constant. Next, we used str.maketrans() to build a mapping table (also called a translation table), which we then passed to my_string.translate().

The str.maketrans() function takes three arguments: a string of characters to map, a string of characters to map to, and a string of characters to delete. In this case, we use empty strings for the first two arguments and string.punctuation, which contains all ASCII punctuation characters, for the third. This creates a translation table that maps every punctuation character to None, which means it will be deleted from the input string. Then, we use the str.translate() method to apply the translation table to the input string. Note that this table removes only ASCII punctuation; spaces and other non-alphanumeric characters outside string.punctuation are left in place.
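As a rough sketch of how the table works, the example below extends the deletion set with string.whitespace so that spaces and tabs are removed as well. The names to_delete and table, and the sample string, are assumptions for illustration.

```python
import string

# Characters to delete: ASCII punctuation plus ASCII whitespace.
to_delete = string.punctuation + string.whitespace
table = str.maketrans('', '', to_delete)

# The table is a dict keyed by code point; deleted characters map to None.
print(table[ord('!')])  # None

print("456 abc\t789-ABC!".translate(table))  # 456abc789ABC
```

Characters that are absent from the table, such as non-ASCII symbols, pass through translate() unchanged.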

6. Performance

Let’s compare the performance of the methods above. We will create a separate function for each method and run it 1000 times on a sample string.


import re
import string
import timeit

def remove_non_alnum_re(s):
    return re.sub(r'\W+', '', s)

def remove_non_alnum_isalnum(s):
    return ''.join(c for c in s if c.isalnum())

def remove_non_alnum_translate(s):
    table = str.maketrans('', '', string.punctuation)
    return s.translate(table).replace(' ', '')

big_string = 'This is a big string with some non-alphanumeric characters!@#$%^&*()_+{}[]|\\/<>?`~'

# Run each function 1000 times and print the total time in seconds
print('re.sub:', timeit.timeit(lambda: remove_non_alnum_re(big_string), number=1000))
print('isalnum:', timeit.timeit(lambda: remove_non_alnum_isalnum(big_string), number=1000))
print('translate:', timeit.timeit(lambda: remove_non_alnum_translate(big_string), number=1000))



re.sub: 0.002696799999999999
isalnum: 0.001636900000000001
translate: 0.0010567000000000007


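In this run, the translate() method is the fastest, followed by the isalnum() method, and the re.sub() method is the slowest. However, the absolute differences are small (under two milliseconds in total over 1000 calls on a short string), so the choice of method may depend on other factors, such as readability, compatibility, or special cases. Note also that the three functions are not strictly equivalent: \W keeps the underscore, while string.punctuation includes it.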
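Timings on a one-line string are dominated by per-call overhead, so a variation of the benchmark may be informative. The sketch below is one possible setup, not the original measurement: it precompiles the pattern, builds the translation table once, uses a longer made-up input, and reports the best of several runs. All names are assumptions, and results will vary by machine and Python version.

```python
import re
import string
import timeit

# Build the reusable objects once, outside the timed functions.
NON_WORD = re.compile(r'\W+')
DELETE_TABLE = str.maketrans('', '', string.punctuation + string.whitespace)

def strip_with_regex(s):
    return NON_WORD.sub('', s)

def strip_with_isalnum(s):
    return ''.join(filter(str.isalnum, s))

def strip_with_translate(s):
    return s.translate(DELETE_TABLE)

# A longer input than the one-line sample, so per-call overhead matters less.
long_text = "Some text, with punctuation! And digits: 12345. " * 200

for func in (strip_with_regex, strip_with_isalnum, strip_with_translate):
    # repeat() returns one total per run; the minimum is the least noisy figure.
    best = min(timeit.repeat(lambda: func(long_text), number=50, repeat=3))
    print(f"{func.__name__}: {best:.4f} s for 50 calls")
```

The sample input contains no underscores or non-ASCII characters, so all three functions return the same result here and the comparison is like for like.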

7. Conclusion

In this article, we explored three different methods to remove non-alphanumeric characters in Python: using regular expressions, using the isalnum() method, and using the str.translate() method. We discussed each method and showed examples of how to apply it.
