In this post, we will see how to remove unicode characters in Python.
For example, suppose you are reading tweets with tweepy in Python. The data tweepy returns contains Unicode characters, and you want to remove them from a string.
Remove unicode characters from String in Python
There are many ways to remove Unicode characters from a string in Python.
Using encode() and decode() method to remove unicode characters in Python
You can use the string's encode() method with the encoding set to ascii and the errors argument set to ignore to drop every non-ASCII character, then call decode() to convert the resulting bytes back to a string.
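```python
text = "This is Python \u200ctutorial"

# Encode to ASCII bytes, silently dropping anything that cannot be encoded
text_en = text.encode("ascii", "ignore")
# Decode the bytes back to a str
text_de = text_en.decode()
print(text_de)
```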
Output:
This is Python tutorial
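The same round trip can be wrapped in a small helper. This is only a sketch: the function name strip_non_ascii and the sample strings are made up for illustration. It also shows a side effect worth knowing about: accented letters such as é are dropped entirely rather than converted to a plain e.

```python
def strip_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character removed (hypothetical helper)."""
    # errors="ignore" discards characters that have no ASCII encoding
    return text.encode("ascii", errors="ignore").decode("ascii")


print(strip_non_ascii("This is Python \u200ctutorial"))
print(strip_non_ascii("caf\u00e9 \u200cmenu"))  # the accented letter disappears too
```

If losing accented letters is not acceptable, this approach may be too aggressive for your data.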
Using replace() method to remove unicode characters in Python
If you just want to remove one specific Unicode character from a string, you can use the string's replace() method.
```python
text = "This is Python \u200ctutorial"

# Replace the zero-width non-joiner (U+200C) with an empty string
text_replaced = text.replace('\u200c', '')
print(text_replaced)
```
Output:
This is Python tutorial
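When there are a handful of known characters to strip, replace() can be called in a loop. The sketch below assumes a hypothetical list of invisible characters (INVISIBLE_CHARS) and a hypothetical helper name (remove_chars); adjust the list to whatever actually appears in your data.

```python
# Hypothetical list of invisible characters to strip
INVISIBLE_CHARS = [
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\ufeff",  # byte order mark / zero-width no-break space
]


def remove_chars(text, chars=INVISIBLE_CHARS):
    """Remove each character in chars from text using str.replace()."""
    for ch in chars:
        text = text.replace(ch, "")
    return text


print(remove_chars("This\u200b is Python \u200ctutorial\ufeff"))
```

Unlike the encode() and decode() approach, this leaves every other non-ASCII character untouched.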
Using character.isalnum() method to remove special characters from String
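If you want to remove special characters such as whitespace or slashes from a string, you can keep only the characters for which character.isalnum() returns True. Note that isalnum() is Unicode-aware, so non-ASCII letters and digits (for example é) are kept.
Here is an example: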
```python
text = "abc /i !? 20321?"

result = ""
for character in text:
    # Keep only letters and digits
    if character.isalnum():
        result = result + character
print(result)
```
Output:
abci20321
As you can see, all the special characters are removed from the string.
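The loop can also be written with join() and a generator expression. The sketch below uses a hypothetical helper, keep_alnum, with an optional ascii_only flag that additionally drops non-ASCII letters and digits. It assumes Python 3.7 or later, where str.isascii() is available.

```python
def keep_alnum(text, ascii_only=False):
    """Keep only alphanumeric characters; optionally restrict them to ASCII."""
    return "".join(
        ch for ch in text
        if ch.isalnum() and (not ascii_only or ch.isascii())
    )


print(keep_alnum("abc /i !? 20321?"))
print(keep_alnum("na\u00efve caf\u00e9!"))                   # accented letters are kept
print(keep_alnum("na\u00efve caf\u00e9!", ascii_only=True))  # accented letters are dropped
```

Building the result with join() avoids creating a new string on every iteration, which can matter for long inputs.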
How to remove Unicode "u" from string in Python
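The u prefix marks a unicode string literal in Python 2, and it shows up when the representation of such a string is printed or saved as text. There are multiple ways to remove it.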
Using replace() method
You can use the string's replace() method to remove the u prefix when it is literally part of the text.
Here is an example:
```python
text = "u'This is Python tutorial'"

# Replace the leading u' with a plain quote
text_without_u = text.replace("u'", "'")
print(text_without_u)
```
Output:
'This is Python tutorial'
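One caveat: replace("u'", "'") also changes any word that happens to end in u right before a quote. As a possible alternative (not part of the original approach), a regular expression can match the u only where it starts a token. The pattern, the helper name strip_u_prefix, and the sample string below are illustrative assumptions.

```python
import re

# Matches a "u" that starts a token and is immediately followed by a quote
U_PREFIX = re.compile(r"\bu(?=['\"])")


def strip_u_prefix(text):
    """Remove Python 2 style u prefixes in front of quoted strings."""
    return U_PREFIX.sub("", text)


sample = "[u'menu', u'you']"
print(sample.replace("u'", "'"))  # also eats the trailing u of each word
print(strip_u_prefix(sample))     # only the prefixes are removed
```

The word boundary check is a heuristic, so it is worth testing against a sample of your real data first.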
Using encode() and decode() method
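You can use the string's encode() method with the encoding set to ascii and the errors argument set to ignore to remove the u prefix. In Python 2 this converts a unicode string into a plain byte string, whose representation has no u prefix. In Python 3 the u prefix on a literal has no effect, so the round trip simply returns the same text as long as it is pure ASCII.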
Here is an example:
```python
text = u'This is Python tutorial'

# Encode to ASCII bytes, ignoring characters that cannot be encoded
text_en = text.encode('ascii', 'ignore')
print(text_en.decode())
```
Output:
This is Python tutorial
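On Python 3 you can check that the u prefix does not change the string at all. The snippet below is a quick sanity check, assuming Python 3.3 or later (where the u prefix is accepted again); the variable names are illustrative.

```python
prefixed = u'This is Python tutorial'
plain = 'This is Python tutorial'

# Both literals create ordinary str objects with the same content
print(type(prefixed).__name__)
print(prefixed == plain)

# The encode/decode round trip leaves pure-ASCII text unchanged
round_trip = prefixed.encode('ascii', 'ignore').decode('ascii')
print(round_trip == plain)
```

So on Python 3 the prefix only needs removing when it is literally part of the text, as in the replace() example.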
That’s all about how to remove unicode characters from String in Python.