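UTF-8 is the default encoding in Python 3. It is a variable-width encoding that represents each Unicode character as one to four 8-bit bytes. The move to Python 3 brought a major change: strings are sequences of Unicode characters by default, rather than the byte strings (typically ASCII) used in Python 2.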
Encode String to UTF-8 in Python
This tutorial will demonstrate how to encode a string to UTF-8 in Python.
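As a quick check of the background above, here is a minimal sketch, assuming a Python 3 interpreter. The variable names and the sample string are only illustrative. It shows that a string literal is a Unicode str and that the interpreter's default encoding is UTF-8.

```python
import sys

# Hypothetical sample string containing one non-ASCII character
text = "café"

# Default encoding used by str.encode() and bytes.decode() in Python 3
default_encoding = sys.getdefaultencoding()

# A plain string literal is a Unicode str in Python 3
is_unicode_str = isinstance(text, str)

# len() counts characters (code points), not bytes
char_count = len(text)

# ord() gives the Unicode code point of a single character
code_point = ord(text[-1])

print(default_encoding)
print(is_unicode_str, char_count, hex(code_point))
```

Note that the length reported here is a character count; the number of bytes only becomes relevant once the string is encoded.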
Using the encode() function
To encode a string to UTF-8 in Python, use the encode() function. It is a method of string objects that encodes the string using the specified encoding.
See the code below.
s1 = "Java2Blog"
print(s1.encode('UTF-8'))
Output:
b'Java2Blog'
The encode() function in Python encodes a string using the required encoding. If no encoding is given, it defaults to UTF-8.
In Python 3, this returns a bytes object. In the above example, the b prefix in the output indicates this.
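To make the paragraph above concrete, here is a small sketch, assuming Python 3. The sample string is hypothetical and was chosen to include non-ASCII characters, so the difference between characters and bytes is visible.

```python
# Hypothetical sample string with non-ASCII characters
text = "café ☕"

# With no argument, encode() uses UTF-8
encoded = text.encode()

# Passing the encoding explicitly gives the same bytes
explicit = text.encode("utf-8")

# decode() reverses the operation and returns the original str
decoded = encoded.decode("utf-8")

print(type(encoded).__name__)    # bytes
print(len(text), len(encoded))   # 6 9
print(encoded)                   # b'caf\xc3\xa9 \xe2\x98\x95'
print(decoded == text)           # True
```

The ASCII characters take one byte each, é takes two bytes, and ☕ takes three, which is why six characters become nine bytes.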
The same is not the case for Python 2. In that version, str is already a byte string, so bytes and strings are essentially the same thing. Calling encode() on an ASCII-only str is therefore redundant: the string is already encoded, and the result is printed without a b prefix.
We can observe this in the code below.
s1 = "Java2Blog"
print s1.encode('UTF-8')
Output:
Java2Blog
Using the codecs.encode() function
To encode a string to UTF-8 in Python, use the codecs.encode() function.
See the code below.
import codecs
s1 = "Java2Blog"
print(codecs.encode(s1, 'UTF-8'))
Output:
b'Java2Blog'
Python has a standard module called codecs, which defines the base classes for all the encoders and decoders in Python. The module also provides access to the internal Python codec registry, which manages codec lookup and error handling.
The codecs.encode() function can be used to encode an object to the specified format. In the above example, we use it to encode a string to UTF-8.
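As a rough illustration of how the two approaches relate, the sketch below (assuming Python 3; the variable names are only illustrative) compares codecs.encode() with the string's own encode() and uses codecs.lookup() to query the codec registry mentioned above.

```python
import codecs

s1 = "Java2Blog"  # same sample string as in the example above

# Encode through the codecs module and through the str method
via_codecs = codecs.encode(s1, "UTF-8")
via_method = s1.encode("UTF-8")

# For a str input, both produce the same bytes object
same_result = via_codecs == via_method

# codecs.lookup() searches the codec registry and returns a CodecInfo
# object; its name attribute holds the normalized codec name
codec_name = codecs.lookup("UTF-8").name

print(via_codecs)    # b'Java2Blog'
print(same_result)   # True
print(codec_name)    # utf-8
```

For plain strings the string's encode() is usually the shorter spelling. codecs.encode() is a more general entry point that looks the codec up by name.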
Conclusion
To conclude, we discussed different methods to encode a string to UTF-8 in Python.
We covered the basics of encoding in Python, highlighting the difference between Python 2 and Python 3.
The encode() function encodes a string to UTF-8 in Python. We showed its use in both Python 2 and Python 3 and compared the results.
The final method used the codecs module. This module contains the base classes for all encoders and decoders, and its codecs.encode() function can also encode a string to UTF-8.