Get HTML from URL in Python

HTML pages with Python

Web pages are written in HTML, a markup language that defines the structure and content of a page. It is at the core of virtually every website on the internet.

We can access and retrieve content from web pages using Python. Python allows us to access different types of data from URLs like JSON, HTML, XML, and more. We can use different libraries for working with HTML in Python.

Get HTML from URL in Python

We will now discuss how to get HTML from URL in Python.

Using the urllib library to get HTML from URL in Python

The urllib package in Python's standard library handles opening, reading, and parsing URLs. We can use its functionality to get HTML from URL in Python.

First, we need to access the URL. For this, we use the urllib.request module. The urllib.request.urlopen() function opens a connection to the URL passed to it and returns a response object (an http.client.HTTPResponse for HTTP and HTTPS URLs).

Then, to get HTML from URL in Python, we use the read() function with this object. In Python 3, this returns a bytes object. So, we need to convert this object to a string by decoding it.

We will use the decode() function to convert the bytes to a string and display it. We should also close the response object with the close() function once we are done, or open it in a with statement so that it is closed automatically.

We will now use this in the code below.
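A minimal sketch of these steps, assuming Python 3. The function name get_html and the placeholder URL are illustrative choices, not part of urllib.

```python
import urllib.request


def get_html(url, encoding="utf-8"):
    """Fetch the resource at `url` and return its body as a string."""
    # urlopen() returns a response object; the with statement closes it for us.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes in Python 3
    # Assumption: the page is encoded as UTF-8 unless the caller says otherwise.
    return raw.decode(encoding)


# Example call (needs network access; the URL is only a placeholder):
# html = get_html("https://example.com")
# print(html[:50])  # show the first 50 characters
```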

Output:

<title>A very simple webpage</title>

In the above example,

  • We open the URL using the urllib.request.urlopen() function.
  • We read the data and confirm that it is read as bytes.
  • We decode this bytes object to string using the decode() function.
  • We specify the utf-8 encoding in the decode() function to get the string.
  • The HTML is stored as a string and the first 50 characters are displayed.
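The steps above hard-code utf-8, which is a reasonable guess for most modern pages but not for all of them. As a hedged variation, the sketch below reads the character set from the response's Content-Type header and falls back to utf-8 when none is declared. The helper name get_html_auto is hypothetical.

```python
import urllib.request


def get_html_auto(url, default_encoding="utf-8"):
    """Fetch `url` and decode it with the charset the server declares, if any."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # get_content_charset() returns None when the header names no charset.
        charset = response.headers.get_content_charset() or default_encoding
    return raw.decode(charset)
```

This only covers a charset declared in the HTTP headers. A page that declares its encoding solely in a meta tag would still be decoded with the fallback.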

There is a slight difference in using this library with Python 2. The urllib module was introduced in Python 1.2. Later, urllib2 was created as an intended replacement, and the two modules existed side by side in Python 2. Python 3 reorganized both into a single urllib package, splitting their functionality across submodules such as urllib.request, urllib.error, and urllib.parse. As a result, Python 2 code uses the old urllib (or urllib2) module directly, while Python 3 code uses urllib.request.

While using urllib in Python 2, we do not import urllib.request (it does not exist there), since the urlopen() function is available directly in the urllib module. Also, the read() function returns the HTML directly as a string, which removes the need for any decoding.

Notice the changes in the code below.
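A sketch of the Python 2 form of the example. So that it also runs unchanged on Python 3, the import falls back to urllib.request. The function name get_html_py2 and the placeholder URL are illustrative.

```python
try:
    from urllib import urlopen  # Python 2: urlopen() lives in urllib itself
except ImportError:
    from urllib.request import urlopen  # Python 3 fallback


def get_html_py2(url):
    """Return the page body; on Python 2, read() already gives a str."""
    response = urlopen(url)
    try:
        html = response.read()
    finally:
        response.close()  # Python 2's urllib responses do not support `with`
    # Only Python 3 takes this branch, where read() returns bytes.
    if not isinstance(html, str):
        html = html.decode("utf-8")
    return html


# Example call (needs network access; the URL is only a placeholder):
# print(get_html_py2("https://example.com")[:50])
```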

Output:

<title>A very simple webpage</title>

Using the requests library to get HTML from URL in Python

The requests library in Python provides a simple API for sending HTTP requests. It is built on the urllib3 library. Both are third-party packages and not part of the standard library, so they need to be installed first (for example, with pip).

We can use this library to get HTML from URL in Python. The requests.get() function is used to send a GET request to the URL specified within the function. It returns a Response object.

We can get the content from the response using its text attribute. Note that text is an attribute and not a function, so it is used without parentheses. This will return the content of HTML as a string.

For example,
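A minimal sketch, assuming the requests package is installed. The function name get_html_requests is illustrative, and the timeout value and the raise_for_status() call are additions that the steps above do not require.

```python
import requests


def get_html_requests(url, timeout=10):
    """Send a GET request and return the response body as a string."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    # .text is an attribute: the body decoded with the encoding requests detected.
    return response.text


# Example call (needs network access; the URL is only a placeholder):
# print(get_html_requests("https://example.com")[:50])
```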

Output:

<title>A very simple webpage</title>

Using the urllib3 library to get HTML from URL in Python

As discussed earlier, the urllib3 library is a third-party library that is also used internally by the requests library. We can use this library also to get HTML from URL in Python.

First, we need to create a PoolManager object using the urllib3.PoolManager() constructor. This object manages connection pooling for the requests and is thread-safe.

We use the request() function with this object to send the GET request for the given URL. We read its contents using the data attribute of the response (an attribute, not a function).

It also returns the HTML as bytes, so we need to decode the content using the decode() function.

See the following example,
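A minimal sketch, assuming the urllib3 package is installed and the page is UTF-8 encoded. The function name get_html_urllib3 and the placeholder URL are illustrative.

```python
import urllib3


def get_html_urllib3(url, encoding="utf-8"):
    """Send a GET request through a PoolManager and return the decoded body."""
    http = urllib3.PoolManager()  # manages pooled connections
    response = http.request("GET", url)
    # .data is an attribute holding the raw body as bytes.
    return response.data.decode(encoding)


# Example call (needs network access; the URL is only a placeholder):
# print(get_html_urllib3("https://example.com")[:50])
```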

Output:

<title>A very simple webpage</title>

In the above example,

  • The http object belongs to the PoolManager class.
  • We use the request() function with this object.
  • We need to specify that we are sending a GET request within the function with the URL.
  • The HTML is then retrieved and decoded using the decode() function.

Conclusion

In this article, we discussed three libraries to get HTML from URL in Python. The first was urllib, which is part of the standard library and one of the most commonly used ways to get HTML from URL in Python. It is used differently in Python 2 because of the way the module was reorganized for Python 3. We also used the requests and urllib3 libraries. The requests library is built on urllib3, and both are third-party packages that are not part of the standard Python library.

That’s all about how to get HTML from URL in Python.

