Get HTML from URL in Python

HTML pages with Python

Webpages are built with HTML (HyperText Markup Language). It is a markup language, not a programming language, and it defines the structure and content of a webpage. It is at the core of every website on the internet.

We can access and retrieve content from web pages using Python. Python allows us to access different types of data from URLs like JSON, HTML, XML, and more. We can use different libraries for working with HTML in Python.

Get HTML from URL in Python

We will now discuss how to get HTML from URL in Python.

Using the urllib library to get HTML from URL in Python

The urllib package in Python's standard library handles opening, reading, and parsing URLs. We can use its urllib.request module to get HTML from URL in Python.

First, we need to access the URL. For this, we use the urllib.request module. The urllib.request.urlopen() function opens a connection to the URL passed to it and returns a response object (an http.client.HTTPResponse for HTTP and HTTPS URLs).

Then, to get HTML from URL in Python, we use the read() function with this object. In Python 3, this returns a bytes object. So, we need to convert this object to a string by decoding it.
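The decoding step can be tried on its own, without any network access. The following is a minimal sketch in which a made-up HTML snippet is encoded to bytes to stand in for what read() returns; the snippet and the variable names are assumptions for illustration only.

```python
# A made-up HTML snippet, encoded to bytes to mimic what read() returns.
# "\u00e9" is the letter e with an acute accent, a non-ASCII character.
raw = "<p>Caf\u00e9</p>".encode("utf-8")
print(type(raw))  # the data is a bytes object at this point

# decode() converts the bytes back to a str using the named encoding.
html = raw.decode("utf-8")
print(type(html))
print(html)
```

The encoding passed to decode() has to match the one the page was served in. UTF-8 is the most common choice, but it is still an assumption unless the server states it.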

We will use the decode() method to retrieve the HTML as a string and display it. The response object should also be closed with its close() method once the data has been read, or opened inside a with statement so that it is closed automatically.

We will now use this in the code below.
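The original listing is not reproduced here, so the following is a minimal sketch of the steps described above. To keep it runnable without internet access, it serves a made-up one-line page from a local test server. The PAGE content, the DemoHandler class, and the local address are all assumptions; in practice you would pass the URL of the real page to urlopen().

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical one-line page, served locally so the example runs offline.
PAGE = b"<title>A very simple webpage</title>"


class DemoHandler(BaseHTTPRequestHandler):
    """Answers every GET request with PAGE."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:{}/".format(server.server_port)

# Open the URL, read the body as bytes, then close the response.
response = urllib.request.urlopen(url, timeout=10)
raw = response.read()
response.close()
print(type(raw))

# Decode the bytes to a string and display the first 50 characters.
html = raw.decode("utf-8")
print(html[:50])

# Stop the local test server.
server.shutdown()
server.server_close()
```

The first print reports the type of the data returned by read(), and the second shows the decoded HTML.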

Output:

<title>A very simple webpage</title>

In the above example,

  • We open the URL using the urllib.request.urlopen() function.
  • We read the data and confirm that it is read as bytes.
  • We decode this bytes object to string using the decode() function.
  • We specify the utf-8 encoding in the decode() function to get the string.
  • The HTML is stored as a string and the first 50 characters are displayed.

There is a slight difference in using this library with Python 2. The original urllib module was introduced in Python 1.2. Later, urllib2 was added with the intention of replacing it, so Python 2 ships both urllib and urllib2. In Python 3, the two were merged and reorganized into a single urllib package: the functionality of urllib2 was split into urllib.request and urllib.error, and the URL parsing helpers moved to urllib.parse. As a result, the import paths differ between Python 2 and Python 3.

While using urllib in Python 2, we do not import urllib.request, since that module does not exist there and the urlopen() function is available directly from urllib. Also, the read() method returns a str, which is a byte string in Python 2, so the HTML can be used directly without any decoding.

Notice the changes in the code below.
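The original Python 2 listing is not reproduced here. The following is a minimal sketch, written so that it runs on Python 3 and falls back to the Python 2 import locations described above. As an assumption to keep it runnable offline, it serves a made-up page from a local test server; PAGE, DemoHandler, and the local address are hypothetical.

```python
import threading

try:
    # Python 3: urlopen() lives in the urllib.request module.
    from urllib.request import urlopen
    from http.server import BaseHTTPRequestHandler, HTTPServer
except ImportError:
    # Python 2: urlopen() is available directly from urllib.
    from urllib import urlopen
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

# Hypothetical one-line page, served locally so the example runs offline.
PAGE = b"<title>A very simple webpage</title>"


class DemoHandler(BaseHTTPRequestHandler):
    """Answers every GET request with PAGE."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True  # set as an attribute so it also works on Python 2
worker.start()
url = "http://127.0.0.1:{}/".format(server.server_port)

response = urlopen(url)
raw = response.read()
response.close()

# Python 2: read() already returns a str, so it is used as-is.
# Python 3: read() returns bytes, which are decoded to a str.
html = raw if isinstance(raw, str) else raw.decode("utf-8")
print(html[:50])

server.shutdown()
server.server_close()
```

On Python 2 only the except branch and the unchanged raw value are used, which is the simplification the text describes.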

Output:

<title>A very simple webpage</title>

Using the requests library to get HTML from URL in Python

The requests library in Python provides a simple API for sending HTTP requests. It is built on the urllib3 library. Both requests and urllib3 are third-party packages, not part of the standard library, so they must be installed separately (for example, with pip install requests).

We can use this library to get HTML from URL in Python. The requests.get() function is used to send a GET request to the URL specified within the function. It returns some response.

We can get the content from the response using its text attribute (it is an attribute, not a function, so it is written without parentheses). It returns the HTML content as a string that has already been decoded.

For example,
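The original listing is not reproduced here, so the following is a minimal sketch of requests.get() and the text attribute. It assumes the requests package is installed. To stay runnable offline, it fetches a made-up page from a local test server; PAGE, DemoHandler, and the local address are hypothetical stand-ins for a real URL.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # third-party package: pip install requests

# Hypothetical one-line page, served locally so the example runs offline.
PAGE = b"<title>A very simple webpage</title>"


class DemoHandler(BaseHTTPRequestHandler):
    """Answers every GET request with PAGE."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:{}/".format(server.server_port)

# Send a GET request; raise an exception for 4xx and 5xx status codes.
response = requests.get(url, timeout=10)
response.raise_for_status()

# text is an attribute holding the body already decoded to a str.
html = response.text
print(html[:50])

server.shutdown()
server.server_close()
```

Passing a timeout is a design choice in this sketch rather than a requirement; without one, a request to an unresponsive server can wait indefinitely.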

Output:

<title>A very simple webpage</title>

Using the urllib3 library to get HTML from URL in Python

As discussed earlier, the urllib3 library is a third-party library that is also used internally by the requests library. We can use this library also to get HTML from URL in Python.

First, we need to create a PoolManager object using the urllib3.PoolManager() constructor. This object manages connection pooling for the requests and ensures thread safety.

We use the request() method of this object to send the GET request for the given URL. We read the contents of the response using its data attribute (an attribute, not a function).

It also returns the HTML as bytes so we need to decode the content using the decode() function.

See the following example,
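The original listing is not reproduced here, so the following is a minimal sketch of the PoolManager steps described above. It assumes the urllib3 package is installed. To stay runnable offline, it fetches a made-up page from a local test server; PAGE, DemoHandler, and the local address are hypothetical stand-ins for a real URL.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3  # third-party package: pip install urllib3

# Hypothetical one-line page, served locally so the example runs offline.
PAGE = b"<title>A very simple webpage</title>"


class DemoHandler(BaseHTTPRequestHandler):
    """Answers every GET request with PAGE."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:{}/".format(server.server_port)

# The PoolManager object handles connection pooling.
http = urllib3.PoolManager()

# Send a GET request; the HTTP method is given as the first argument.
response = http.request("GET", url, timeout=10.0)

# data is an attribute holding the raw bytes, so it is decoded to a str.
html = response.data.decode("utf-8")
print(html[:50])

server.shutdown()
server.server_close()
```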

Output:

<title>A very simple webpage</title>

In the above example,

  • The http object belongs to the PoolManager class.
  • We use the request() function with this object.
  • We need to specify that we are sending a GET request within the function with the URL.
  • The HTML is then retrieved and decoded using the decode() function.

Conclusion

In this article, we discussed three libraries to get HTML from URL in Python. The first was urllib, which ships with the standard library and is one of the most commonly used ways to get HTML from URL in Python. It is used differently in Python 2 because the module was reorganized in Python 3. We also used the requests and urllib3 libraries. The requests library is built on urllib3, and both are third-party packages that are not part of the standard Python library.

That’s all about how to get HTML from URL in Python.
