Improving your workflow with ImportHTML

We spend a lot of time working in Google Sheets and we are pleasantly surprised when we learn something new that improves our workflow. It’s pretty unbelievable that we hadn’t come across this before. One thing we recently discovered is the ImportHTML function. We stumbled upon this when we needed to compile some data to build charts. While our preference is to work with spreadsheets or .csv files, the data we need is not always available in these formats. Sometimes the only way to get the data we need is the tried and true copy-and-paste method. That is copying and pasting data from tables or lists on web pages. In some instances, this method may be the most straightforward and easiest way to get the information. While other times the tables may be too large, or there are too many tables to copy to get the info. It’s an exercise that can get very tedious and repetitive. This is where the ImportHTML function comes in.

What is ImportHTML?

If you have ever spent time copying and pasting information from a table into a spreadsheet then you know how boring and frustrating it can be. Especially, if your mouse does funky things when you’re trying to highlight the information you want to import. No one enjoys having to constantly try to highlight the same thing over and over again because your mouse slips or you accidentally deselect the text. Enter ImportHTML. This is a function that allows you to import data from a table or list within an HTML page. Here’s some documentation from Google.

How it works

If you are more of a visual learner, stop here. Instead, go check out this great video on the Teacher’s Tech YouTube channel for a great overview of how it works and some useful tips. We thoroughly enjoyed watching it.

Back to the function. The ImportHTML function has four inputs, three of which are mandatory - URL, query, and index. The fourth, locale, is not required. The function looks something like this =IMPORTHTML(“url”, “query”, index). For this to work, the URL must appear between quotation marks and same for the query. The type of queries accepted by this function are table and list. As for the index, both tables and lists start at 1. So if the web page has tables and lists, it’s possible to have an index of 1 for each of them. If you don’t need the entire table, you can import data from specific columns. The Teacher's Tech channel does a great job explaining how you can do this and more.

Here is an example you can try: =IMPORTHTML("https://en.wikipedia.org/wiki/List_of_tallest_buildings", "table", 2). It’s a table of the world’s tallest building found on this Wikipedia page.

If you’re big into functions, then you may find this Google Sheets Functions list to be a useful resource. Have fun!

 
 

You may also like

 
FWD EDITORS

We’re a team of data enthusiasts and storytellers. Our goal is to share stories we find interesting in hopes of inspiring others to incorporate data and data visualizations in the stories they create.

Previous
Previous

Creating charts with Datawrapper

Next
Next

Data science reading list