The Web Scraping Cheat Sheet
“Load More” Button Handling Tutorial
Quick Start Guide
Here’s another tip on dealing with “Load More” buttons. We hope this tutorial gives you a basic understanding of how to handle pagination and “Load More” buttons.
Are you ready to explore the easiest way to get the data behind “Load More” buttons? Today, we’ll look at one of the leading luxury e-commerce websites, Net-a-Porter. Summer is coming, so we browsed for sandals!
Net-A-Porter Search Results for Sandals. Source: Net-A-Porter
You might have noticed that the website displays only 60 items at a time. Also, the URL changes each time you click the “Load More” button, meaning that the product information is split across multiple, discrete web pages.
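Listly handles this without any code, but it may help to see what “one URL per page” means in practice. The Python sketch below builds one URL per results page by setting a page-number query parameter. The parameter name `pageNumber` and the `example.com` address are placeholders we made up for illustration; check your browser’s address bar after clicking “Load More” to see which part of the URL actually changes on the site you’re scraping.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_urls(base_url: str, pages: int, param: str = "pageNumber") -> list[str]:
    """Build one URL per results page by setting a page query parameter.

    `param` is a hypothetical default; the real parameter name depends on
    the site, so inspect its URLs before relying on this.
    """
    parts = urlparse(base_url)
    # Keep any existing query parameters (e.g. sorting) and add the page number.
    query = dict(parse_qsl(parts.query))
    urls = []
    for n in range(1, pages + 1):
        query[param] = str(n)
        urls.append(urlunparse(parts._replace(query=urlencode(query))))
    return urls
```

For example, `page_urls("https://www.example.com/shop/sandals?sort=new", 2)` gives two URLs ending in `&pageNumber=1` and `&pageNumber=2`, each pointing at a separate page of results.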
Search Results for Sandals, Page 2. Source: Net-A-Porter
This is called pagination, and we can easily handle this kind of “Load More” button with Group Extraction.
Want to learn more about Group Extraction? Check it out here!
First things first, click LISTLY WHOLE or LISTLY PART, whichever matches your needs.
You’ll see that 60 products were collected. Make sure to select SHOW HYPERLINK so that the source URLs needed for Group Extraction are extracted too, then click EXCEL. Now you’re ready to collect all the information behind the “Load More” button.
Hit the + Group button and you’ll find the ADD URL field. Copy the source URLs (Column B) you collected in the previous step and paste them into this field.
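If your export has many rows, it can help to clean up Column B before pasting it into ADD URL. The sketch below assumes you saved the exported sheet as CSV, with a header in row 1 and the links in Column B; it keeps only http(s) URLs and drops duplicates while preserving their order. `urls_from_column_b` is a hypothetical helper for illustration, not part of Listly.

```python
import csv
import io


def urls_from_column_b(csv_text: str) -> list[str]:
    """Return the unique http(s) URLs found in column B, in original order.

    Assumes the sheet was saved as CSV and that row 1 is a header row.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    seen = set()
    urls = []
    for row in reader:
        if len(row) < 2:
            continue  # this row has no column B
        url = row[1].strip()
        # Ignore blanks and non-URL text, and keep only the first copy of each URL.
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

To use it, read your saved CSV file as text, pass it to the function, and paste the resulting URLs, one per line, into the ADD URL field.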
Almost there! Go back to your Databoard and click Refresh.
Click SUCCESS to check each URL and its scraping status.
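Conceptually, a group extraction visits each URL in turn, collects the rows it finds, and records whether that page succeeded. Here is a rough Python sketch of that loop; it is an illustration of the idea, not Listly’s actual implementation. `run_group` and the `fetch` callback are hypothetical: `fetch` stands in for a single-page scraper you would supply, and any exception it raises marks that URL as FAIL instead of stopping the whole run.

```python
from typing import Callable


def run_group(
    urls: list[str], fetch: Callable[[str], list[dict]]
) -> tuple[list[dict], dict[str, str]]:
    """Scrape every URL, collecting rows and a SUCCESS/FAIL status per URL.

    `fetch` is a hypothetical single-page scraper: it returns the rows found
    on one page, or raises an exception if the page could not be scraped.
    """
    rows: list[dict] = []
    status: dict[str, str] = {}
    for url in urls:
        try:
            page_rows = fetch(url)
        except Exception:
            # One failed page should not abort the remaining URLs.
            status[url] = "FAIL"
            continue
        for row in page_rows:
            # Tag each row with the page it came from.
            rows.append({"source_url": url, **row})
        status[url] = "SUCCESS"
    return rows, status
```

Tagging each row with `source_url` makes it easy to trace a product back to the page it came from when you review the results.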
Needless to say, you can also download all of the results into Excel!
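Different pages don’t always yield the same columns, so merging them into one sheet means taking the union of all column names. The sketch below is a hypothetical helper, assuming each scraped row is a dictionary mapping column name to value; it writes the combined rows as CSV text, which Excel can open, and leaves missing cells blank.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Merge rows from every URL into one CSV table.

    Columns are the union of all row keys in first-seen order; cells a row
    does not have are written as empty strings.
    """
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping columns in first-seen order means the fields from the first page stay on the left, which usually matches what you saw in the browser.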