Learn how you can easily scrape data with Apify and Keboola
Whether you saw or missed our webinar, we thought it would be useful to provide a step-by-step guide on how to set up quick competition monitoring (or any other web scraping and data processing automation) with Apify and Keboola. Thank you Apify and Revolt.bi for the collaboration!
So what can you do with automated competition data processing? In this article, we’ll use daily monitoring of Amazon’s best-sellers list as an example, but you can apply the same process to similar use cases.
Follow these instructions, create a free account and start automated data processing in minutes.
Go to apify.com and create a free account using the Sign up button. On the free plan, you get a $5 monthly credit and a proxy trial for the first month. Once you verify your account using the email verification link, go to the Apify Store and look for the Amazon Best Sellers Scraper.
Let’s start with configuring an actor. An actor is a serverless cloud program running on the Apify platform that can perform arbitrary computing jobs, such as sending an email or crawling a website with millions of pages.
Click on the Try me button.
After clicking the Try me button you’ll be redirected to your fresh Apify account and a new task for this public actor is automatically created.
All you need to set up now is a category URL on Amazon and the crawl depth. The depth setting is handy when you want to extract not just the 100 books in the top category, but also the books in all its sub-categories.
Let’s look at the Amazon Best Sellers page. Amazon shows the 100 best-selling products in each category, and we can extract any category we want. Let’s choose the Books category.
Now copy the page URL. In this case it’s:
https://www.amazon.com/best-sellers-books-Amazon/zgbs/books/ref=zg_bs_nav_0
Go back to the task configuration, click on Add URL in the Category URLs input and paste the URL above.
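Before pasting a URL into the task, it can help to sanity-check that it really points at a Best Sellers category. The sketch below is only a heuristic: the helper name `bestseller_category` is hypothetical, and the assumption that a category slug follows a `zgbs` path segment is inferred from the single example URL above, not from any Amazon or Apify documentation.

```python
from urllib.parse import urlparse


def bestseller_category(url: str) -> str:
    """Return the category slug of an Amazon Best Sellers URL.

    Heuristic (an assumption based on the example URL in this guide):
    the category is the path segment right after "zgbs".
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host != "amazon.com" and not host.endswith(".amazon.com"):
        raise ValueError(f"not an amazon.com URL: {url}")

    # Split the path into non-empty segments, e.g.
    # ["best-sellers-books-Amazon", "zgbs", "books", "ref=zg_bs_nav_0"]
    segments = [s for s in parsed.path.split("/") if s]
    if "zgbs" not in segments:
        raise ValueError(f"no 'zgbs' segment in path: {url}")

    idx = segments.index("zgbs")
    # The segment after "zgbs" must exist and must not be a tracking "ref=" part.
    if idx + 1 >= len(segments) or segments[idx + 1].startswith("ref="):
        raise ValueError(f"no category after 'zgbs': {url}")
    return segments[idx + 1]


URL = "https://www.amazon.com/best-sellers-books-Amazon/zgbs/books/ref=zg_bs_nav_0"
print(bestseller_category(URL))  # prints: books
```

A check like this only catches obvious copy-paste mistakes; the scraper itself decides which URLs it accepts.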
Hitting the Save & Run button will start your scraper on Apify servers (everything is running on the Apify cloud platform).
The task should take just a couple of seconds. Once it has finished, click on the results box.
In the Export section, select the HTML table format and click on the View in another tab button.
In the new tab, you’ll see the extracted data in a structured table format.
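The same export is also reachable outside the console: the Apify API exposes dataset items at `/v2/datasets/{datasetId}/items` and accepts a `format` query parameter. As a minimal sketch, the hypothetical helper `dataset_export_url` below builds such a link. The dataset ID used here is a placeholder, and depending on your account settings the request may also need your API token.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# Export formats accepted by the dataset items endpoint.
EXPORT_FORMATS = {"json", "jsonl", "csv", "html", "xlsx", "xml", "rss"}


def dataset_export_url(dataset_id: str, fmt: str = "html",
                       limit: Optional[int] = None) -> str:
    """Build the URL that returns a dataset's items in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    params = {"format": fmt}
    if limit is not None:
        # Optionally cap the number of returned items.
        params["limit"] = str(limit)
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


# "YOUR_DATASET_ID" is a placeholder; copy the real ID from the run's results box.
print(dataset_export_url("YOUR_DATASET_ID"))
```

Switching `fmt` to `"csv"` or `"json"` gives a link that is easier to feed into other tools than the HTML table.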
The next step is the integration with Keboola, so you can schedule scraper runs from there and automatically fetch data from Apify into Keboola. For easier reference, change the name of your task from my-task to something like Amazon-best-sellers-books. You can do that in the Settings tab of your task.
Go to keboola.com and create a free pay-as-you-go project.
Go to Components -> Extractors and search for the Apify extractor.
Click on New configuration.
Name your configuration, agree to the terms and click on Create configuration.
Click on Configure extractor. Select Run task and click Next. We’re not using our own actor, but a task for a public actor.
On the next screen you’re asked for an Apify user ID and an API token. To get these, go back to the Apify console, navigate to Settings -> Integrations and copy your User ID and API token.
Go back to the Keboola project, paste the Apify user ID and API token into the extractor wizard and click on Next. On the last page, the only thing you have to set is the task. Select the Amazon-best-sellers-books task you created in the previous steps and click on Save.
Now you can start the scraper (Apify task) from the Keboola project by using the Run component action and confirming with the Run button. If you go back to the Apify console (Tasks -> Amazon-best-sellers-books -> Runs), you can see a new run of your task with API as its Origin (started from Keboola via API).
Once the job is done (you’ll see the finished job in the last runs section), you can click on the link in the Results table.
There you can see basic info about the extracted dataset and click on Explore in storage to see more info and sample data. Once you're in storage, you can click on the Data sample to see the data - exactly the same data you saw in Apify.
Now the integration between Keboola and Apify is complete. Once you start the extractor in Keboola, the task in Apify is started automatically via API. The extractor waits until the Apify task run finishes, then fetches the data from Apify and stores it in Keboola.
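To make that start, wait and fetch sequence concrete, here is a minimal sketch of the same three steps against the public Apify API (v2). It is not the Keboola extractor's actual code. The function names are hypothetical, and the `username~task-name` form of the task ID, the Bearer-token header and the polling interval are assumptions you should check against the Apify API reference. The HTTP call is passed in as a parameter so the flow can be tried without touching the network.

```python
import json
import time
import urllib.request
from typing import Any, Callable

API_BASE = "https://api.apify.com/v2"
# Run statuses after which a run no longer changes.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def http_json(method: str, url: str, token: str) -> Any:
    """Send a request authorized with the API token and decode the JSON reply."""
    req = urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_task_and_fetch(task_id: str, token: str,
                       request: Callable[[str, str, str], Any] = http_json,
                       poll_seconds: float = 5.0) -> list:
    """Start an Apify task, wait for the run to finish, return its dataset items."""
    # 1. Start the task; the API wraps the run object in a "data" envelope.
    run = request("POST", f"{API_BASE}/actor-tasks/{task_id}/runs", token)["data"]

    # 2. Poll the run until it reaches a terminal status.
    while run["status"] not in TERMINAL_STATUSES:
        time.sleep(poll_seconds)
        run = request("GET", f"{API_BASE}/actor-runs/{run['id']}", token)["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"run {run['id']} ended with status {run['status']}")

    # 3. Fetch the items of the run's default dataset (a plain JSON array).
    dataset_id = run["defaultDatasetId"]
    return request("GET", f"{API_BASE}/datasets/{dataset_id}/items?format=json", token)


# Example call (placeholders, needs a real account and token):
# items = run_task_and_fetch("your-username~Amazon-best-sellers-books", "YOUR_API_TOKEN")
```

In practice the extractor does all of this for you; the sketch is only meant to show what "started via API" means when you see that Origin in the Apify console.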
Congrats, you just learned how to use the Keboola and Apify platforms to automate the competitor scraping process. Try it yourself by creating a forever-free account. No credit card is required.