Anonymous

Web Scraping Project

Hi all,

 

I've built a simple scraper which pulls pricing data so that I don't have to copy/paste it manually. Currently each query points to one product URL, which I set by changing the "Source" step of the query. Is there a way to create a table of URLs and have a single query visit each of them and extract the source code data, instead of keeping a separate query for each product page and appending them?

 

It would save me considerable copy/paste time if I can achieve this.


Thanks

3 REPLIES
Anonymous

Hi @Anonymous 

 

I assume you have something like this.

 

let
    Source = Web.BrowserContents("https://abc.com/?page=12"),
    // Your additional transformation steps go below
    .
    ...
    ....
    #"LastStep" = ....
in
    #"LastStep"

 

 

Then you have other URLs to which you need to apply the same steps. In that case, follow the steps below.

 

  1. Create a new table with a single column (URL) containing all the URLs you want. Let's call this table Products.
  2. Now change the query above as shown below; this turns it into a custom function.

 

(url as text) =>
let
    Source = Web.BrowserContents(url), // Replace the hardcoded URL with the url parameter
    // Your additional transformation steps go below
    .
    ...
    ....
    #"LastStep" = ....
in
    #"LastStep"

 

  • Go to the Products table created in step 1. Click on the table icon displayed in the top left corner of the table.
  • Then choose Invoke Custom Function.
  • Under Function query, choose the function created in step 2.
  • For url, select the column name URL and hit OK.
  • Then click on the expand icon in the header of the new column.
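
If you prefer to do the same thing in the Advanced Editor, the clicks above correspond roughly to the M below. This is only a sketch under assumptions: GetPrices is a made-up name for the function query from step 2, the inline Products table and its second URL are placeholders standing in for your real table, and "Product" and "Price" are hypothetical column names that you would replace with whatever your function actually returns.

```powerquery
let
    // Placeholder stand-in for the Products table from step 1 (URLs are examples only)
    Products = #table(type table [URL = text], {
        {"https://abc.com/?page=12"},
        {"https://abc.com/?page=13"}
    }),
    // Call the custom function once per row; GetPrices is an assumed name
    // for the function query created in step 2
    WithData = Table.AddColumn(Products, "Data", each GetPrices([URL])),
    // Expanding only works if the function returns a table;
    // "Product" and "Price" are hypothetical column names
    Expanded = Table.ExpandTableColumn(WithData, "Data", {"Product", "Price"})
in
    Expanded
```

The URL column stays in the result, so each scraped row can still be traced back to the page it came from.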

That's all you need, I hope.

 

 


 

Thanks

 

 

 

 

Hi @Anonymous,

the approach @Anonymous described will work.

 

But if you want to refresh the results regularly in the Power BI service, you have to move the dynamic part of the URLs into the query parameters instead. Otherwise you'll get an error complaining about dynamic data sources: https://www.thebiccountant.com/2018/03/22/web-scraping-2-scrape-multiple-pages-power-bi-power-query/
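
As a rough sketch of what that can look like (an assumption about the pattern, not a copy of the linked article): keep the base URL as a fixed text and pass only the changing value through the Query option of Web.Contents. The base URL and the page parameter here are taken from the example earlier in this thread. Note that this swaps Web.BrowserContents for Web.Contents, which returns the raw response without running any JavaScript on the page, so it may not suit every site.

```powerquery
(page as text) as text =>
let
    // The base URL is a constant; only the query-string value varies per call
    Response = Web.Contents("https://abc.com", [Query = [page = page]]),
    // Convert the binary response into HTML text for further parsing
    Html = Text.FromBinary(Response)
in
    Html
```

With this shape the table holds page values such as "12" rather than full URLs, and the function is invoked on that column in the same way.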

 

 

Imke Feldmann (The BIccountant)


Anonymous

Hi @Anonymous, you can create a Power Query function and a table of URLs, then iterate over those URLs to get the data.
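
A minimal sketch of that iteration, assuming a table named Products with a text column URL and a function query named GetPrices that takes a URL and returns a table (both names are assumptions for illustration):

```powerquery
let
    // Take the URL column of the Products table as a list of text values
    Urls = Products[URL],
    // Call the function once per URL; each call is assumed to return a table
    Results = List.Transform(Urls, GetPrices),
    // Append all the per-page tables into one table
    Combined = Table.Combine(Results)
in
    Combined
```

This gives the appended result directly, which replaces the separate query per product page plus the manual append.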

 

Thanks.

 

 
