cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Frequent Visitor

WORKAROUND: How to use scheduled refresh of a web page without the need for a gateway.

Hi all,

 

Like many others (here, here and here), I have been struggling with the inability of Power BI service to perform a scheduled refresh on a public web source without using a personal gateway. 

 

According to the MS documentation, this is still not possible. (Don't ask me why)

 

The problem with this is that the M function Web.Page(), which is needed to work with HTML input, requires a gateway to work.

 

However, I have been playing around a bit lately, and believe that I have found a workaround that is relatively flexible. I'd like to share it here and get some feedback. Maybe some of you have some suggestions on how to improve the code to make it more broadly applicable.

 

In this example, I am trying to load the table from the following page into Power Query:  https://www.w3schools.com/html/html_tables.asp.

 

The code

 

let
    URL = "https://www.w3schools.com/html/html_tables.asp",
    Start = "<table id=""customers"">",
    End = "</table>",
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents(URL))}),
    ToList = Table.Column(Source, "Column1"),
    RowCount = Table.RowCount(Source),
    TableStart = List.PositionOf(ToList, Start),
    TableEnd = List.PositionOf(ToList, End),
    RemoveTopRows = Table.Skip(Source,TableStart),
    RemoveBottomRows = Table.RemoveLastN(RemoveTopRows,RowCount-TableEnd-1),
    List = Table.Column(RemoveBottomRows, "Column1"),
    ChangeStart = List.ReplaceValue(List, Start, "<data>", Replacer.ReplaceText),
    ChangeEnd = List.ReplaceValue(ChangeStart, End, "</data>", Replacer.ReplaceText),
    XML = Xml.Tables(Lines.ToText(ChangeEnd)),
    Table = XML{0}[Table],
    #"Added Custom1" = Table.AddColumn(Table, "data", each if [th] <> null then [th] else [td]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"th", "td"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Custom", each Table.Transpose([data])),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"data"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns1", "Custom", {"Column1", "Column2", "Column3"}, {"Column1", "Column2", "Column3"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Custom", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Company", type text}, {"Contact", type text}, {"Country", type text}})
in
     #"Changed Type"

 

 

How it works

The key here is to circumvent the use of the Web.Page() function, so here are the steps:

 

  1. Specify the URL to crawl.
  2. The second step is to look at the HTML source code of the webpage you want to scrape, and find the lines in the code that contain the start tag and the end tag of the table you want to import. (the lines containing <table> and </table>). NB: It's important to put the entire line here, so if there are more tags than just the table tags, include them as well.
  3. Then the data from the page is loaded. The combination of Lines.FromBinary() and Web.Contents() gives you a list of the raw HTML code from the web page.
  4. Convert the list into a table to locate the position of the table start and table end, and remove the lines above and below the table.
  5. I changed the start tag and end tag to a cleaner <data> and </data> because there are often other attributes (such as CSS class), that have an effect on subsequent steps.
  6. Since HTML has a similar structure as XML, we can now use the Xml.Tables() function to transform the raw data to a table. NB: This is where it can go wrong. HTML pages may contain code that does not comply with the more strict XML guidelines. If you get an error, it will tell you on what line it goes wrong. If that happens, go back to the list and filter out the lines that give the error.
  7. Once you have converted the raw data back to a table, you can go ahead and transform the data back into the form that you wish. In this case, I have transformed it back to resemble the original table from the webpage.

I have tested this strategy with Power BI service, incremental refresh and scheduled refresh and it works like a charm.

 

If you have any suggestions on how to improve this code, please let me know.

 

Cheers,

 

Kirvis

2 REPLIES 2
Super User I
Super User I

@kirvis,

 

For using online data and on premises data like Azure sql server, Share point online and SSAS etc,.

 

For scheduling refresh web page, it's a good work around.

 

Please check the below link for the same. 

https://blog.crossjoin.co.uk/2016/08/23/web-contents-m-functions-and-dataset-refresh-errors-in-power...

 

I hope this may help you. 

 

Regards, 

Manikumar 

 





If this helps, Appreciate your KUDOS!
Did I answer your question? Mark my post as a solution!


Proud to be a Super User!




Hi @manikumar34 ,

 

Thanks for your response.

 

I am aware of the blog posts that Chris wrote, but if I'm not mistaken, he addresses a different issue. He does not use the function Web.Page() in his script. He does not have to because the response of the URL he uses is JSON, which Power Query handles very well.

 

The workaround I mention in my post is specifically about scraping data from web pages. As you can see in the MS documentation about sources I mention in the post, you need a gateway to be able to use scheduled refresh for that.

 

My goal for this workaround was to explicitly circumvent the need for setting up a gateway.

 

Hope that explains.

 

Kirvis

Helpful resources

Announcements
secondImage

Congratulations!

We are excited to announce the Power BI Super Users!

Wave Release 2

Check out the updates in Power BI.

Overview of Power BI 2020 release wave 2!

Microsoft Ignite

Microsoft Ignite

Join digitally, March 2–4, 2021 to explore new tech that's ready to implement. Experience the keynote in mixed reality through AltspaceVR!

secondImage

The largest Power BI virtual conference

100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.

Top Solution Authors