Hi all,
Like many others (here, here and here), I have been struggling with the inability of Power BI service to perform a scheduled refresh on a public web source without using a personal gateway.
According to the MS documentation, this is still not possible. (Don't ask me why)
The problem with this is that the M function Web.Page(), which is needed to work with HTML input, requires a gateway to work.
However, I have been playing around a bit lately, and believe that I have found a workaround that is relatively flexible. I'd like to share it here and get some feedback. Maybe some of you have some suggestions on how to improve the code to make it more broadly applicable.
In this example, I am trying to load the table from the following page into Power Query: https://www.w3schools.com/html/html_tables.asp.
let
URL = "https://www.w3schools.com/html/html_tables.asp",
Start = "<table id=""customers"">",
End = "</table>",
Source = Table.FromColumns({Lines.FromBinary(Web.Contents(URL))}),
ToList = Table.Column(Source, "Column1"),
RowCount = Table.RowCount(Source),
TableStart = List.PositionOf(ToList, Start),
TableEnd = List.PositionOf(ToList, End),
RemoveTopRows = Table.Skip(Source,TableStart),
RemoveBottomRows = Table.RemoveLastN(RemoveTopRows,RowCount-TableEnd-1),
List = Table.Column(RemoveBottomRows, "Column1"),
ChangeStart = List.ReplaceValue(List, Start, "<data>", Replacer.ReplaceText),
ChangeEnd = List.ReplaceValue(ChangeStart, End, "</data>", Replacer.ReplaceText),
XML = Xml.Tables(Lines.ToText(ChangeEnd)),
Table = XML{0}[Table],
#"Added Custom1" = Table.AddColumn(Table, "data", each if [th] <> null then [th] else [td]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"th", "td"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Custom", each Table.Transpose([data])),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"data"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns1", "Custom", {"Column1", "Column2", "Column3"}, {"Column1", "Column2", "Column3"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Expanded Custom", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Company", type text}, {"Contact", type text}, {"Country", type text}})
in
#"Changed Type"
The key here is to circumvent the use of the Web.Page() function, so here are the steps:
I have tested this strategy with Power BI service, incremental refresh and scheduled refresh and it works like a charm.
If you have any suggestions on how to improve this code, please let me know.
Cheers,
Kirvis
For using online data and on premises data like Azure sql server, Share point online and SSAS etc,.
For scheduling refresh web page, it's a good work around.
Please check the below link for the same.
I hope this may help you.
Regards,
Manikumar
Proud to be a Super User!
Hi @manikumar34 ,
Thanks for your response.
I am aware of the blog posts that Chris wrote, but if I'm not mistaken, he addresses a different issue. He does not use the function Web.Page() in his script. He does not have to because the response of the URL he uses is JSON, which Power Query handles very well.
The workaround I mention in my post is specifically about scraping data from web pages. As you can see in the MS documentation about sources I mention in the post, you need a gateway to be able to use scheduled refresh for that.
My goal for this workaround was to explicitly circumvent the need for setting up a gateway.
Hope that explains.
Kirvis
Join us for an in-depth look at the new Power BI features and capabilities at the free Microsoft Business Applications Launch Event.
User | Count |
---|---|
64 | |
35 | |
24 | |
16 | |
9 |
User | Count |
---|---|
72 | |
44 | |
37 | |
26 | |
19 |